ShowTheBest

Case-Based Reasoning

© 2002 Inductive Solutions, Inc. All rights reserved.

Help and Documentation


Sample Problem     Sample CBR Application   

What ShowTheBest Does

Many web sites provide tables of information -- vendor comparisons, product evaluations, and all sorts of tabular listings. Most of these tables suffer from "information overload" -- they are too large to digest and take too much time to read. This is especially true for web sites that are linked to databases -- after a site visitor fills out a form, the "answer" is typically returned as a large table. The only decision support typically provided is a script that lets users sort records alphabetically or by numerical fields such as price.

One way to make information more useful is to provide a way to quickly score and sort the data according to any criteria. The criteria should include values and preferences; numbers and text; and structured and unstructured information that is specific to the underlying data. Static tables could then be transformed into a type of interactive expert system. It turns out that many of these requirements are satisfied by an expert system technology called Case-Based Reasoning.

ShowTheBest is a server-based system that transforms static tables into interactive HTML forms using methods derived from Case-Based Reasoning (CBR).

ShowTheBest applications include interactive search, scoring and prioritization (here is a restaurant selection application) and expert systems (here is an example that selects multiprocessor computer topologies). ShowTheBest is completely web-based and is implemented as fast Common Gateway Interface (CGI) programs written in compiled C (not interpreted Perl or PHP). By pressing a button, visitors to a ShowTheBest web page can quickly see the best (or worst) data records that match their requirements.

Why Case-Based Reasoning?

Case-Based Reasoning (CBR) is a branch of Artificial Intelligence (AI) that models human problem-solving using a database of cases -- a representation of problem profiles and their solutions. The answer to a problem is essentially the answer to the "most similar problem" that was archived in the case database. Case-based answers are adapted from the closest matching cases, ranked by case score, and displayed to users in a sorted, prioritized list.

This concern with modeling "similarity" is what makes CBR a good source of matching and scoring algorithms.

ShowTheBest is based on many of the CBR methods found in Induce-It. A case database is a table of records with fields. The current problem profile (or matching target) is another record called the Reference. Each case in the case database is compared field by field with the fields in the Reference. Each field has its own scoring and matching algorithm, specified by the developer.
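The field-by-field comparison described above can be sketched in Python. This is a minimal illustration, not the ShowTheBest implementation: the matcher functions, the field names, and the linear fall-off used for numbers are all assumptions made for the example.

```python
# Hypothetical per-field matchers; each returns a field score in [0, 1].
def exact_match(case_value, ref_value):
    """Return 1.0 when the values are equal (ignoring case and spaces), else 0.0."""
    same = str(case_value).strip().lower() == str(ref_value).strip().lower()
    return 1.0 if same else 0.0


def numeric_closeness(value_range):
    """Build a matcher whose score falls linearly with numeric distance (assumed scheme)."""
    def matcher(case_value, ref_value):
        distance = abs(float(case_value) - float(ref_value))
        return max(0.0, 1.0 - distance / value_range)
    return matcher


# The developer assigns one matcher to each field (illustrative field names).
FIELD_MATCHERS = {
    "cuisine": exact_match,
    "price": numeric_closeness(value_range=50.0),
}


def field_scores(case, reference):
    """Compare one case with the Reference, field by field."""
    return {
        name: match(case[name], reference[name])
        for name, match in FIELD_MATCHERS.items()
    }


reference = {"cuisine": "Indian", "price": 20}
case = {"cuisine": "indian", "price": 30}
print(field_scores(case, reference))
```

Keeping the matchers in a table keyed by field name mirrors the idea that each field carries its own developer-specified algorithm.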

By convention, field scores are normalized between 0 and 1, with 1 being a perfect match between a field value and the corresponding Reference field. Case scores are computed as a weighted average of the field scores. The ShowTheBest case scores are normalized between 0 and 100, with 100 being a perfect match with the Reference.
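As a rough sketch of this scoring convention, the following Python computes a case score as a weighted average of field scores, scales it to 0-100, and ranks the cases. The field names, weights, and field scores are made up for illustration.

```python
def case_score(field_scores, weights):
    """Weighted average of field scores (each in [0, 1]), scaled to 0-100."""
    total_weight = sum(weights[name] for name in field_scores)
    if total_weight == 0:
        raise ValueError("at least one field needs a non-zero weight")
    weighted = sum(weights[name] * score for name, score in field_scores.items())
    return 100.0 * weighted / total_weight


# Illustrative weights and precomputed field scores for three cases.
weights = {"cuisine": 3, "price": 1}
cases = {
    "A": {"cuisine": 1.0, "price": 0.5},
    "B": {"cuisine": 0.0, "price": 1.0},
    "C": {"cuisine": 1.0, "price": 1.0},
}

# Best matches first, as in a sorted, prioritized list.
ranked = sorted(cases, key=lambda name: case_score(cases[name], weights), reverse=True)
for name in ranked:
    print(name, case_score(cases[name], weights))
```

With these numbers, case C matches the Reference perfectly and scores 100, case A scores 87.5, and case B scores 25.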

Many of the other conventions that ShowTheBest uses can be seen in the Sample Problem (which shows the Design Tool in operation) and the Sample CBR Application. Here is a description of some other conventions associated with the ShowTheBest field matching algorithms:

Unstructured Text and Numbers
Unstructured text and numbers are usually entered into input forms. Some conventions include:
Structured Information
Developers can add implicit information associated with the problem that can improve field and case scoring. For example, developers may want to model geographical closeness, i.e., that one neighborhood is closer than another neighborhood. Family relationships (hierarchies) may also need to be modeled. Finally, a developer may need to model preferences. For example, a user might specify a first choice of Indian Cuisine and a second choice of French Cuisine in a reference field. A case database field listing Indian should score higher than a case database field listing French. This ShowTheBest structured information application correctly models such preferences.
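The preference example can be sketched as a small Python function. The scoring scheme here is an assumption chosen for illustration (first choice scores 1.0, later choices fall off linearly, unlisted values score 0.0); the source does not state the values ShowTheBest actually assigns.

```python
def preference_score(case_value, ranked_choices):
    """Score a case field against an ordered list of preferred values.

    Assumed scheme: the first choice scores 1.0, later choices fall off
    linearly, and values that are not listed score 0.0.
    """
    normalized = [choice.strip().lower() for choice in ranked_choices]
    value = case_value.strip().lower()
    if value not in normalized:
        return 0.0
    rank = normalized.index(value)  # 0 for the first choice
    return (len(normalized) - rank) / len(normalized)


# Reference field: first choice Indian, second choice French.
reference_choices = ["Indian", "French"]
for cuisine in ["Indian", "French", "Thai"]:
    print(cuisine, preference_score(cuisine, reference_choices))
```

Under this scheme a case listing Indian scores 1.0, a case listing French scores 0.5, and any other cuisine scores 0.0, so the ordering of preferences is preserved in the field score.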

In general, field scores based on structured information are "fuzzy" -- the scores range between 0 and 1. Some conventions include:

Tuning the Case Database
In CBR scoring systems, it is necessary to have as many different cases as possible that “span” the problem space: the cases in the case database should all be “suitably different” from each other. Sometimes, the case scores mask these differences: for example, it is possible that two different cases can have identical scores. You can easily observe this in a case database having only choice type fields: two cases having the same score implies that they have the same degree of similarity to the Reference case, but not necessarily to each other.
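The masking effect described above can be shown with a short Python sketch. It assumes equally weighted choice-type fields scored by exact match; the field names and records are invented for the example.

```python
from itertools import combinations


def similarity(a, b):
    """Percentage of choice fields on which two records agree (equal weights)."""
    matches = sum(1 for field in a if a[field] == b[field])
    return 100.0 * matches / len(a)


reference = {"cuisine": "Indian", "area": "Downtown"}
cases = {
    "A": {"cuisine": "Indian", "area": "Uptown"},
    "B": {"cuisine": "French", "area": "Downtown"},
    "C": {"cuisine": "Indian", "area": "Downtown"},
}


def tied_pairs(cases, reference):
    """List (case, case, shared score, mutual similarity) for equal-scoring pairs."""
    scores = {name: similarity(case, reference) for name, case in cases.items()}
    return [
        (x, y, scores[x], similarity(cases[x], cases[y]))
        for x, y in combinations(cases, 2)
        if scores[x] == scores[y]
    ]


print(tied_pairs(cases, reference))
```

Cases A and B both score 50 against the Reference, yet they agree with each other on no field at all. A check like this is one way to find tied cases that are still "suitably different" from each other.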

Tuning tasks include:

