Help ShowTheBest - Case-Based Reasoning

ShowTheBest

Case-Based Reasoning

Help and Documentation

What ShowTheBest Does
- Why Case-Based Reasoning?
- Similarity, Matching, and Scoring Algorithms
Using the ShowTheBest Design Tool
- Step 1: Specify the Case Database Table
- Step 2: Specify the Case Database Field Maps
- Building a Sample Application with the ShowTheBest Design Tool
ShowTheBest.Com: A Version of ShowTheBest that Runs on Your Server
- ShowTheBest.Com Documentation
- Which version is right for me?

What ShowTheBest Does

Many web sites provide tables of information -- vendor comparisons, product evaluations, and all sorts of tabular listings. Most of these tables suffer from "information overload" -- they are too large to digest and take too much time to read. This is especially true for web sites that are linked to databases -- after a site visitor fills out a form, the "answer" is typically returned as a large table. The only decision support typically provided is a script that lets users sort records alphabetically or by numerical fields like price.

One way to make information more useful is to provide a way to quickly score and sort the data according to any criteria. The criteria should include values and preferences; numbers and text; and structured and unstructured information that is specific to the underlying data. Static tables could then be transformed into a type of interactive expert system. It turns out that many of these requirements are satisfied by an expert system technology called Case-Based Reasoning.

ShowTheBest is a server-based system that transforms static tables into interactive HTML forms using methods derived from Case-Based Reasoning (CBR).

ShowTheBest applications include interactive search, scoring and prioritization (here is a restaurant selection application) and expert systems (here is an example that selects multiprocessor computer topologies). ShowTheBest is completely web-based and is implemented as fast, compiled C programs (not interpreted Perl or PHP) that run through the Common Gateway Interface (CGI). By pressing a button, visitors to a ShowTheBest web page can quickly see the best (or worst) data records that match their requirements.

Why Case-Based Reasoning?

Case-Based Reasoning (CBR) is a branch of Artificial Intelligence (AI) that models human problem-solving using a database of cases -- representations of problem profiles and their solutions. The answer to a problem is essentially the answer to the "most similar problem" that was archived in the case database. Case-based answers are adapted from the closest matching cases, ranked by case score, and displayed to users in a sorted, prioritized list.

This concern with modeling "similarity" is what makes CBR a good source of matching and scoring algorithms.

ShowTheBest is based on many of the CBR methods found in Induce-It. A case database is a table of records with fields. The current problem profile (or matching target) is another record called the Reference. Each case in the case database is compared field by field with the fields in the Reference. Each field has its own scoring and matching algorithm, specified by the developer.

By convention, field scores are normalized between 0 and 1, with 1 being a perfect field value match with the corresponding Reference field. Case scores are computed as a weighted average of the field scores. The ShowTheBest case scores are normalized between 0 and 100, with 100 being a perfect match with the Reference.
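The scoring convention above can be illustrated with a minimal Python sketch. It is not the ShowTheBest implementation (which is compiled C); the function name `case_score` and the handling of all-zero weights are assumptions made for illustration.

```python
def case_score(field_scores, weights):
    """Weighted average of field scores (each in [0, 1]), scaled to 0-100."""
    if len(field_scores) != len(weights):
        raise ValueError("need one weight per field")
    total_weight = sum(weights)
    if total_weight == 0:
        return 0.0  # assumption: all-zero weights give a case score of 0
    weighted = sum(s * w for s, w in zip(field_scores, weights))
    return 100.0 * weighted / total_weight

# Three fields: a perfect match, a half match, and a miss.
print(case_score([1.0, 0.5, 0.0], [2, 1, 1]))  # 62.5
```

A case whose fields all score 1 reaches 100 regardless of the weights, which matches the "perfect match with the Reference" convention.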

Many of the other conventions that ShowTheBest uses can be seen in the Sample Problem (which shows the Design Tool in operation) and the Sample CBR Application. Here is a description of some other conventions associated with the ShowTheBest field matching algorithms:

Unstructured Text and Numbers

Unstructured text and numbers are usually entered into input forms. Some conventions include:

"Don’t Care" and "Don't Know" Values
ShowTheBest supports two constants that are used to model different types of "Don’t Care" or "Don't Know" conditions: " " (one or more blank spaces) and "-" (a single hyphen). A case field with value " " matches nothing -- its field score with respect to any reference field value is zero. A case field with value "-" matches everything -- its field score with respect to any reference field value is 1. A reference field with input value " " matches nothing -- the field score for all case field values is zero. A reference field with value "-" matches everything -- the field score for all case field values is 1.
"Wildcard Matching"
The "wildcard" symbol * matches any sequence of characters in the text field.
Disjunction ("or operator")
The disjunction ("or operator" symbol) + matches individual strings. In this unstructured information application, "*Kosher+Chinese" in the Cuisine reference field will match any Kosher restaurant or any Chinese restaurant. The field score will be 1. (You can see this if you set the weights of the other fields to 0.)
Numbers
ShowTheBest determines whether the field values are numbers or text. If they are numbers, then the field score between two numerical fields is specified in terms of a "closeness ratio" that lies between 0 and 1.
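The text conventions above (blank, hyphen, wildcard, and disjunction) can be sketched in Python as follows. This is an illustrative approximation, not ShowTheBest's code: the function names are hypothetical, and whole-field matching, case-insensitivity, treating an empty string as blank, and checking blanks before hyphens are all assumptions that the text does not confirm.

```python
import re

def _pattern_matches(pattern, text):
    # Turn "*" into "any sequence of characters"; everything else is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    # Assumption: the pattern must cover the whole field, ignoring case.
    return re.fullmatch(regex, text, flags=re.IGNORECASE | re.DOTALL) is not None

def text_field_score(ref, data):
    """Score one text field: 1.0 for a match, 0.0 otherwise."""
    # Blank values (empty or only spaces) match nothing; checked first (assumption).
    if ref.strip() == "" or data.strip() == "":
        return 0.0
    # A single hyphen on either side matches everything.
    if ref == "-" or data == "-":
        return 1.0
    # "+" separates alternatives; any matching alternative is enough.
    alternatives = [alt.strip() for alt in ref.split("+") if alt.strip()]
    matched = any(_pattern_matches(alt, data.strip()) for alt in alternatives)
    return 1.0 if matched else 0.0

print(text_field_score("*Kosher+Chinese", "Glatt Kosher"))  # 1.0
print(text_field_score("*Kosher+Chinese", "Italian"))       # 0.0
```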

Structured Information

Developers can add implicit information associated with the problem that can improve field and case scoring. For example, developers may want to model geographical closeness, i.e., that one neighborhood is closer than another neighborhood. Family relationships (hierarchies) may also need to be modeled. Finally, a developer may need to model preferences. For example, a user might specify a first choice of Indian Cuisine and a second choice of French Cuisine in a reference field. A case database field listing Indian should score higher than a case database field listing French. This ShowTheBest structured information application correctly models such preferences.

In general, field scores based on structured information are "fuzzy" -- the scores range between 0 and 1. Some conventions include:

Field Maps
A field map is a one- or two-column table associated with a field. The first column is text; the optional second column is a number. If the second column is omitted, ShowTheBest induces a set of numbers based on the ordinal sequence of elements. The numbers induce a knowledge-based structure (called an ontology in the AI literature). The text values are closer if their corresponding numerical maps are closer.
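As a rough illustration of how a field map can turn text values into a "fuzzy" score, here is a minimal Python sketch. The numbering from 1, the linear closeness formula, the function names, and the neighborhood labels are all assumptions for illustration; the text does not state the exact formula ShowTheBest uses.

```python
def induce_numbers(labels):
    """One-column map: number the labels 1, 2, 3, ... in listed order (assumed)."""
    return {label: float(i) for i, label in enumerate(labels, start=1)}

def map_field_score(field_map, ref, data):
    """Closeness of two text values via their mapped numbers, in [0, 1]."""
    lo, hi = min(field_map.values()), max(field_map.values())
    if hi == lo:
        return 1.0  # only one distinct number: everything is equally close
    # Assumed formula: 1 minus the distance as a fraction of the full range.
    return 1.0 - abs(field_map[ref] - field_map[data]) / (hi - lo)

# Hypothetical neighborhoods listed west to east.
neighborhoods = induce_numbers(["Westside", "Midtown", "Downtown", "Eastside"])
print(map_field_score(neighborhoods, "Westside", "Midtown"))   # nearby: high score
print(map_field_score(neighborhoods, "Westside", "Eastside"))  # 0.0
```

A two-column map would simply supply its own numbers in place of `induce_numbers`, letting the developer space the values unevenly.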
Field Map Order
Orders are used to model preferences. This is seen in the structured information application. Cuisine has a Field Map of Order 2.
Formula Maps
Formula maps let the developer create custom structures. Formulas are arithmetic expressions that evaluate to a number. Formulas can contain constants and "variables." For example, the following formula scores a 1 for those records that have a database value greater than or equal to the value that the user types in the Reference field:
```
(data >=ref)
```
For text values, > and < are interpreted as alphabetic (lexicographic) ordering.
The variables that ShowTheBest uses are:
- data
  the case database value in the particular field
- ref
  the value that the user chooses or selects
For data and values that evaluate to numbers, ShowTheBest has the following variables:
- small
  the smallest value in the field (across all records)
- big
  the largest value in the field (across all records)
- avg
  the average value in the field (across all records)
- width
  the value (big-small) in the field (across all records)
Ordinarily, ShowTheBest generates an Input Box for the user when it encounters a Formula Map. A more powerful representation occurs when we combine Formula Maps and Field Maps.
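The formula variables can be pictured with a short Python sketch. The helper names and the sample prices are hypothetical, `big` is taken to be the largest value in the field, and a true comparison is assumed to evaluate to 1 and a false one to 0.

```python
def field_stats(values):
    """Per-field variables computed across all records."""
    small, big = min(values), max(values)
    return {
        "small": small,
        "big": big,
        "avg": sum(values) / len(values),
        "width": big - small,
    }

def at_least_ref(data, ref):
    # Python stand-in for the formula (data >= ref): 1 if true, else 0.
    # For strings, Python's >= is also a lexicographic comparison.
    return 1.0 if data >= ref else 0.0

prices = [20, 250, 1000]  # hypothetical values of one numeric field
stats = field_stats(prices)
print(stats["small"], stats["big"], stats["width"])  # 20 1000 980
print([at_least_ref(p, 250) for p in prices])        # [0.0, 1.0, 1.0]
```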
Formula-Field Maps
In this case, the developer lets the user decide which formula to use in the field. For example, in this specification...
```
Highest	(data/big) 
Lowest	(small/data) 
```
... ShowTheBest creates a Selection-style choice. When the user selects "Lowest", the field score will be 1 if the data value is equal to the lowest value in the field, and progressively smaller if the data value is greater than the lowest value. For example, if the data value in the field is 1000, and the smallest value is 20, then the intermediate score is 0.02 (=20/1000). Note that if the user selects "Highest", the formula (data/big) is used instead, so the scores will be different.
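The Highest/Lowest specification can be mimicked in Python to check the arithmetic. This is only a sketch: the dictionary of lambdas, the function name, and the sample values are assumptions, and it presumes positive, non-zero field values so that the divisions are defined.

```python
# Formula-Field Map: each choice label selects a scoring formula.
FORMULAS = {
    "Highest": lambda data, small, big: data / big,
    "Lowest": lambda data, small, big: small / data,
}

def formula_field_score(choice, data, values):
    """Score `data` with the formula the user picked, given all field values."""
    return FORMULAS[choice](data, min(values), max(values))

values = [20, 500, 1000]  # hypothetical values of one numeric field
print(formula_field_score("Lowest", 1000, values))   # 0.02
print(formula_field_score("Lowest", 20, values))     # 1.0
print(formula_field_score("Highest", 1000, values))  # 1.0
```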

Tuning the Case Database

In CBR scoring systems, it is necessary to have as many different cases as possible that “span” the problem space: the cases in the case database should all be “suitably different” from each other. Sometimes, the case scores mask these differences: for example, it is possible that two different cases can have identical scores. You can easily observe this in a case database having only choice type fields: two cases having the same score implies that they have the same degree of similarity to the Reference case, but not necessarily to each other.
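The masking effect is easy to reproduce. In the hedged Python sketch below, two hypothetical cases match the Reference on different fields yet receive the same case score; changing the field importance values separates them. The weighted-average function is an illustrative stand-in, not ShowTheBest's code.

```python
def case_score(field_scores, weights):
    """Weighted average of field scores, scaled to 0-100."""
    return 100.0 * sum(s * w for s, w in zip(field_scores, weights)) / sum(weights)

weights = [1, 1]
# Field scores of two hypothetical cases against the same Reference:
case_a = [1.0, 0.0]  # matches the Reference on the first field only
case_b = [0.0, 1.0]  # matches the Reference on the second field only

print(case_score(case_a, weights), case_score(case_b, weights))  # 50.0 50.0

# Raising the importance of the first field separates the two cases.
tuned = [3, 1]
print(case_score(case_a, tuned), case_score(case_b, tuned))  # 75.0 25.0
```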

Tuning tasks include:

- Changing the field importance values (for weighted scores).
- Changing the case similarity function, choosing among linear weighted scores, fuzzy logic, Euclidean, cosine, and user scoring formulas.
- Modifying one or more cases in the case database.
- Adding (or deleting) one or more cases in the case database.
- Modifying one or more fields in the case database.
- Adding one or more computed fields to the case database, i.e., fields that derive their values by a numerical or logical operation on existing field values.
- Adding one or more new fields to the case database.
- Deleting one or more fields from the case database.
