ShowTheBest: Case-Based Reasoning and Documentation
One way to make information more useful is to provide a way to quickly score and sort the data according to any criteria. The criteria should include values and preferences; numbers and text; and structured and unstructured information that is specific to the underlying data.
Static tables could then be transformed into a type of interactive expert system. It turns out that many of these requirements are satisfied by an expert system technology called Case-Based Reasoning.
ShowTheBest is a server-based system that transforms static tables into interactive HTML forms using methods derived from Case-Based Reasoning (CBR).
ShowTheBest applications include interactive search, scoring and prioritization (for example, a restaurant selection application) and expert systems (for example, an application that selects multiprocessor computer topologies). ShowTheBest is entirely web-based and is implemented as fast, compiled C (not interpreted Perl or PHP) Common Gateway Interface (CGI) programs.
By pressing a button, visitors to a ShowTheBest web page can quickly see the best (or worst) data records that match their requirements.
CBR's concern with modeling "similarity" is what makes it a good source of matching and scoring algorithms.
ShowTheBest is based on many of the CBR methods found in Induce-It. A case database is a table of records with fields. The current problem profile (or matching target) is another record, called the Reference. Each case in the case database is compared field by field with the Reference. Each field has its own scoring and matching algorithm, specified by the developer.
By convention, field scores are normalized between 0 and 1, with 1 being a perfect match with the corresponding Reference field value. Case scores are computed as a weighted average of the field scores and are normalized between 0 and 100, with 100 being a perfect match with the Reference.
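A minimal Python sketch of the case-scoring convention just described: per-field scores in [0, 1] are combined by a weighted average and scaled to 0..100. The function name `case_score` and the dictionary representation of fields and weights are hypothetical illustrations, not ShowTheBest's actual C implementation; returning 0 when every weight is zero is likewise an assumption.

```python
# Hypothetical sketch of ShowTheBest-style case scoring (not the actual C code).
# Assumption: the case score is the weighted average of field scores in [0, 1],
# scaled to the 0..100 range.


def case_score(field_scores: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-field scores (0..1) into a case score (0..100)."""
    total_weight = sum(weights.values())
    if total_weight == 0:
        # Assumption: with no weighted fields there is nothing to match on.
        return 0.0
    weighted_sum = sum(w * field_scores[name] for name, w in weights.items())
    return 100.0 * weighted_sum / total_weight


if __name__ == "__main__":
    scores = {"Cuisine": 1.0, "Price": 0.5, "Neighborhood": 0.0}
    weights = {"Cuisine": 2.0, "Price": 1.0, "Neighborhood": 1.0}
    print(case_score(scores, weights))  # 62.5
```

Under this reading, setting a field's weight to 0 removes it from the average, which is why zeroing the other weights isolates a single field's contribution.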
Many of the other conventions that ShowTheBest uses can be seen in the Sample Problem (which shows the Design Tool in operation) and the Sample CBR Application.
Here is a description of some other conventions associated with the ShowTheBest field matching algorithms:
In general, field scores based on structured information are "fuzzy" -- the scores range between 0 and 1. Some conventions include:
What ShowTheBest Does
Many web sites provide tables of information -- vendor comparisons, product evaluations, and all sorts of tabular listings. Most of these tables suffer from "information overload" -- they are too large to digest and take too much time to work through. This is especially true for web sites linked to databases: after a site visitor fills out a form, the "answer" is typically returned as a large table. The only decision support typically provided consists of scripts that let users sort records alphabetically or by numerical fields such as price.
Why Case-Based Reasoning?
Case-Based Reasoning (CBR) is a branch of Artificial Intelligence (AI) that models human problem-solving using a database of cases -- representations of problem profiles and their solutions. The answer to a problem is essentially the answer to the "most similar problem" archived in the case database. Case-based answers are adapted from the closest matching cases, ranked by case score, and displayed to users as a sorted, prioritized list.
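The retrieve-and-rank step can be sketched as follows. This is a hypothetical illustration, assuming cases are Python dictionaries and every field uses a toy exact-match scorer; `rank_cases` and `exact_match` are names invented here, not part of ShowTheBest. Passing `best_first=False` lists the worst matches first, mirroring the "best (or worst)" button mentioned at the top of this page.

```python
# Hypothetical sketch: score every case against the Reference, then sort.
# Assumptions: cases are dicts, each field uses exact matching, and case
# scores are weighted averages scaled to 0..100.


def exact_match(case_value: str, ref_value: str) -> float:
    """Toy field scorer: 1.0 for identical values, else 0.0."""
    return 1.0 if case_value == ref_value else 0.0


def rank_cases(cases, reference, weights, best_first=True, limit=None):
    """Return (score, case) pairs sorted by case score."""
    total = sum(weights.values())
    ranked = []
    for case in cases:
        weighted = sum(w * exact_match(case[f], reference[f]) for f, w in weights.items())
        ranked.append((100.0 * weighted / total if total else 0.0, case))
    # Python's sort is stable, so tied cases keep their database order.
    ranked.sort(key=lambda pair: pair[0], reverse=best_first)
    return ranked[:limit]


if __name__ == "__main__":
    cases = [
        {"Name": "A", "Cuisine": "Indian", "Area": "North"},
        {"Name": "B", "Cuisine": "French", "Area": "North"},
        {"Name": "C", "Cuisine": "Indian", "Area": "South"},
    ]
    reference = {"Cuisine": "Indian", "Area": "North"}
    weights = {"Cuisine": 2.0, "Area": 1.0}
    print([c["Name"] for _, c in rank_cases(cases, reference, weights)])  # ['A', 'C', 'B']
```

A real field scorer would be fuzzy rather than exact, but the ranking step is the same: compute one case score per record, then sort.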
Unstructured Text and Numbers
Unstructured text and numbers are usually entered into input forms. Some conventions include:
ShowTheBest supports two constants that model different types of "Don't Care" or "Don't Know" conditions: " " (one or more blank spaces) and "-" (a single hyphen). A case field with value " " matches nothing -- its field score with respect to any Reference field value is 0. A case field with value "-" matches everything -- its field score with respect to any Reference field value is 1. A Reference field with input value " " matches nothing -- the field score for all case field values is 0. A Reference field with value "-" matches everything -- the field score for all case field values is 1.
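The four "Don't Care" rules above can be expressed as a thin wrapper around an ordinary field scorer. This is a hypothetical sketch: `field_score`, `is_blank`, and the exact-match fallback are illustrative names, and because the text does not say what happens when the two constants meet (for example a "-" case field against a " " Reference), the sketch arbitrarily lets "matches nothing" win.

```python
# Hypothetical sketch of the two "Don't Care" / "Don't Know" constants.
# " " (one or more blanks) matches nothing; "-" matches everything.
# Assumption: when the two constants meet (e.g. case "-" vs Reference " "),
# "matches nothing" wins. The source text does not specify this case.


def exact_match(case_value: str, ref_value: str) -> float:
    return 1.0 if case_value == ref_value else 0.0


def is_blank(value: str) -> bool:
    """True for one or more blank spaces and nothing else."""
    return len(value) > 0 and value.strip(" ") == ""


def field_score(case_value: str, ref_value: str, scorer=exact_match) -> float:
    if is_blank(case_value) or is_blank(ref_value):
        return 0.0  # " " matches nothing
    if case_value == "-" or ref_value == "-":
        return 1.0  # "-" matches everything
    return scorer(case_value, ref_value)  # ordinary field matching
```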
The "wildcard" symbol * matches any sequence of characters in the text field.
The disjunction symbol + (the "or" operator) separates alternative strings. In the unstructured information application, "*Kosher+Chinese" in the Cuisine Reference field will match any Kosher restaurant or any Chinese restaurant, with a field score of 1. (You can see this if you set the weights of the other fields to 0.)
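One plausible reading of the * and + conventions is sketched below. The text does not pin down the exact semantics, so this is an assumption: the pattern is split on "+" into alternatives, each alternative must match the whole field value, * matches any run of characters (including none), and case is ignored. Under this reading a value such as "Kosher Deli" would need the pattern "*Kosher*". The name `pattern_score` is hypothetical.

```python
import re

# Hypothetical sketch of the "*" wildcard and "+" disjunction for text fields.
# Assumptions: each "+"-separated alternative must match the whole field
# value, "*" matches any run of characters (including none), and matching
# ignores case.


def pattern_score(pattern: str, value: str) -> float:
    """Return 1.0 if value matches any alternative in pattern, else 0.0."""
    for alternative in pattern.split("+"):
        # Escape regex metacharacters literally, then turn each "*" into ".*".
        regex = ".*".join(re.escape(part) for part in alternative.split("*"))
        if re.fullmatch(regex, value, flags=re.IGNORECASE):
            return 1.0
    return 0.0


if __name__ == "__main__":
    for cuisine in ["Glatt Kosher", "Chinese", "Italian"]:
        print(cuisine, pattern_score("*Kosher+Chinese", cuisine))
```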
ShowTheBest determines whether the field values are numbers or text. If they are numbers, then the field score between two numerical fields is specified in terms of a "closeness ratio" that lies between 0 and 1.
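The text says only that the numeric "closeness ratio" lies between 0 and 1; it does not give the formula. As an assumption, the sketch below uses the smaller value divided by the larger, which is 1 for equal values and shrinks toward 0 as they diverge (and is consistent with the (small/data) Formula Map shown later on this page). The function name `closeness_ratio` is hypothetical.

```python
# Hypothetical closeness ratio for two numeric field values. The source text
# says only that the ratio lies between 0 and 1; min/max is an assumption.


def closeness_ratio(a: float, b: float) -> float:
    """1.0 for equal values, approaching 0.0 as the values diverge."""
    if a < 0 or b < 0:
        raise ValueError("this sketch assumes non-negative values")
    if a == b:
        return 1.0  # also covers a == b == 0
    return min(a, b) / max(a, b)
```

A ratio like this is scale-free: 20 versus 40 scores the same as 200 versus 400, which suits fields like price but may not suit fields where absolute differences matter.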
Structured Information
Developers can add implicit information associated with the problem to improve field and case scoring. For example, developers may want to model geographical closeness, i.e., that some neighborhoods are closer to each other than others. Family relationships (hierarchies) may also need to be modeled. Finally, a developer may need to model preferences. For example, a user might specify a first choice of Indian cuisine and a second choice of French cuisine in a Reference field. A case database field listing Indian should then score higher than one listing French.
The ShowTheBest structured information application correctly models such preferences.
A field map is a one- or two-column table associated with a field. The first column contains text values; the optional second column contains numbers. If the second column is omitted, ShowTheBest induces a set of numbers from the ordinal sequence of the elements. These numbers induce a knowledge-based structure (called an ontology in the AI literature): two text values are closer if their corresponding numerical maps are closer.
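A Field Map can be sketched as a dictionary from text to number, with one-column rows receiving their ordinal position. How ShowTheBest turns the numeric distance into a field score is not stated; the linear rescaling over the map's range used here is an assumption, as are the names `build_field_map` and `field_map_score` and the example neighborhood and cuisine values.

```python
# Hypothetical sketch of a Field Map: text values mapped to numbers, where
# values with closer numbers score closer to 1. The linear rescaling below is
# an assumption; the source only says "closer numbers mean closer values".


def build_field_map(rows):
    """rows: (text,) or (text, number) tuples in table order."""
    mapping = {}
    for position, row in enumerate(rows, start=1):
        # One-column rows get their ordinal position as the induced number.
        mapping[row[0]] = float(row[1]) if len(row) > 1 else float(position)
    return mapping


def field_map_score(mapping, case_value, ref_value):
    span = max(mapping.values()) - min(mapping.values())
    if span == 0:
        return 1.0
    return 1.0 - abs(mapping[case_value] - mapping[ref_value]) / span


if __name__ == "__main__":
    areas = build_field_map([("Downtown",), ("Midtown",), ("Uptown",), ("Suburbs",)])
    print(field_map_score(areas, "Downtown", "Midtown"))
```

Supplying the second column lets a developer express uneven distances, e.g. two cuisines mapped to 0 and 1 are much closer than either is to one mapped to 10.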
Orders are used to model preferences, as seen in the structured information application, where Cuisine has a Field Map of Order 2.
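The preference behavior described above (a first choice of Indian outscoring a second choice of French) can be sketched as follows. The text does not define how an Order translates into numbers, so the scoring rule here is purely an assumption: the k-th of n listed choices (counting from 0) scores (n - k) / n, and unlisted values score 0. The name `preference_score` is hypothetical.

```python
# Hypothetical sketch of an ordered-preference field. The Reference lists
# choices in order of preference. Assumption: the k-th of n choices
# (counting from 0) scores (n - k) / n, so the first choice scores 1.0;
# unlisted values score 0.0. The source does not define "Order" numerically.


def preference_score(case_value: str, ranked_choices: list[str]) -> float:
    n = len(ranked_choices)
    for rank, choice in enumerate(ranked_choices):
        if case_value == choice:
            return (n - rank) / n
    return 0.0
```

With the Reference choices ["Indian", "French"], an Indian restaurant scores 1.0, a French one 0.5, and any other cuisine 0.0.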
Formula Maps let the developer create custom structures. Formulas are arithmetic expressions that evaluate to a number. Formulas can contain constants and "variables." For example, the following formula scores 1 for records whose database value is greater than or equal to the value the user types in the Reference field:
(data >=ref)
For text values, > and < are interpreted as alphabetic (lexicographic) ordering.
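The (data >= ref) formula can be sketched as a single comparison that yields 1 or 0. This hypothetical `comparison_score` does not parse ShowTheBest's full formula syntax; it assumes one comparison of data against ref, compares numeric text numerically, and compares other text with Python's string ordering. Python orders strings by code point, so uppercase letters sort before lowercase; ShowTheBest's alphabetic ordering may treat case differently.

```python
import operator

# Hypothetical sketch: score a single-comparison Formula Map such as
# (data >= ref). Assumptions: only one comparison of data against ref is
# supported, numeric text is compared numerically, and other text is
# compared with Python's code-point ordering (a stand-in for alphabetic order).

COMPARISONS = {
    ">=": operator.ge,
    "<=": operator.le,
    ">": operator.gt,
    "<": operator.lt,
}


def as_number_or_text(value: str):
    try:
        return float(value)
    except ValueError:
        return value


def comparison_score(op: str, data: str, ref: str) -> float:
    d, r = as_number_or_text(data), as_number_or_text(ref)
    if isinstance(d, float) != isinstance(r, float):
        d, r = data, ref  # mixed number/text: compare the raw strings
    return 1.0 if COMPARISONS[op](d, r) else 0.0
```

Converting numeric text before comparing matters: as raw strings "9" sorts after "20", but as numbers 9 is smaller.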
The variables that ShowTheBest uses are:
For data and values that evaluate to numbers, ShowTheBest has the following variables:
Ordinarily, ShowTheBest generates an Input Box for the user when it encounters a Formula Map. A more powerful representation results from combining Formula Maps and Field Maps.
Highest    (data/big)
Lowest     (small/data)
... ShowTheBest creates a Selection-style choice. When the user selects "Lowest", the field score is 1 if the data value equals the lowest value in the field, and progressively smaller if the data value is greater than the lowest value. For example, if the data value in the field is 1000 and the smallest value is 20, the field score is 0.02 (= 20/1000). Note that if your user selects "Highest", then the scores will be different, since each is computed as data/big.
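The Highest/Lowest Field Map can be sketched directly from its two formulas, using the variables data, small (the field's smallest value), and big (its largest value). The dictionary `FIELD_MAP`, the function `selection_scores`, and the restriction to strictly positive values are assumptions made for this illustration.

```python
# Hypothetical sketch of the Highest/Lowest selection. The formulas come from
# the Field Map above: Highest -> (data/big), Lowest -> (small/data), where
# small and big are the smallest and largest values in the field.
# Assumption: all data values are strictly positive.

FIELD_MAP = {
    "Highest": lambda data, small, big: data / big,
    "Lowest": lambda data, small, big: small / data,
}


def selection_scores(choice: str, column: list[float]) -> list[float]:
    small, big = min(column), max(column)
    if small <= 0:
        raise ValueError("this sketch assumes strictly positive data values")
    formula = FIELD_MAP[choice]
    return [formula(data, small, big) for data in column]


if __name__ == "__main__":
    prices = [20.0, 100.0, 1000.0]
    print(selection_scores("Lowest", prices))   # [1.0, 0.2, 0.02]
    print(selection_scores("Highest", prices))  # [0.02, 0.1, 1.0]
```

The Lowest run reproduces the worked example: the value 1000 scores 20/1000 = 0.02, while under Highest the same value scores 1.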
Tuning tasks include: