Intelligent Search Technology's
NameSearch® product is a tool that
enables you to find records using names (personal and corporate),
addresses and/or other identifying information. This sophisticated software
increases the quality of searches while minimizing I/O expense.
The first aspect of NameSearch® is intelligent key and range building.
This facility is used for the retrieval of records regardless of variation
caused by phonetics, transcription or keyboarding errors, nicknames,
short forms, missing words, extra words, noise and sequence variations.
Names and addresses suffer from a skewed distribution. A few words represent
the majority of names, while large volumes of uncommon names exist but
occur infrequently. This is most dramatically illustrated by the analysis
of people's names in the United States. While there are 2.5 million last
names and over 3.2 million first names, three hundred surnames represent
thirty-five percent of the population, while over sixty-five percent
of the population has one of four hundred first names. The skew and distribution
of company names and street addresses are just as extreme. Inquiries
usually follow a distribution pattern similar to that of the name population
in the database. Complicating the problems of skew and distribution are
the variations due to name frequency characteristics in different geographical
locations and the type of information stored in the database.
Traditional solutions for name variations dealt only with phonetic
errors. These solutions involved the standardization of easily confused
sounds. For example, PH's would be treated as F's. Elaborate linguistic
rules were generated to phonetically tokenize a name. These phonetically
tokenized words served as the basis for name retrieval. In some instances
these rules helped find names which were hard to spell. Unfortunately,
the distribution pattern of common names became even more skewed. For
example, inquiries on John also returned Joan, Jim, Jane, Jimmy, Jenn
and other names which fell in the "JAN" phonetic pattern. Because
these methods aggravated the skew in the distribution of names, both
quality and performance were sacrificed.
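
The collapse described above can be reproduced with a small sketch. The
text does not name a specific phonetic scheme, so classic Soundex is used
here purely as a stand-in for the traditional approach. Under Soundex, M and
N share a code and vowels are dropped, so every name in the John example
lands in the same bucket.

```python
# Classic Soundex, used only as a stand-in for "traditional" phonetic keys.
# The source text does not say which scheme produced the "JAN" pattern.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character Soundex code for a name."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate repeated codes
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code  # vowels reset prev, so repeats across vowels count
    return (first + "".join(digits) + "000")[:4]


names = ["John", "Joan", "Jim", "Jane", "Jimmy", "Jenn"]
buckets = {}
for n in names:
    buckets.setdefault(soundex(n), []).append(n)
print(buckets)  # all six names share a single phonetic key
```

An inquiry on any one of these names would therefore retrieve all of the
others, which is the loss of quality and performance described above.
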
Discrepancies caused by phonetic errors account for twenty to
twenty-five percent of all name variations. Intelligent Search Technology addresses
problems due to phonetics by employing analysis routines to determine
when phonetic tokenization should be applied. This enables NameSearch® to
overcome problems due to phonetics without the negative consequences
incurred with all other methods of name search.
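
How the analysis routines decide when to apply phonetic tokenization is not
described here. One plausible reading, shown below strictly as an
assumption, is that common names are keyed exactly while uncommon names are
keyed phonetically. The COMMON_NAMES set, the phonetic_token function and
the key prefixes are all hypothetical and are not part of the NameSearch®
product.

```python
# Hypothetical set of high-frequency names. A real system would derive this
# from frequency tables built for its own database.
COMMON_NAMES = {"JOHN", "JOAN", "JIM", "JANE"}


def phonetic_token(name):
    """Crude phonetic normalization, for illustration only."""
    token = name.upper().replace("PH", "F")
    if not token:
        return ""
    # Keep the first letter, drop later vowels so spelling variants collapse.
    return token[0] + "".join(c for c in token[1:] if c not in "AEIOUY")


def build_key(name):
    """Exact key for common names, phonetic key for everything else."""
    upper = name.strip().upper()
    if upper in COMMON_NAMES:
        return "X:" + upper            # exact: do not widen a common name
    return "P:" + phonetic_token(upper)  # phonetic: tolerate misspellings


print(build_key("John"), build_key("Joan"))
print(build_key("Stephanopoulos"), build_key("Stefanopulos"))
```

In this sketch John and Joan keep separate keys, while the two spellings of
the rarer surname still meet on one phonetic key.
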
Many name variations are caused by the use of nicknames. Names like
Bill, William, Bob and Robert are used interchangeably to identify individuals.
NameSearch® uses rule-based expertise to solve this class of problems.
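
A minimal sketch of a nickname rule is a lookup table that maps each short
form to a formal name before keys are built. The table and the function
name below are illustrative assumptions, not the actual NameSearch® rule
base format.

```python
# Illustrative nickname table using only the names mentioned in the text.
NICKNAMES = {
    "BILL": "WILLIAM",
    "BOB": "ROBERT",
}


def canonical_first_name(name):
    """Map a nickname to its formal name; leave other names unchanged."""
    upper = name.strip().upper()
    return NICKNAMES.get(upper, upper)


for candidate in ("Bill", "William", "Bob", "Robert"):
    print(candidate, "->", canonical_first_name(candidate))
```
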
The NameSearch® rule base is also used to identify noise words.
Noise words are elements in a name which do not help in the identification
of a candidate. Examples of noise words are Incorporated, Corporation,
Limited, Junior, Senior, Avenue and Street. Often, elements in a name
contribute to the identity but should be treated as less important. In
these cases, the rule base does not treat them as noise words but
recognizes that they are less significant. Some examples are Associate,
Board, International and Services.
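
The difference between noise words and less significant words can be
sketched as dropping the former and down-weighting the latter. The word
lists come from the examples above; the function name, the 0.25 weight and
the sample company name are arbitrary assumptions made for illustration.

```python
NOISE_WORDS = {"INCORPORATED", "CORPORATION", "LIMITED",
               "JUNIOR", "SENIOR", "AVENUE", "STREET"}
LOW_WEIGHT_WORDS = {"ASSOCIATE", "BOARD", "INTERNATIONAL", "SERVICES"}


def weigh_words(name, low_weight=0.25):
    """Return (word, weight) pairs, skipping noise words entirely."""
    weighted = []
    for raw in name.upper().split():
        word = raw.strip(".,")
        if not word or word in NOISE_WORDS:
            continue  # noise words do not help identify a candidate
        weight = low_weight if word in LOW_WEIGHT_WORDS else 1.0
        weighted.append((word, weight))
    return weighted


print(weigh_words("Acme International Services, Incorporated"))
```
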
The rule base also contains rules for handling common prefixes. Names
like McDaniel are confused with MacDaniel. Prefix recognition provides
the facility for handling these classes of problems.
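
As a rough illustration of prefix recognition, the sketch below rewrites a
leading "Mc" to "Mac" so that both spellings produce the same key. The
function name and the single rule are assumptions; an actual rule base
would cover many more prefixes.

```python
import re


def normalize_prefix(surname):
    """Rewrite a leading MC to MAC when more letters follow."""
    upper = surname.strip().upper()
    return re.sub(r"^MC(?=[A-Z])", "MAC", upper)


print(normalize_prefix("McDaniel"), normalize_prefix("MacDaniel"))
```
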
Another feature of the rule base is diminutive recognition. Frequently
there are names which end in a diminutive such as "ie" or "y".
In these cases, it is useful to identify the root and apply the rule.
For example, you would want Bill, Billie and Billy to find William or
Willie.
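
Diminutive recognition can be sketched as stripping a trailing "ie" or "y"
and then applying the nickname rule to the root. The table and function
names are hypothetical. In this sketch the stripped root is accepted only
when a rule exists for it, so a name such as Mary is left alone.

```python
# Toy rule table; a real rule base would be far larger.
NICKNAMES = {"BILL": "WILLIAM", "WILL": "WILLIAM"}
DIMINUTIVE_SUFFIXES = ("IE", "Y")


def strip_diminutive(upper):
    """Return the root of an upper-case name with a diminutive ending."""
    for suffix in DIMINUTIVE_SUFFIXES:
        if upper.endswith(suffix) and len(upper) - len(suffix) >= 3:
            return upper[: -len(suffix)]
    return upper


def resolve(name):
    """Resolve a name through the diminutive and nickname rules."""
    upper = name.strip().upper()
    if upper in NICKNAMES:
        return NICKNAMES[upper]
    root = strip_diminutive(upper)
    # Only trust the stripped root when a rule exists for it.
    return NICKNAMES.get(root, upper)


for candidate in ("Bill", "Billie", "Billy", "William", "Willie"):
    print(candidate, "->", resolve(candidate))
```
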
NameSearch® comes with an extensive predefined set of rules. These
rules can be used right out of the box or modified to meet your specific
needs. This is done through the NameSearch® Generation Shell.
The Generation Shell is a graphical user interface designed for the
modification and tuning of your NameSearch® subroutines. The Shell
allows you to adjust frequency and rule base tables, set various parameters,
modify key building routines and test changes.
The NameSearch® software, in addition to key building, comes with
advanced comparison functions. These functions use the strength of the
key building routines to intelligently calculate numeric values indicating
the likelihood of a match.
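
The scoring method used by the comparison functions is not described here.
The sketch below only illustrates the general idea of a numeric match
likelihood, using Python's standard difflib module as a stand-in. The
function name, the 0 to 100 scale and the word-by-word strategy are all
assumptions.

```python
from difflib import SequenceMatcher


def match_score(a, b):
    """Return a 0-100 score; higher means a more likely match."""
    words_a = a.upper().split()
    words_b = b.upper().split()
    if not words_a or not words_b:
        return 0
    # Iterate over the longer name so extra or missing words lower the score.
    if len(words_a) < len(words_b):
        words_a, words_b = words_b, words_a
    total = 0.0
    for wa in words_a:
        # Best match per word, so word order does not matter.
        total += max(SequenceMatcher(None, wa, wb).ratio() for wb in words_b)
    return round(100 * total / len(words_a))


print(match_score("John Smith", "Smith John"))
print(match_score("John Smith", "Jon Smith"))
print(match_score("John Smith", "Mary Jones"))
```

A caller could discard candidates that score below a chosen threshold,
which is one way to eliminate weak candidates automatically.
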
These comparison routines can be used to eliminate candidates
in an on-line system, making it possible to tailor the information being
displayed. This is especially useful for systems containing more than
ten million records. In addition, the comparison routines form the basis
of batch utilities, such as merge/purge applications. These comparison
routines enable systems to make decisions without human intervention.
NameSearch® integrates various strands of knowledge to form a cohesive
fabric enabling successful retrieval of records based on names and/or
addresses. By incorporating rules on common prefixes, suffixes, nicknames,
noise words and other similar classes of variations, combined with Intelligent
Search Technology's phonetics mechanism and its user-friendly Generation
Shell, NameSearch® makes the complexities of name searching easy to handle.