How NameSearch® Works - Sanitization
The sanitization module removes noise characters, extra spaces, and
control characters, and converts lowercase letters to uppercase.
Examples of noise characters are: @, #, $, %, ^, &, *, (, ), }, {, [, ].
Commas, hyphens, and quotes are handled separately because they have
special meanings. A comma usually indicates that a last name has been
placed ahead of the rest of the name, so sanitization moves each word
followed by a comma to the end of the string. Quotes are deleted, and
the letters on either side are joined without a space. A space
replaces each hyphen.
Examples of Sanitization:
| Before Sanitization | After Sanitization |
| Scott Lions | SCOTT LIONS |
| Smith, John F. | JOHN F SMITH |
| Rose Stone-Shield | ROSE STONE SHIELD |
| James O'Tool | JAMES OTOOL |
| James O. Tool | JAMES OTOOL |
| Owen, Tool, James | JAMES OWEN TOOL |
| # Williams, $Richard | RICHARD WILLIAMS |
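
The steps described above can be pictured with a short Python sketch. This is only an illustration under stated assumptions, not NameSearch®'s actual implementation: the function name `sanitize` is hypothetical, the period and backslash are treated as noise characters because the examples suggest it, and every noise character is replaced by a blank rather than deleted. The sketch does not try to reproduce the "James O. Tool" row, whose handling is not spelled out in the text.

```python
import re

# Noise characters listed in the text, plus "." and "\" (assumptions drawn
# from the examples, not from a documented list).
_NOISE = re.compile(r"[@#$%^&*(){}\[\].\\]")
_CONTROL = re.compile(r"[\x00-\x1f\x7f]")


def sanitize(name: str) -> str:
    """Hypothetical sketch of the sanitization steps described in the text."""
    text = _CONTROL.sub(" ", name)          # control characters become blanks
    text = text.replace("-", " ")           # a space replaces each hyphen
    text = re.sub(r"['\"]", "", text)       # quotes are deleted
    text = _NOISE.sub(" ", text)            # noise characters become blanks
    text = re.sub(r"\s*,\s*", ", ", text)   # attach each comma to the word before it
    kept, moved = [], []
    for word in text.upper().split():       # split() also collapses extra spaces
        if word.endswith(","):
            word = word.rstrip(",")
            if word:
                moved.append(word)          # a word followed by a comma goes to the end
        else:
            kept.append(word)
    return " ".join(kept + moved)
```

Under these assumptions, `sanitize("Smith, John F.")` returns `JOHN F SMITH` and `sanitize("Owen, Tool, James")` returns `JAMES OWEN TOOL`, matching the rows above.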
The sanitization module also contains a small rulebase. The rulebase
is applied after all alphabetic characters have been converted to
uppercase and extra blanks have been removed. It is used to recognize
words that contain noise characters or prefixes that could be affected
by the sanitization process. The sanitization rulebase also lets you
convert non-alphanumeric characters to other symbols or words.

The First Word rule type was designed for commercial name searches in
which a word in the first position of a name would be considered
noise. Sometimes a word in the middle of a commercial or corporate
name helps identify a record, while the same word in the first
position obscures the search. Classifying noise words by position can
affect NameSearch®'s ability to overcome sequence variations, so apply
this rule type judiciously and with great thought.

The sanitization rulebase can be easily modified using the NameSearch®
graphical user interface, the "Generation Shell."
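
The First Word rule type can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function name `drop_first_word_noise`, the rule set `FIRST_WORD_NOISE`, and the sample word `THE` do not come from the NameSearch® rulebase, whose actual format is not described here.

```python
# Hypothetical First Word rules: words treated as noise only when they
# appear in the first position of an already uppercased name.
FIRST_WORD_NOISE = {"THE"}


def drop_first_word_noise(name: str) -> str:
    """Remove a leading noise word; leave the same word alone elsewhere."""
    words = name.split()
    # Keep a name that consists solely of the noise word.
    if len(words) > 1 and words[0] in FIRST_WORD_NOISE:
        words = words[1:]
    return " ".join(words)
```

With this sketch, `THE ACME GROUP` becomes `ACME GROUP`, while `ACME THE GROUP` is left unchanged. Two orderings of the same words therefore end up with different word sets, which is one way to see why position-based noise rules can interfere with matching across sequence variations.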
| Before Sanitization | After Sanitization | Sanitization (without rulebase expertise) |
| c\o | CARE OF | C O |
| Mc Donald, Old | OLD MCDONALD | MC OLD DONALD |
| % | CARE OF | |
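
The rows in this table can be approximated with a small rulebase pass that runs on the uppercased, blank-collapsed name before noise characters are stripped. The sketch below is hypothetical: the rule tables `WORD_RULES` and `PREFIX_RULES`, the function name `apply_rulebase`, and the idea of joining a prefix to the following word are assumptions inferred from the examples, not the documented rulebase format.

```python
# Hypothetical rule tables; the real rulebase format is not described in the text.
WORD_RULES = {"C\\O": "CARE OF", "%": "CARE OF"}   # whole word -> replacement
PREFIX_RULES = {"MC"}                              # prefixes joined to the next word


def apply_rulebase(text: str) -> str:
    """Rewrite an uppercased, blank-collapsed name before noise removal."""
    words = text.split()
    out = []
    i = 0
    while i < len(words):
        word = words[i]
        if word in WORD_RULES:
            out.append(WORD_RULES[word])       # e.g. "%" -> "CARE OF"
        elif word in PREFIX_RULES and i + 1 < len(words):
            out.append(word + words[i + 1])    # "MC" + "DONALD," -> "MCDONALD,"
            i += 1                             # the next word has been consumed
        else:
            out.append(word)
        i += 1
    return " ".join(out)
```

Here `apply_rulebase("MC DONALD, OLD")` yields `MCDONALD, OLD`, so the comma handling that follows would move the whole surname to the end and give `OLD MCDONALD`. Without the rule, only `DONALD` is followed by a comma, which explains the `MC OLD DONALD` result in the third column.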