Intelligent Search Technology, Ltd. specializes in search and matching software. Name Search, our flagship product, provides intelligence to both online and batch search and matching applications. Name Search not only enables systems to find and match information based on personal and corporate names but also comes with powerful address searching and e-mail searching services. Correct Address, our address verification, validation and correction software, harnesses the intelligence of Name Search. Name Search also powers ISTwatch, terrorist-checking software that enables compliance with the USA PATRIOT Act. Merlin Merge, supplied with Name Search, is used for duplicate record identification and merge/purge operations.
The Intelligent Choice
NameSearch®

NameSearch® Software Intelligence

NameSearch® uses searching and matching intelligence to achieve unparalleled accuracy and speed while overcoming variations due to misspellings, transcriptions, transpositions, acronyms, phonetics, sequence differences, nicknames and many other common errors found in data.

Spelling or keyboard errors
Rulebase expertise
Phonetic errors
Missing, extra, noise words
Word sequence variations
Acronym recognition



Spelling Errors


Spelling and keyboard errors account for many of the variations in a database.
Through intelligent key building and advanced comparison routines, NameSearch® overcomes spelling errors such as multiple typos, letter transpositions and incomplete words.

Variations due to spelling only, matched and scored by NameSearch®:

Input: Richard Wagner

Variation | Score
Rihcard Wagne | 097
Ricard Waner | 097
Richart Wagnar | 088
Rickard Wackner | 085
Ritchart Wagma | 080
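One common way to score spelling and keyboard errors like these is an edit distance that counts a swap of two adjacent letters as a single error. The Python sketch below uses the optimal string alignment variant of Damerau-Levenshtein distance and turns it into a 0-100 score. This is an illustrative assumption, not the NameSearch® key-building or scoring method, and `osa_distance` and `similarity` are hypothetical names.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance: insertions, deletions,
    substitutions and adjacent transpositions each cost 1."""
    rows, cols = len(a) + 1, len(b) + 1
    # d[i][j] holds the distance between a[:i] and b[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i
    for j in range(cols):
        d[0][j] = j
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if (i > 1 and j > 1 and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[-1][-1]


def similarity(a: str, b: str) -> int:
    """Hypothetical 0-100 score: 100 minus the edit distance expressed
    as a percentage of the longer string's length."""
    a, b = a.upper(), b.upper()
    longest = max(len(a), len(b))
    if longest == 0:
        return 100
    return round(100 * (1 - osa_distance(a, b) / longest))
```

Under this sketch "Rihcard" is one edit away from "Richard" rather than two, which is why transposition-aware distances suit keyboard errors. Its scores will not reproduce the NameSearch® figures in the table; it only shows the ranking idea.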


Missing, extra, noise words

The rulebase is used to identify noise words. Noise words are elements in a name that do not help in the identification of a candidate. Examples of noise words are: Incorporated, Corporation, Limited, Junior, Senior, Avenue and Street.
While processing the data, NameSearch® goes through a process called sanitization that removes noise characters, extra spaces and control characters, and converts lowercase letters to uppercase. Examples of noise characters are @, #, $, %, ^, &, *, (, ), {, }, [ and ]. Commas, hyphens and quotes are handled separately because they have special meanings. A comma usually indicates that a last name has been written first, so sanitization moves each word followed by a comma to the end of the string. Quotes are deleted and the parts of the word on either side are joined, so O'Tool becomes OTOOL. A space replaces each hyphen.

Before Sanitization | After Sanitization
Scott Lions | SCOTT LIONS
Smith, John F. | JOHN F SMITH
Rose Stone-Shield | ROSE STONE SHIELD
James O'Tool | JAMES OTOOL
James O. Tool | JAMES OTOOL
Owen, Tool, James | JAMES OWEN TOOL
# Williams , $Richard | RICHARD WILLIAMS

The sanitization process also uses a small rulebase. The rulebase is applied after all alphabetic characters have been converted to uppercase and extra blanks have been removed. It recognizes words that contain noise characters (such as c\o) and prefixes (such as Mc) that would otherwise be affected by the sanitization process.

Before Sanitization | After Sanitization (with rulebase) | After Sanitization (without rulebase)
c\o | CARE OF | C O
Mc Donald, Old | OLD MCDONALD | MC OLD DONALD
% | CARE OF | (empty)
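The sanitization rules above, including the small rulebase pass, can be sketched in Python. This is a minimal illustration, not NameSearch® code: the rulebase holds only the entries from the table, the names `apply_rulebase`, `sanitize`, `TOKEN_RULES` and `PREFIXES` are hypothetical, and we assume that every character other than a letter, digit, blank, comma, quote or hyphen is a noise character that becomes a blank (which is what turns c\o into C O when the rulebase is off).

```python
import re

# Assumption: once quotes and hyphens are handled, every character other than
# A-Z, 0-9, whitespace and the comma is a noise character.
NOISE = re.compile(r"[^A-Z0-9\s,]")

# Hypothetical miniature of the sanitization rulebase (entries from the table).
TOKEN_RULES = {"C\\O": "CARE OF", "%": "CARE OF"}
PREFIXES = {"MC"}  # a prefix written apart from the rest of the name


def apply_rulebase(text: str) -> str:
    """Uppercase, drop extra blanks, then rewrite words the rulebase knows."""
    tokens = text.upper().split()
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in TOKEN_RULES:
            out.append(TOKEN_RULES[tok])
        elif tok in PREFIXES and i + 1 < len(tokens):
            out.append(tok + tokens[i + 1])  # MC DONALD, -> MCDONALD,
            i += 1                           # the next token was consumed
        else:
            out.append(tok)
        i += 1
    return " ".join(out)


def sanitize(raw: str, use_rulebase: bool = True) -> str:
    s = apply_rulebase(raw) if use_rulebase else raw.upper()
    s = s.replace("'", "").replace('"', "")  # quotes deleted, parts joined
    s = s.replace("-", " ")                  # a space replaces each hyphen
    s = NOISE.sub(" ", s)                    # noise/control characters -> blank
    s = re.sub(r"\s*,\s*", ", ", s)          # attach each comma to its word
    kept, moved = [], []
    for token in s.split():                  # split() also drops extra blanks
        if token.endswith(","):
            word = token.rstrip(",")
            if word:
                moved.append(word)           # a word followed by a comma goes last
        else:
            kept.append(token)
    return " ".join(kept + moved)
```

The sketch reproduces every row of both tables except James O. Tool, for which it returns JAMES O TOOL; the page does not say which rule produces JAMES OTOOL there, so the sketch does not attempt it.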



Rulebase expertise

The rulebase expert system is used to identify nicknames. Names such as Bill and William, or Bob and Robert, are used interchangeably to refer to the same individual.

The rulebase also identifies the noise words described under Missing, extra, noise words. Other words contribute to an entity's identity but should be treated as less important. The rulebase does not treat these as noise words but recognizes that they are less significant. Some examples are associate, board, international and services.

Other variations are caused by common prefixes: names like McDonnell are confused with MacDonnell. Prefix recognition handles this class of problems.

The rulebase can also recognize diminutives. Many names end in a diminutive such as "ie" or "y". In these cases it is useful to identify the root and apply the rule. For example, Bill, Billie and Billy should all find William or Willie.

Input | After rulebase processing
BILL YARA | WILLIAM YARA
BOBBY KENNEDY | ROBERT KENNEDY
JIM P PHILLIPS SR | JAMES P PHILLIPS
SMITH AND ASSOCIATES | SMITH
MCDONELL CORPORATION | MCDONELL
MR MATT J THOMAS | MATTHEW J THOMAS
MARINA DELSOLE | MARINA DEL SOLE
DR LEONARD MACCOY MD | LEONARD MCCOY
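A few of these rulebase behaviors (nicknames, diminutives, noise words and the Mac/Mc prefix) can be sketched with small lookup tables. The tables, the doubled-consonant rule for diminutives such as Bobby, and the function names `nickname_root` and `normalize` are hypothetical illustrations, not the NameSearch® rulebase.

```python
# Hypothetical miniature rulebase; a real rulebase would be far larger.
NICKNAMES = {"BILL": "WILLIAM", "WILL": "WILLIAM", "BOB": "ROBERT",
             "ROB": "ROBERT", "JIM": "JAMES", "MATT": "MATTHEW"}
NOISE_WORDS = {"MR", "DR", "MD", "SR", "JR", "CORPORATION", "CORP", "INC"}
PREFIXES = {"MAC": "MC"}  # MacDonnell and McDonnell share one form


def nickname_root(word: str) -> str:
    """Map a nickname, or a diminutive ending in IE or Y, to the formal name."""
    if word in NICKNAMES:
        return NICKNAMES[word]
    for ending in ("IE", "Y"):
        if word.endswith(ending):
            root = word[: -len(ending)]
            if root in NICKNAMES:
                return NICKNAMES[root]          # BILLY -> BILL -> WILLIAM
            if len(root) > 1 and root[-1] == root[-2] and root[:-1] in NICKNAMES:
                return NICKNAMES[root[:-1]]     # BOBBY -> BOBB -> BOB -> ROBERT
    return word


def normalize(name: str) -> str:
    words = []
    for word in name.upper().split():
        if word in NOISE_WORDS:
            continue                            # drop noise words
        for prefix, standard in PREFIXES.items():
            # Naive: would also rewrite names such as MACHADO.
            if word.startswith(prefix) and len(word) > len(prefix):
                word = standard + word[len(prefix):]  # MACCOY -> MCCOY
        words.append(nickname_root(word))
    return " ".join(words)
```

The sketch reproduces the BILL YARA, BOBBY KENNEDY, JIM P PHILLIPS SR, MR MATT J THOMAS, MCDONELL CORPORATION and DR LEONARD MACCOY MD rows. It does not handle less significant words such as ASSOCIATES or splitting DELSOLE into DEL SOLE.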

For example, the personal name rulebase helps match Robert, Rob and Bobby. The street service gives less significance to the words Road, Avenue and Blvd in the address. The company names service ignores the words Corporation, Inc. and Corp:

Name | Address | Company
Robert Wagner | 24 Milltown Road | Smith Corporation
Rob Wagner | 24 Milltown Avenue | Smith Inc.
Bobby Wagner | 24 Milltown Blvd | Smith Corp Inc.
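The difference between giving a word less significance (Road, Avenue, Blvd) and ignoring it entirely (Corporation, Inc., Corp) can be illustrated with a weighted word overlap. In this Python sketch the weights and the function names are hypothetical choices; the page does not say how NameSearch® weights words.

```python
# Hypothetical weights: street designators count little, company designators not at all.
LOW_SIGNIFICANCE = {"ROAD", "AVENUE", "BLVD"}
IGNORED = {"CORPORATION", "CORP", "INC"}


def weight(word: str) -> float:
    if word in IGNORED:
        return 0.0
    if word in LOW_SIGNIFICANCE:
        return 0.2
    return 1.0


def words(text: str) -> set:
    """Uppercased words with trailing periods and commas stripped."""
    return {w.strip(".,") for w in text.upper().split()} - {""}


def weighted_overlap(a: str, b: str) -> float:
    """Weighted Jaccard overlap of the two word sets, from 0.0 to 1.0."""
    wa, wb = words(a), words(b)
    total = sum(weight(w) for w in wa | wb)
    if total == 0:
        return 0.0  # nothing significant to compare
    return sum(weight(w) for w in wa & wb) / total
```

With these weights, 24 Milltown Road and 24 Milltown Avenue score about 0.83 instead of the 0.5 an unweighted overlap would give, and the three Smith company forms score 1.0 against each other. Matching Robert, Rob and Bobby would additionally need a nickname step.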


Phonetic Errors

Discrepancies caused by phonetic errors account for 20-25% of all name variations.

Traditional solutions for name variations, such as Soundex and NYSIIS, deal only with phonetic errors. These solutions standardize easily confused sounds; for example, PH is treated as F. Linguistic rules phonetically tokenize a name, and the tokenized words serve as the basis for name retrieval. In some instances these rules helped find names that were hard to spell; unfortunately, they also made the distribution of common names even more skewed. For example, inquiries on John also returned Joan, Jim, Jane, Jimmy, Jenn and other names that fell in the "JAN" phonetic pattern. By aggravating the skew in the distribution of names, these methods sacrificed both quality and performance.

NameSearch® addresses problems due to phonetics by employing analysis routines to determine the extent of phonetic tokenization.
This enables NameSearch® to overcome problems due to phonetics without the negative consequences incurred with all other methods of name search.
For example, the following variations are caught with the help of phonetics:

Name
Phillip Mac Affik
Filip Mkafic
Philip Mackaphik

Examples of phonetic tokenization (taken directly from Robert L. Taft, "Name Search Techniques", New York State Identification and Intelligence System):

1) Translate first characters of name
  MAC => MCC
  PH => FF
  KN => NN
  K => C
  SCH => SSS
2) Translate last characters of name
  EE => Y
  IE => Y
  DT,RT,RD,NT,ND => D
3) First character of key = first character of name
4) Translate remaining characters by following rules, incrementing by one character each time
  EV => AF else A,E,I,O,U => A
  Q => G
  Z => S
  M => N
  KN => N else K => C
  SCH => SSS
  PH => FF
  H => If previous or next is non-vowel, previous
  W => If previous is vowel, previous
  Append the translated character to the key unless it repeats the key's last character
5) Translate last characters of key
  If last character is S, remove it
  If last characters are AY, replace with Y
  If last character is A, remove it
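The NYSIIS rules listed above translate into a short Python function. This is a sketch of our reading of those rules, not NameSearch® code: it discards non-letters, applies no length limit to the key, and treats "previous" as the previous character after translation.

```python
VOWELS = set("AEIOU")


def nysiis(name: str) -> str:
    """NYSIIS key following the rules listed above (no length limit)."""
    name = "".join(c for c in name.upper() if c.isalpha())
    if not name:
        return ""
    # 1) Translate first characters of name
    if name.startswith("MAC"):
        name = "MCC" + name[3:]
    elif name.startswith("PH"):
        name = "FF" + name[2:]
    elif name.startswith("KN"):
        name = "NN" + name[2:]
    elif name.startswith("K"):
        name = "C" + name[1:]
    elif name.startswith("SCH"):
        name = "SSS" + name[3:]
    # 2) Translate last characters of name
    if name.endswith(("EE", "IE")):
        name = name[:-2] + "Y"
    elif name.endswith(("DT", "RT", "RD", "NT", "ND")):
        name = name[:-2] + "D"
    # 3) First character of key = first character of name
    key = name[0]
    chars = list(name)
    # 4) Translate the remaining characters one position at a time
    i = 1
    while i < len(chars):
        c, prev = chars[i], chars[i - 1]
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        if c == "E" and nxt == "V":
            chars[i:i + 2] = ["A", "F"]
        elif c in VOWELS:
            chars[i] = "A"
        elif c in "QZM":
            chars[i] = {"Q": "G", "Z": "S", "M": "N"}[c]
        elif c == "K":
            chars[i] = "N" if nxt == "N" else "C"
        elif chars[i:i + 3] == ["S", "C", "H"]:
            chars[i:i + 3] = ["S", "S", "S"]
        elif chars[i:i + 2] == ["P", "H"]:
            chars[i:i + 2] = ["F", "F"]
        elif c == "H" and (prev not in VOWELS or nxt not in VOWELS):
            chars[i] = prev
        elif c == "W" and prev in VOWELS:
            chars[i] = prev
        if chars[i] != key[-1]:  # skip repeats of the key's last character
            key += chars[i]
        i += 1
    # 5) Translate last characters of key
    if len(key) > 1 and key.endswith("S"):
        key = key[:-1]
    if key.endswith("AY"):
        key = key[:-2] + "Y"
    if len(key) > 1 and key.endswith("A"):
        key = key[:-1]
    return key
```

With this sketch John, Jim and Jane all produce JAN, which illustrates the skew described above, while Phillip, Filip and Philip collapse to FALAP and Mac Affik, Mkafic and Mackaphik collapse to MCAFAC.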


Word Sequence Variations

Many search problems are caused by sequence variations. The order of words for a particular entity is often uncertain at both data entry and inquiry time: the name Frank Lee, for example, could have been entered as Lee Frank. This problem is particularly pervasive in company names. In names such as International Business Machines, Andersen Consulting and Kemper Insurance Company, the left-most word is the most significant. Conversely, in Edward S. Gordan Real Estate Company and Paul Mitchell hair products, the left-most word is less significant. Because the significant word cannot be predicted from its position, many searches fail.


For example, the different permutations of the words in the input are matched:

Name | Address | Company
Ricky Scott Wagner | 24 West Jones Avenue | Jones and Smith Corporation
Scott Rick Wagner | 24 Jones Avenue West | Smith and Jones Corporation
Wagner Rick Scortt | 24 Avenue Jones West | Corporation Jones Smith

Merging foreign database files causes other sequence variations. This frequently occurs when external lists are purchased or companies consolidate information. Inconsistent methodologies for data capture make the standardization of name fields impossible. Aggravating the sequence problem are those instances in which company names are intermixed with personal names. All of these factors, in addition to human error, contribute to identification problems caused by sequence variations. NameSearch® provides a facility for handling these problems.

To understand this better, consider an analogy between a telephone book and a database system. When we look for Frank Lee, we search the "L" section. If the name is not there, we continue the search by looking in the "F" section. To find Frank Lee we had to search two separate sections of the phone book. Suppose we were looking for Frank Lee Ray: to ensure success we would have to search under every permutation. This is an extremely arduous and time-consuming process for both people and computers. By listing Frank Lee in both the L and F sections, regardless of order, only one section would need to be searched.
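The phone-book idea corresponds to an inverted index that files each record under every word of its name. The Python sketch below (class and method names hypothetical) finds a name by looking in a single section, whatever the word order. It only handles exact words, whereas the examples above also involve nicknames and misspellings.

```python
from collections import defaultdict


class PermutedNameIndex:
    """Files each record under every word of its name, like listing
    Frank Lee in both the F and the L section of a phone book."""

    def __init__(self) -> None:
        self.sections = defaultdict(set)  # word -> ids of records containing it
        self.words = {}                   # record id -> sorted words of its name

    def add(self, record_id: int, name: str) -> None:
        words = name.upper().split()
        self.words[record_id] = sorted(words)
        for word in words:
            self.sections[word].add(record_id)

    def lookup(self, query: str) -> set:
        """Return ids whose words equal the query's words in any order."""
        words = query.upper().split()
        if not words:
            return set()
        # One section suffices: use the query word with the fewest entries.
        section = min(words, key=lambda w: len(self.sections.get(w, ())))
        target = sorted(words)
        return {rid for rid in self.sections.get(section, ())
                if self.words[rid] == target}
```

For example, after adding "Frank Lee" and "Lee Frank Ray", a lookup for "Lee Frank" returns only the first record and "Ray Frank Lee" returns only the second, each from one section.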


Acronym Recognition

Corporate name searching concretely illustrates the practical difficulty of finding correct information without missing likely candidates. People readily see the similarity between "Triple A towing" and "AAA towing", yet a computerized system needs a knowledge-based algorithm to recognize the relationship between Triple A and AAA.

For example, IST is recognized as an acronym for Intelligent Search Technology:

Company
Intelligent Search Technology
IST
IS Technology



Deploying intelligence through knowledge-based systems greatly benefits search and matching algorithms by identifying nicknames, shortened forms, noise words and other cases that require experience, which returns a more comprehensive result set. However, knowledge-based systems are limited by the breadth and depth of their lexicon. Unlike well-known acronyms such as IBM and AT&T, the vast majority of acronyms lie outside the scope of knowledge-based processing. For example, our clients often use the acronym IST interchangeably with Intelligent Search Technology, yet it would be unreasonable to expect a knowledge-based system to include IST.
The NameSearch® software with its corporate search algorithms and acronym recognition functionality significantly advances the ability to seek and match corporate name data.
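An acronym such as IST can be recognized structurally, without a lexicon entry, by checking whether each word of the short form is either spelled out in the full name or is a run of initials of consecutive words. The Python sketch below is our illustration of that idea; the function names and the matching rule are assumptions, not the NameSearch® algorithm.

```python
def tokens(text: str) -> list:
    # Periods are dropped so that "I.S.T." reads as "IST".
    return text.upper().replace(".", "").split()


def is_acronym_of(short: str, full: str) -> bool:
    """True if each word of `short` is either a word of `full` or the
    initials of consecutive words of `full`, consuming `full` in order."""
    s, f = tokens(short), tokens(full)

    def match(si: int, fi: int) -> bool:
        if si == len(s):
            return fi == len(f)            # every word of `full` must be used
        tok = s[si]
        if fi < len(f) and tok == f[fi] and match(si + 1, fi + 1):
            return True                    # word spelled out in both forms
        n = len(tok)
        if fi + n <= len(f) and all(f[fi + k][0] == tok[k] for k in range(n)):
            return match(si + 1, fi + n)   # word is a run of initials
        return False

    return match(0, 0)
```

The sketch accepts IST, I.S.T. and IS Technology for Intelligent Search Technology but rejects AAA Towing for Triple A Towing, which is the kind of relationship that still needs a knowledge base.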






To find out more, call (800) 287-0412
Copyright © 1993-2006 Intelligent Search Technology Ltd.