Intelligent Search Technology, Ltd. specializes in search and matching software. Name Search, our flagship product, provides intelligence to both online and batch search and matching applications. Name Search not only enables systems to find and match information based on personal and corporate names but also comes with powerful address searching and e-mail searching services. Correct Address is address verification, validation and correction software that harnesses the intelligence of Name Search. Name Search also powers ISTwatch, terrorist-checking software that enables compliance with the USA PATRIOT Act. Merlin Merge, supplied with Name Search, is used for duplicate record identification and merge/purge operations. The Intelligent Choice.
MerlinMerge® SpeedPro Software Intelligence

MerlinMerge® SpeedPro uses the searching and matching intelligence of the extremely powerful NameSearch® technology to achieve unparalleled accuracy and speed while overcoming variations due to misspellings, transcriptions, transpositions, acronyms, phonetics, sequence differences, nicknames and many other common errors found in data.

Spelling or keyboard errors
Rulebase expertise
Phonetic errors
Missing, extra, noise words
Word sequence variations
Acronym recognition


Spelling Errors


Spelling and keyboard errors account for many of the duplicates to be found in a database.
Through the use of intelligent key building and advanced comparison routines, MerlinMerge® SpeedPro overcomes spelling errors such as multiple typos, letter transpositions and incomplete words.
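
As an illustration of what a typo-tolerant comparison routine can look like, the sketch below computes an edit distance that counts a swap of two adjacent letters as a single error (the "optimal string alignment" variant of Damerau-Levenshtein distance). This is a generic textbook technique, not the comparison routine used inside MerlinMerge® SpeedPro; the function name `edit_distance` is chosen here for illustration only.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance where an adjacent-letter transposition costs 1.

    Generic illustration only; not the product's own comparison routine.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i          # delete all i characters of a
    for j in range(cols):
        d[0][j] = j          # insert all j characters of b
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution or match
            )
            # Adjacent transposition, e.g. "AI" typed as "IA".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[-1][-1]


print(edit_distance("WILLIAM", "WILLAIM"))  # one transposition
print(edit_distance("KENNEDY", "KENEDY"))   # one missing letter
```

Two names whose distance is small relative to their length can then be treated as candidate duplicates; where to set that threshold is a tuning decision this sketch does not address.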


Rulebase expertise

The rulebase expert system is used to identify nicknames. Names such as Bill and William, or Bob and Robert, are often used interchangeably to identify the same individual.

The rulebase is also used to identify noise words. Noise words are elements in a name that do not help in the identification of a candidate. Examples of noise words are: Incorporated, Corporation, Limited, Junior, Senior, Avenue and Street. Sometimes elements in a name contribute to the identity but should be treated as less important. In these cases, the rulebase does not treat them as noise words but recognizes that they are less significant. Some examples are: associate, board, international and services.

Other variations are caused by the use of common prefixes. Names like McDonnell are confused with MacDonnell. Prefix recognition provides the facility for handling this class of problem.

The rulebase can also recognize diminutives. Names frequently end in a diminutive such as "ie" or "y". In these cases, it is useful to identify the root and apply the rule to it. For example, you would want Bill, Billie and Billy to find William or Willie.


BILL YARA             | WILLIAM YARA
BOBBY KENNEDY         | ROBERT KENNEDY
JIM P PHILLIPS SR     | JAMES P PHILLIPS
SMITH AND ASSOCIATES  | SMITH
MCDONELL CORPORATION  | MCDONELL
MR MATT J THOMAS      | MATTHEW J THOMAS
MARINA DELSOLE        | MARINA DEL SOLE
DR LEONARD MACCOY MD  | LEONARD MCCOY
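
The kinds of rules described above can be sketched as a few lookup tables plus a small amount of logic. Everything in the block below is an assumption made for illustration: the table contents, the treatment of titles such as MR, DR and MD as noise words, and the function names are hypothetical and are not taken from the actual NameSearch® rulebase.

```python
# Hypothetical, deliberately tiny rule tables.
NICKNAMES = {"BILL": "WILLIAM", "WILL": "WILLIAM", "BOB": "ROBERT",
             "JIM": "JAMES", "MATT": "MATTHEW"}
NOISE_WORDS = {"INCORPORATED", "CORPORATION", "LIMITED", "JUNIOR", "SENIOR",
               "AVENUE", "STREET", "JR", "SR", "MR", "DR", "MD", "AND"}
LOW_SIGNIFICANCE = {"ASSOCIATE", "ASSOCIATES", "BOARD",
                    "INTERNATIONAL", "SERVICES"}


def resolve_given_name(word: str) -> str:
    """Map a nickname or diminutive (BILLY, BOBBY) to its formal name."""
    if word in NICKNAMES:
        return NICKNAMES[word]
    for suffix in ("IE", "Y"):
        if word.endswith(suffix):
            root = word[:-len(suffix)]
            # Try the root, then the root minus a doubled consonant.
            for candidate in (root, root[:-1]):
                if candidate in NICKNAMES:
                    return NICKNAMES[candidate]
    return word


def canonical_tokens(name: str) -> list:
    """Drop noise words, unify the MAC/MC prefix, resolve nicknames."""
    tokens = []
    for word in name.upper().split():
        if word in NOISE_WORDS:
            continue
        if word.startswith("MAC") and len(word) > 4:
            word = "MC" + word[3:]  # crude prefix rule, will over-match
        tokens.append(resolve_given_name(word))
    return tokens


def likely_same(a: str, b: str) -> bool:
    """Compare names on their significant tokens only."""
    def strong(name):
        return [t for t in canonical_tokens(name)
                if t not in LOW_SIGNIFICANCE]
    return strong(a) == strong(b)


print(likely_same("BOBBY KENNEDY", "ROBERT KENNEDY"))
print(likely_same("DR LEONARD MACCOY MD", "LEONARD MCCOY"))
```

This sketch does not handle split or joined words such as DELSOLE versus DEL SOLE, and it simply ignores less significant words rather than weighting them, which is cruder than the behavior described above.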


Phonetic Errors

Discrepancies caused by phonetic errors account for 20-25% of all name variations.

Traditional solutions for name variations, such as Soundex and NYSIIS, deal only with phonetic errors. These solutions standardize easily confused sounds. For example, PH is treated as F. Linguistic rules are applied to phonetically tokenize a name, and these phonetically tokenized words serve as the basis for name retrieval. In some instances these rules help to find names that are difficult to spell. Unfortunately, the distribution pattern of common names becomes even more skewed. For example, inquiries on John also return Joan, Jim, Jane, Jimmy, Jenn and other names which fall in the "JAN" phonetic pattern. Aggravating the skew in the distribution of names sacrifices both quality and performance.
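
The skew described above is easy to reproduce. The sketch below is a compact version of the widely documented American Soundex rules (keep the first letter, code the remaining consonants as digits, pad to four characters); the function name `soundex` is chosen here for illustration. All six names from the example collapse to one code.

```python
# Consonant groups of American Soundex; vowels, H, W and Y carry no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            digits.append(code)
        prev = code   # a vowel resets prev, so a repeated code counts again
    return (first + "".join(digits) + "000")[:4]


for name in ("John", "Joan", "Jim", "Jane", "Jimmy", "Jenn"):
    print(name, soundex(name))  # every name prints the code J500
```

Because M and N share a code and vowels are dropped, a search keyed on this code for John has to wade through every Jim, Jane and Joan in the file.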

MerlinMerge® SpeedPro addresses problems due to phonetics by employing analysis routines to determine the extent of phonetic tokenization.
This enables MerlinMerge® SpeedPro to overcome problems due to phonetics without the negative consequences incurred with all other methods of name search.

Examples of phonetic tokenization (taken directly from Robert L. Taft, "Name Search Techniques", New York State Identification and Intelligence System):

1) Translate first characters of name
  MAC => MCC
  PH => FF
  KN => NN
  K => C
  SCH => SSS
2) Translate last characters of name
  EE => Y
  IE => Y
  DT,RT,RD,NT,ND => D
3) First character of key = first character of name
4) Translate remaining characters by following rules, incrementing by one character each time
  EV => AF else A,E,I,O,U => A
  Q => G
  Z => S
  M => N
  KN => N else K => C
  SCH => SSS
  PH => FF
  H => If previous or next is non vowel, previous
  W => If previous is vowel, previous
5) Translate last characters of name
  If last character is S, remove it
  If last characters are AY, replace with Y
  If last character is A, remove it
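
The rules listed above can be sketched in code roughly as follows. This is a reading of the list, not a reference implementation, and it makes several assumptions that the list does not state: the first character of the key is taken after step 1 has been applied, the end of the name counts as a non-vowel for the H rule, "previous" means the already translated previous character, and runs of repeated letters are collapsed at the end (a step found in common descriptions of NYSIIS but absent from the list). No key-length limit is applied. The name `phonetic_key` is hypothetical.

```python
VOWELS = set("AEIOU")
FIRST = (("MAC", "MCC"), ("PH", "FF"), ("KN", "NN"), ("K", "C"),
         ("SCH", "SSS"))
LAST = (("EE", "Y"), ("IE", "Y"), ("DT", "D"), ("RT", "D"), ("RD", "D"),
        ("NT", "D"), ("ND", "D"))
SINGLE = {"Q": "G", "Z": "S", "M": "N"}


def phonetic_key(name: str) -> str:
    s = "".join(c for c in name.upper() if c.isalpha())
    if not s:
        return ""
    for old, new in FIRST:               # step 1: first characters
        if s.startswith(old):
            s = new + s[len(old):]
            break
    for old, new in LAST:                # step 2: last characters
        if s.endswith(old):
            s = s[:-len(old)] + new
            break
    chars = list(s)                      # step 3: chars[0] is kept as is
    i = 1
    while i < len(chars):                # step 4: translate in place
        ch, prev = chars[i], chars[i - 1]
        nxt = chars[i + 1] if i + 1 < len(chars) else ""
        rest = "".join(chars[i:i + 3])
        if rest.startswith("EV"):
            chars[i:i + 2] = "AF"
        elif ch in VOWELS:
            chars[i] = "A"
        elif ch in SINGLE:
            chars[i] = SINGLE[ch]
        elif rest.startswith("KN"):
            chars[i:i + 2] = "N"
        elif ch == "K":
            chars[i] = "C"
        elif rest == "SCH":
            chars[i:i + 3] = "SSS"
        elif rest.startswith("PH"):
            chars[i:i + 2] = "FF"
        elif ch == "H" and (prev not in VOWELS or nxt not in VOWELS):
            chars[i] = prev
        elif ch == "W" and prev in VOWELS:
            chars[i] = prev
        i += 1                           # advance one character each time
    key = "".join(chars)
    if len(key) > 1 and key.endswith("S"):   # step 5: final adjustments
        key = key[:-1]
    if key.endswith("AY"):
        key = key[:-2] + "Y"
    if len(key) > 1 and key.endswith("A"):
        key = key[:-1]
    out = [key[0]]                       # assumed extra step: collapse runs
    for c in key[1:]:
        if c != out[-1]:
            out.append(c)
    return "".join(out)


print(phonetic_key("MACDONALD"), phonetic_key("MCDONALD"))
print(phonetic_key("PHILLIPS"), phonetic_key("FILIPS"))
```

Under these assumptions MACDONALD and MCDONALD produce the same key, as do PHILLIPS and FILIPS, which is the standardization of easily confused sounds that the rules are designed to achieve.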


Missing, extra, noise words

The rulebase is used to identify noise words. Noise words are elements in a name that do not help in the identification of a candidate. Examples of noise words are: Incorporated, Corporation, Limited, Junior, Senior, Avenue and Street.
While processing the data, MerlinMerge® SpeedPro performs a step called sanitization, which removes noise characters, extra spaces and control characters and converts lower case letters to upper case. Examples of noise characters are: @, #, $, %, ^, &, *, (, ), }, {, [, ]. Commas, hyphens and quotes are handled separately and have special meanings. Commas usually indicate the insertion of a last name, so sanitization places words followed by commas at the end of the string. Quotes are deleted without leaving a space in their place. Hyphens are replaced by a space.

Before Sanitization    | After Sanitization
Scott Lions            | SCOTT LIONS
Smith, John F.         | JOHN F SMITH
Rose Stone-Shield      | ROSE STONE SHIELD
James O'Tool           | JAMES OTOOL
James O. Tool          | JAMES OTOOL
Owen, Tool, James      | JAMES OWEN TOOL
# Williams , $Richard  | RICHARD WILLIAMS
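
A minimal sketch of the sanitization pass described above might look like the following. It assumes that every character other than a letter, digit, space or comma is noise (which also removes periods and quotes), and that the words after the last comma move to the front while the comma-terminated words follow in their original order. The function name `sanitize` is hypothetical, and the "James O. Tool" row of the table is not reproduced because this sketch has no rule for joining an initial to the following word.

```python
def sanitize(raw: str) -> str:
    """Uppercase, strip noise characters, and reorder around commas."""
    # Hyphens become spaces; all runs of whitespace collapse to one space.
    text = " ".join(raw.upper().replace("-", " ").split())
    # Keep letters, digits, spaces and commas. Quotes, periods, control
    # characters and symbols such as # or $ are dropped (assumption).
    kept = "".join(ch for ch in text if ch.isalnum() or ch in " ,")
    # Split on commas and tidy each piece.
    segments = [" ".join(seg.split()) for seg in kept.split(",")]
    segments = [seg for seg in segments if seg]
    # Words followed by a comma go to the end of the string.
    if len(segments) > 1:
        segments = [segments[-1]] + segments[:-1]
    return " ".join(segments)


print(sanitize("Smith, John F."))         # JOHN F SMITH
print(sanitize("# Williams , $Richard"))  # RICHARD WILLIAMS
```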

The sanitization process also uses a small rulebase. The rulebase is applied after all the alpha characters have been converted to upper case and extra blanks have been removed. This rulebase is used to recognize words that contain noise characters, or prefixes, that could be affected by the sanitization process.

Before Sanitization  | After Sanitization  | Sanitization (without rulebase expertise)
c\o                  | CARE OF             | C O
Mc Donald, Old       | OLD MCDONALD        | MC OLD DONALD
%                    | CARE OF             |


Word Sequence Variations

Many search problems are caused by sequence variations. The inability to determine the order of words for a particular entity occurs at both data entry and inquiry time. The name Frank Lee, for example, could have been entered as Lee Frank. This problem is particularly pervasive in company names. Names such as International Business Machines, Andersen Consulting and Kemper Insurance Company are examples where the left-most word is most significant. Conversely, Edward S. Gordan Real Estate Company and Paul Mitchell Hair Products are examples where the left-most word is less significant. The inability to predict the significant word in a name from its position causes many searches to fail.

Merging foreign database files causes other sequence variations. This frequently occurs when external lists are purchased or companies consolidate information. Inconsistent methodologies for data capture make the standardization of name fields impossible. Aggravating the sequence problem are those instances in which company names are intermixed with personal names. All of these factors, in addition to human error, contribute to identification problems caused by sequence variations. MerlinMerge® SpeedPro provides a facility for handling these problems.

To understand this better we will draw an analogy between a telephone book and a database system. When we look for Frank Lee we search the "L" section. If the name is not there, we continue the search by looking in the "F" section. In order to find Frank Lee we had to search two separate sections of the phone book. Suppose we were looking for Frank Lee Ray. To ensure success we must search all the permutations. This is an extremely arduous and time consuming process for both people and computers. By listing Frank Lee in both the L and F sections, regardless of order, only one section would need to be searched.
Using this approach, MerlinMerge® SpeedPro is able to overcome word sequence variations without sacrificing performance.
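
The telephone book analogy corresponds to indexing each record under every word it contains, so that a lookup only has to open one "section" no matter how the words are ordered. The sketch below illustrates that idea with a plain in-memory dictionary; it is an assumption about the general approach, not a description of how MerlinMerge® SpeedPro builds its keys, and the names `build_index` and `lookup` are hypothetical.

```python
from collections import defaultdict


def build_index(names):
    """List every record under each of its words (each 'section')."""
    index = defaultdict(set)
    for name in names:
        for word in name.upper().split():
            index[word].add(name)
    return index


def lookup(index, query):
    """Open one section only, then keep records containing all query words."""
    words = query.upper().split()
    if not words:
        return set()
    section = index.get(words[0], set())
    wanted = set(words)
    return {name for name in section
            if wanted <= set(name.upper().split())}


index = build_index(["Frank Lee", "Lee Frank Ray", "Frank Miller"])
print(sorted(lookup(index, "Lee Frank")))
print(sorted(lookup(index, "Ray Lee")))
```

The cost is a larger index, since each record is stored once per word, in exchange for not having to try every permutation of the query at search time.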


Acronym Recognition

Corporate name searching concretely illustrates the practical difficulties in developing solutions that find correct information without missing likely candidates. People readily understand the similarity between "Triple A towing" and "AAA towing", yet a computerized system would need to employ a knowledge-based algorithm to recognize the relationship between Triple A and AAA.

The deployment of intelligence through knowledge-based systems greatly benefits search and matching algorithms by identifying nicknames, shortened forms, noise words and other circumstances that require experience to return a more comprehensive result set. However, knowledge-based systems are limited by the breadth and depth of their lexicon. Unlike names such as IBM and AT&T, the vast majority of acronyms lie outside the scope of knowledge-base processing. For example, our clients often use the IST acronym interchangeably with Intelligent Search Technology, yet it would be unreasonable to expect the inclusion of IST in a knowledge-based system.

The MerlinMerge® SpeedPro software, with its corporate search algorithms and acronym recognition functionality, significantly advances the ability to search and match corporate name data.
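
One way to recognize acronyms that no lexicon contains is to derive them from the name itself: take the initials of the significant words and compare them with the candidate. The sketch below shows that idea under stated assumptions; the noise-word list and the names `initials` and `is_acronym_of` are hypothetical, and this is not presented as the acronym recognition logic in MerlinMerge® SpeedPro.

```python
# Hypothetical list of words skipped when forming an acronym.
ACRONYM_NOISE = {"INCORPORATED", "INC", "CORPORATION", "CORP",
                 "LIMITED", "LTD", "AND", "OF", "THE"}


def initials(name: str) -> str:
    """First letters of the significant words in a name."""
    words = name.upper().replace("&", " ").replace(".", " ").split()
    return "".join(w[0] for w in words if w not in ACRONYM_NOISE)


def is_acronym_of(short: str, full: str) -> bool:
    """True when `short` equals the initials derived from `full`."""
    candidate = "".join(ch for ch in short.upper() if ch.isalnum())
    return bool(candidate) and candidate == initials(full)


print(is_acronym_of("IST", "Intelligent Search Technology Ltd."))
print(is_acronym_of("AAA", "Triple A towing"))
```

The second call returns False: "Triple A" versus "AAA" is a relationship that initials cannot derive, which is the kind of case that still needs a knowledge-based rule.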






To find out more, call (800) 287-0412
Copyright © 1993-2007 Intelligent Search Technology Ltd.
IBM Business Partner emblem is a registered trademark of IBM Corporation.
Microsoft is a registered trademark of Microsoft Corporation.
Oracle is a registered trademark of Oracle Corporation.