Deduplication, or deduping, involves identifying records in a database table, file, or
other data source that contain information about the same individual or entity. Duplicates are
fairly easy to identify when they match exactly; in most cases, however, a duplicate record
differs from the original, for example through a misspelling or an abbreviation. Advanced fuzzy
matching technology is essential for properly deduping a database and for ensuring data quality.
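To illustrate why exact comparison misses near-duplicates, here is a minimal Python sketch. It uses the standard library's `difflib.SequenceMatcher` as a stand-in for a fuzzy matcher; it is not IST's matching engine, and the function names and the 0.85 threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1, ignoring case and outer whitespace."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def is_probable_duplicate(a: str, b: str, threshold: float = 0.85) -> bool:
    """Treat two values as probable duplicates when their score reaches the threshold.

    The threshold is an illustrative assumption; real data usually needs tuning.
    """
    return similarity(a, b) >= threshold


# An exact comparison misses the misspelled name, the fuzzy comparison does not.
exact_match = "Jonathan Smith" == "Jonathon Smith"
fuzzy_match = is_probable_duplicate("Jonathan Smith", "Jonathon Smith")
unrelated = is_probable_duplicate("Jonathan Smith", "Mary Jones")
```

In practice a score like this would be computed across several fields (name, address, phone) and combined, since a single field rarely decides whether two records describe the same entity.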
Every data quality initiative should include procedures that prevent duplicate records from
being entered into the system and that periodically purge any duplicate records already in it.
Fuzzy Grouping and Fuzzy Lookup Are Not Enough
IST's fuzzy matching engine is easier to use and provides better results than the Fuzzy Grouping
and Fuzzy Lookup transformations provided with SSIS.
Our SSIS deduplication component provides powerful and accurate data deduplication
functionality. At its core is IST's searching and matching technology, which allows it to
outperform the competition by using sophisticated techniques such as “fuzzy” matching,
heuristic algorithms, phonetic analysis, and more. These techniques are hidden from the end
user behind an easy-to-use, point-and-click graphical user interface. With only a few mouse
clicks, users can send their deduping jobs to the matching engine and receive processed results
in a specified format. Combining advanced deduping with the built-in extensibility and data
integration features of SSIS provides a powerful toolset for every data steward or data
warehousing specialist.
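As a rough illustration of what phonetic analysis means in this context, the sketch below implements a simplified version of the classic American Soundex code in Python. Soundex is a well-known public algorithm used here only as an example; the text does not say which phonetic method IST's engine uses, and the function name is an assumption.

```python
def soundex(name: str) -> str:
    """Return a simplified four-character Soundex code for a name."""
    # Map consonants to Soundex digits; vowels, H, W and Y have no digit.
    codes = {}
    for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                           ("L", "4"), ("MN", "5"), ("R", "6")):
        for ch in letters:
            codes[ch] = digit

    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""

    result = [letters[0]]               # the first letter is kept as-is
    prev = codes.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = codes.get(ch, "")
        if code and code != prev:
            result.append(code)         # skip repeats of the same code
        prev = code                     # a vowel resets prev to ""

    return ("".join(result) + "000")[:4]  # pad or truncate to four characters


# Names that sound alike receive the same code even when spelled differently.
same_sound = soundex("Smith") == soundex("Smyth")
```

Records whose name fields share a phonetic code can then be passed to a more detailed comparison, which keeps the number of candidate pairs manageable.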
Our SSIS deduplication component takes a single data source as input and finds the duplicates
within that data source. If you want to find duplicates across two data sources, use our SSIS
merge/purge component. If you want to dedupe or find matches across more than two data
sources, you can chain multiple SSIS merge/purge components within the SSIS work pane.
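The difference between deduplicating one source and matching across two can be sketched in Python as follows. This is only a conceptual illustration using `difflib` from the standard library; the function names and threshold are hypothetical and do not represent the SSIS components or their configuration.

```python
from difflib import SequenceMatcher
from itertools import combinations, product


def similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Case-insensitive fuzzy comparison; the threshold is an illustrative assumption."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def duplicates_within(records, threshold=0.85):
    """Deduplication: compare every record with every other record in ONE source."""
    return [(a, b) for a, b in combinations(records, 2) if similar(a, b, threshold)]


def matches_across(left, right, threshold=0.85):
    """Merge/purge style matching: compare each record in one source with each in another."""
    return [(a, b) for a, b in product(left, right) if similar(a, b, threshold)]


customers = ["Acme Corp", "ACME Corp.", "Globex"]
prospects = ["ACME Corp.", "Initech"]

within = duplicates_within(customers)
across = matches_across(["Acme Corp"], prospects)
```

Comparing every pair grows quadratically with the number of records, so this brute-force approach is suitable only for small examples.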


