De-duplication

Romanian address databases present a real challenge for de-duplication because of the large number of name changes. SMARTaddress, developed by Geo Strategies, delivers clean, de-duplicated addresses in a variety of formats.

Move to a single customer view (SCV)

In principle, de-duplication is simple: most databases have facilities to identify (and remove) exact duplicates. It becomes much harder when records contain spelling mistakes, inconsistent abbreviations and similar variations.

This problem is normally dealt with by fuzzy matching (similar to spell checking), which tolerates small mistakes, but Romanian addresses are sufficiently complicated to defeat most ‘normal’ approaches.
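To make the limitation concrete, here is a minimal sketch of generic fuzzy matching using Python's standard difflib module. It is not how SMARTaddress works; the function names, the 0.9 threshold and the sample addresses (including the placeholder street names "Veche" and "Noua") are assumptions chosen purely for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0 and 1, ignoring letter case."""
    return SequenceMatcher(None, a.casefold(), b.casefold()).ratio()


def is_probable_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Flag two address strings as duplicates when they are similar enough.

    The threshold is an illustrative assumption, not a recommended value.
    """
    return similarity(a, b) >= threshold


# A small typing mistake is caught, as intended.
typo = is_probable_duplicate("Strada Victoriei 10", "Strada Victorei 10")

# Two genuinely different house numbers are also flagged (a false positive).
different_number = is_probable_duplicate("Strada Victoriei 10", "Strada Victoriei 18")

# A renamed street (placeholder names) is missed (a false negative).
renamed = is_probable_duplicate("Strada Veche 10", "Strada Noua 10")
```

With these inputs, typo and different_number are both True while renamed is False, which shows why string similarity alone is a poor fit for addresses: it over-matches on small but significant differences and under-matches on name changes.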

Geo Strategies has created a system within SMARTaddress to identify duplicate addresses for potential removal.

Typical examples of addresses which normal systems might not identify as being identical (but which SMARTaddress recognises as being identical) are as follows:

Example 1 (image: De-duplication 1)

This is a relatively simple example of abbreviations that might not be identified by a “normal” system. In just one database we encountered 54 variations of the spelling of Bucureşti.
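One common way to reduce this kind of variation is to normalise each address before comparing it. The sketch below is an assumption about how that could look, not a description of SMARTaddress: the abbreviation table is a tiny illustrative sample, and the function names are hypothetical.

```python
import re
import unicodedata

# Illustrative sample only; a real table would be far larger.
ABBREVIATIONS = {
    "str": "strada",
    "bd": "bulevardul",
    "nr": "numarul",
    "buc": "bucuresti",
}


def strip_diacritics(text: str) -> str:
    """Remove combining marks so that accented letters compare as plain ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def normalise(address: str) -> str:
    """Lower-case, drop diacritics and punctuation, and expand known abbreviations."""
    text = strip_diacritics(address).casefold()
    tokens = re.findall(r"[a-z0-9]+", text)
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)


first = normalise("Str. Victoriei nr. 10, Buc.")
second = normalise("STRADA VICTORIEI NR 10 BUCUREŞTI")
```

Both calls produce the same normalised string, so an exact comparison now treats the two spellings as one address. Variants that are misspelled rather than abbreviated would still need a further matching step.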

Example 2 (image: De-duplication 2)

Here there has been a name change: Geo Strategies are aware of over 6,000 situations of this type which have been included in corporate databases.
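Name changes cannot be detected by comparing spellings, so they are usually handled with a lookup table of known renamings. The following is a minimal sketch under that assumption; the table entry uses made-up placeholder names ("Strada Veche", "Strada Noua"), not a real renaming, and the function names are hypothetical.

```python
# Maps (city, old street name) to the current street name.
# The single entry below is a placeholder, not real data.
RENAMED_STREETS = {
    ("bucuresti", "strada veche"): "strada noua",
}


def canonical_street(city: str, street: str) -> str:
    """Return the current name of a street, following a known renaming if any."""
    key = (city.casefold(), street.casefold())
    return RENAMED_STREETS.get(key, street.casefold())


def same_street(city: str, street_a: str, street_b: str) -> bool:
    """Treat an old and a new name as the same street within one city."""
    return canonical_street(city, street_a) == canonical_street(city, street_b)


in_bucharest = same_street("Bucuresti", "Strada Veche", "Strada Noua")
elsewhere = same_street("Cluj", "Strada Veche", "Strada Noua")
```

Keying the table on the city as well as the street keeps a renaming in one place from merging unrelated streets that happen to share a name elsewhere.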

Example 3 This illustrates an altogether more difficult problem:

(image: De-duplication 3)

Clearly, these are different addresses, but without sophisticated recognition and data extraction routines they might easily have been incorrectly identified as identical, i.e. as duplicates.
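A simple way to illustrate the idea of data extraction is to split each address into fields and compare them separately, so that a difference in one significant field (such as the house number) rules out a match. The sketch below assumes a very simple "street name followed by number" layout and is not the SMARTaddress method; the pattern, threshold and names are all hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import NamedTuple, Optional


class ParsedAddress(NamedTuple):
    street: str
    number: str


# Assumed layout: free-text street name, then a house number such as "10" or "10A".
ADDRESS_PATTERN = re.compile(
    r"^\s*(?P<street>.+?)\s+(?P<number>\d+\s*[a-z]?)\s*$", re.IGNORECASE
)


def parse(address: str) -> Optional[ParsedAddress]:
    """Split an address into street and house number, or return None if it does not fit."""
    match = ADDRESS_PATTERN.match(address)
    if match is None:
        return None
    number = match["number"].replace(" ", "").casefold()
    return ParsedAddress(match["street"].casefold(), number)


def is_duplicate(a: str, b: str, threshold: float = 0.9) -> bool:
    """Require identical house numbers; allow small spelling differences in the street."""
    parsed_a, parsed_b = parse(a), parse(b)
    if parsed_a is None or parsed_b is None:
        return False  # Unparsed records are left for manual review.
    if parsed_a.number != parsed_b.number:
        return False
    ratio = SequenceMatcher(None, parsed_a.street, parsed_b.street).ratio()
    return ratio >= threshold


same_place = is_duplicate("Strada Victoriei 10", "Strada Victorei 10")
different_place = is_duplicate("Strada Victoriei 10", "Strada Victoriei 18")
```

Here same_place is True and different_place is False: the spelling slip in the street name is tolerated, while the differing house number prevents a false match.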

If your database has more than 50,000 records, or you want to match across multiple files, Geo Strategies recommend that you contact us for a trial evaluation.