
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


st: Merging datasets using non-identical addresses/strings as identifiers

From   Benjamin Niug
Subject   st: Merging datasets using non-identical addresses/strings as identifiers
Date   Sun, 5 Feb 2012 12:48:16 +0100

Hi folks,

I have a specific question about merging. I want to merge two datasets
that use addresses as the identifiers of the observations. However,
the addresses differ slightly between the datasets, so I cannot simply
use the -merge- command. The spelling may differ: there are many
systematic differences (e.g. "bulevard" instead of "boulevard") as well
as non-systematic ones (e.g. simple spelling mistakes). In addition, I
want to match addresses even when they differ in the house number.

A stylized example (notice different spelling):
11 Sunset Boulevard, Tirana, Albania

to be merged with

13 Sunset Bulevard, Tirane

So far, I have tried to tackle this problem using regular expressions,
but that does not work well, since regular expressions typically only
handle the systematic differences. Does anybody have a suggestion for a
procedure that I could use for this problem?
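
The limitation of rule-based cleanup can be made concrete with a
minimal sketch, written in Python purely for illustration rather than
in Stata. The replacement table, the normalize helper, and the
misspelled third address are all hypothetical. The sketch also assumes
that every address starts with a house number and that the street is
everything before the first comma.

```python
import re

# Hypothetical table of systematic spelling variants. Only this one entry
# comes from the example in the post; a real table would be much longer.
SYSTEMATIC = {r"\bbulevard\b": "boulevard"}


def normalize(address: str) -> str:
    """Reduce an address to a crude street key.

    Assumed layout: "<house number> <street>, <city>, ...".
    """
    street = address.lower().split(",")[0]      # keep the part before the first comma
    street = re.sub(r"^\s*\d+\s+", "", street)  # drop a leading house number
    for pattern, replacement in SYSTEMATIC.items():
        street = re.sub(pattern, replacement, street)
    return street.strip()


a = normalize("11 Sunset Boulevard, Tirana, Albania")
b = normalize("13 Sunset Bulevard, Tirane")
c = normalize("13 Sunest Bulevard, Tirane")  # made-up typo, not in the table

print(a == b)  # True: the systematic variant and the house number are handled
print(a == c)  # False: the non-systematic typo still blocks an exact match
```

The first pair matches because the variant is in the table and the
house number and city are discarded. The third address does not match,
which is exactly the non-systematic case that regular expressions do
not cover.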

Thanks in advance!