



st: Merging datasets using non-identical addresses/strings as identifiers


From   Benjamin Niug <benjamin.niug@googlemail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: Merging datasets using non-identical addresses/strings as identifiers
Date   Sun, 5 Feb 2012 12:48:16 +0100

Hi folks,

I have a specific merging question. I want to merge two datasets
that use addresses as the identifiers of the observations. However,
these addresses differ slightly, which is why I cannot use the
simple -merge- command. The spelling may differ: there are many
systematic differences (e.g. bulevard instead of boulevard) but also
non-systematic ones (e.g. simple spelling mistakes). In addition, I
want to merge addresses that differ with respect to the house number.

A stylized example (note the different spelling and house number):
11 Sunset Boulevard, Tirana, Albania

to be merged with

13 Sunset Bulevard, Tirane

So far, I have tried to tackle this problem using regular expressions,
but this does not work well, since regular expressions typically
handle only the systematic differences. Does anybody have a suggestion
for a procedure that I could use for this problem?
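
To make the two kinds of differences concrete, here is a rough sketch
in Python (not Stata). Regular expressions and a lookup table cover
the systematic part, and a string-similarity score from the standard
library module difflib is one possible way to handle the rest. The
VARIANTS table, the function names, and the decision to ignore house
numbers are assumptions made for illustration only.

```python
import re
from difflib import SequenceMatcher

# Hypothetical table of systematic spelling variants (illustrative only).
VARIANTS = {"bulevard": "boulevard"}

def normalize(address):
    """Lowercase, drop house numbers and punctuation, map known variants.

    ASCII-only sketch: any character outside a-z is treated as a separator.
    """
    text = address.lower()
    text = re.sub(r"\d+", " ", text)       # ignore house numbers
    text = re.sub(r"[^a-z\s]", " ", text)  # drop punctuation
    words = [VARIANTS.get(w, w) for w in text.split()]
    return " ".join(words)

def similarity(a, b):
    """Similarity in [0, 1] between the two normalized addresses."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()

a = "11 Sunset Boulevard, Tirana, Albania"
b = "13 Sunset Bulevard, Tirane"
print(normalize(a))
print(normalize(b))
print(similarity(a, b))
```

Pairs scoring above some cutoff would be treated as candidate matches.
The cutoff would have to be chosen by inspecting the data, and the
non-systematic difference (Tirane vs. Tirana) is left to the
similarity score instead of the lookup table.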

Thanks in advance!

