Statalist



st: Question about match merge


From   Scott Talkington <talkings@gmu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Question about match merge
Date   Sun, 02 Sep 2007 09:39:05 -0400

I seem to recall that there's an algorithm that is able to crosswalk 
databases by matching names combined with other secondary keys, such as 
zip code, and that the algorithm will produce a "probability of match" 
for the given ID.  I used to conduct match merges based on name and zip 
in an earlier version of Stata, but it was quite cumbersome to deal with 
misspellings, typos (common transpositions of letters or numbers, etc.), 
all caps vs lower case, prefixes and suffixes, titles, middle initial 
versus middle name, and so on. What I'd like to know is whether a more
sophisticated match/merge based on primary and secondary keys or IDs has
been developed and, if so, where I can find documentation on how it works.
Also, would it take into account how common a name is (e.g., "David Jones"
vs. a less common name like "Horace Vilochkek") and the size of the
database, and adjust the probability of match accordingly? Or is all of
this just a pipe dream I happened to think up while under the influence?
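The approach described above is known in the record-linkage literature as probabilistic record linkage (the Fellegi-Sunter model, 1969). Below is a minimal sketch of the general technique in Python, not of any Stata command. It makes several assumptions. The title and suffix lists are illustrative. The surname test uses optimal string alignment distance, which counts adjacent transpositions as a single edit. The m and u agreement probabilities are made-up placeholders; in practice they are estimated, for example by EM. The surname u is taken from that surname's frequency, so a match on "Jones" earns less weight than a match on a rare name. The prior match probability is 1/N for a file of N records, which is how database size enters. All function names here are hypothetical.

```python
import math
import re
from collections import Counter

# Illustrative, incomplete lists (assumption): tokens dropped before comparing.
TITLES = {"mr", "mrs", "ms", "dr", "prof"}
SUFFIXES = {"jr", "sr", "ii", "iii", "iv", "phd", "md"}


def normalize_name(raw):
    """Lower-case, strip punctuation, titles and suffixes; return (first, middle initial, last)."""
    tokens = re.sub(r"[^a-z ]", " ", raw.lower()).split()
    tokens = [t for t in tokens if t not in TITLES and t not in SUFFIXES]
    if not tokens:
        return ("", "", "")
    middle = tokens[1][0] if len(tokens) > 2 else ""  # middle name reduced to its initial
    return (tokens[0], middle, tokens[-1])


def osa_distance(a, b):
    """Optimal string alignment distance: insertions, deletions, substitutions, adjacent swaps."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1, d[i][j - 1] + 1, d[i - 1][j - 1] + cost)
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def surname_frequencies(records):
    """Share of records carrying each normalized surname (used as a frequency-based u)."""
    counts = Counter(normalize_name(r["name"])[2] for r in records)
    return {name: c / len(records) for name, c in counts.items()}


def match_probability(rec_a, rec_b, surname_freq, n_b, m=0.95, u_default=0.01):
    """Posterior probability that rec_a and rec_b are the same person (Fellegi-Sunter style).

    n_b: number of records in file B (> 1); the prior match probability is 1 / n_b.
    m, u_default: placeholder probabilities (assumptions, normally estimated from data).
    """
    first_a, _, last_a = normalize_name(rec_a["name"])
    first_b, _, last_b = normalize_name(rec_b["name"])
    # Surname: fuzzy agreement; u is the chance two random records share this surname,
    # floored at 1/n_b and capped at 0.5 to keep both log terms finite.
    u_last = min(max(surname_freq.get(last_b, 0.0), 1.0 / n_b), 0.5)
    if osa_distance(last_a, last_b) <= 1:
        log_lr = math.log(m / u_last)
    else:
        log_lr = math.log((1 - m) / (1 - u_last))
    # First name and zip: exact agreement with a flat u.
    for agree in (first_a == first_b, rec_a["zip"] == rec_b["zip"]):
        log_lr += math.log(m / u_default) if agree else math.log((1 - m) / (1 - u_default))
    odds = math.exp(log_lr) / (n_b - 1)  # prior odds of 1/n_b is 1/(n_b - 1)
    return odds / (1 + odds)


if __name__ == "__main__":
    freq = {"jones": 0.01, "vilochkek": 0.00001}  # hypothetical shares in a 10,000-record file
    pairs = [
        ({"name": "Dr. David Jnoes Jr.", "zip": "22030"}, {"name": "David A. Jones", "zip": "22030"}),
        ({"name": "HORACE VILOCHKEK", "zip": "22030"}, {"name": "Horace Vilochkek", "zip": "22030"}),
    ]
    for a, b in pairs:
        print(a["name"], "|", b["name"], round(match_probability(a, b, freq, 10_000), 4))
```

Both example pairs agree on first name, surname (allowing the "Jnoes" typo) and zip. The rare surname still ends up with the higher match probability, because agreeing on "Vilochkek" is much less likely to happen by chance.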

I'll also try to scrounge up something in the FAQ database, but most of
my text documentation on Stata 9.2 is packed in boxes since I'm in the
midst of a move, and I need at least a rough idea of what such a
match/merge can do within the week.

Scott Talkington, PhD
talkings@gmu.edu


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP