
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: merging datasets


From   "Michael N. Mitchell" <[email protected]>
To   [email protected]
Subject   Re: st: merging datasets
Date   Tue, 09 Nov 2010 14:55:05 -0800

Dear Mike

I don't know if this is helpful, but when I have encountered this kind of data, this is the strategy I have used. Call the two databases A and B. I would start by matching A and B on all of the variables (first, middle, last, dob, ssn). Some observations will match on all of them; call these criterion-one matches. Take the remaining unmatched observations and try to match them on a looser criterion, for example everything but middle name; call those criterion-two matches. Take the observations that are still unmatched and try matching them again on an even looser criterion. Repeat this process, loosening the matching criterion each time.

At the end, I might be matching on a criterion that is too loose for my comfort (such as last name only). You can then do a frequency count, among the matched records, of how many matched at each criterion level (including the one that is too loose for comfort). You can then weigh the number of matches gained at each level against the strictness of its criterion to decide the best balance between the number of matches and the quality of the match.
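The tiered strategy above can be sketched in a few lines. This is a Python illustration, not Stata code, and it rests on assumptions that are not part of the strategy as described: the tier list and field names are made up for the example, a blank value counts as missing, and a match is accepted at a tier only when exactly one A record and one B record share the key (ambiguous keys are left unmatched for later tiers or manual review). The final `Counter` is the frequency count of matches per criterion level.

```python
from collections import Counter

# Matching tiers from strictest to loosest. The exact tiers and field
# names are illustrative assumptions.
TIERS = [
    ("all fields", ("first", "middle", "last", "dob", "ssn")),
    ("no middle name", ("first", "last", "dob", "ssn")),
    ("name and dob", ("first", "last", "dob")),
    ("last name only", ("last",)),
]


def match_key(record, fields):
    """Return the tuple of values for `fields`, or None if any is missing."""
    values = tuple(record.get(f) for f in fields)
    if any(v is None or v == "" for v in values):
        return None
    return values


def group_by_key(records, ids, fields):
    """Map each non-missing key to the list of record ids that share it."""
    groups = {}
    for i in sorted(ids):
        k = match_key(records[i], fields)
        if k is not None:
            groups.setdefault(k, []).append(i)
    return groups


def tiered_match(a_records, b_records, tiers=TIERS):
    """Match A to B tier by tier; each record is matched at most once.

    Returns a list of (a_index, b_index, tier_name) tuples.
    """
    unmatched_a = set(range(len(a_records)))
    unmatched_b = set(range(len(b_records)))
    matches = []
    for tier_name, fields in tiers:
        a_groups = group_by_key(a_records, unmatched_a, fields)
        b_groups = group_by_key(b_records, unmatched_b, fields)
        for k, a_ids in a_groups.items():
            b_ids = b_groups.get(k, [])
            # Assumption: accept only one-to-one matches at each tier;
            # keys shared by several records are not matched here.
            if len(a_ids) == 1 and len(b_ids) == 1:
                matches.append((a_ids[0], b_ids[0], tier_name))
                unmatched_a.discard(a_ids[0])
                unmatched_b.discard(b_ids[0])
    return matches


# Hypothetical records for illustration only.
A = [
    {"first": "John", "middle": "Q", "last": "Smith", "dob": "1950-01-01", "ssn": "111"},
    {"first": "Bob", "middle": "", "last": "Jones", "dob": "1960-02-02", "ssn": "222"},
    {"first": "Al", "middle": "R", "last": "Brown", "dob": "", "ssn": ""},
]
B = [
    {"first": "John", "middle": "Q", "last": "Smith", "dob": "1950-01-01", "ssn": "111"},
    {"first": "Bob", "middle": "T", "last": "Jones", "dob": "1960-02-02", "ssn": "222"},
    {"first": "Alan", "middle": "R", "last": "Brown", "dob": "1945-03-03", "ssn": "333"},
]

matches = tiered_match(A, B)
# Frequency count of matches at each criterion level.
counts = Counter(tier for _, _, tier in matches)
print(counts)
```

Here the Smith pair matches on all fields, the Jones pair only once middle name is dropped, and the Brown pair only on last name, which is exactly the kind of match the frequency count lets you judge.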

I hope this helps,

Michael N. Mitchell
Data Management Using Stata      - http://www.stata.com/bookstore/dmus.html
A Visual Guide to Stata Graphics - http://www.stata.com/bookstore/vgsg.html
Stata tidbit of the week         - http://www.MichaelNormanMitchell.com



On 2010-11-08 8.50 PM, Michael Eisenberg wrote:
Colleagues,

I have a database of about 20K men that I'd like to merge with another
database.  I have names (first, middle, and last) as well as date of
birth and social security number for most men.  Unfortunately, the
original database has some missing data on birthdate and social
security numbers.  The new database has most of the birthdate info as
well as the geographic information that I need.

Some men do have the same name.

Is there any way to merge based on name if it doesn't uniquely identify
men?  I'd like to somehow match all men and then manually compare
visit dates to decide whether each match is likely to be correct.
If not, any suggestions?

Thanks for your help.

Mike
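Pairing every man with every same-named man in the other file, then checking visit dates by hand, amounts to a many-to-many join on name. In Stata, the `joinby` command forms all pairwise combinations within groups. The Python sketch below only illustrates the idea; the field names (`first`, `last`, `visit`), the name normalization (trimmed, lower-cased), and ranking candidates by the gap between visit dates are assumptions made for the example.

```python
from datetime import date


def name_key(record):
    """Normalize first and last name (assumed: trimmed and lower-cased)."""
    return (record["first"].strip().lower(), record["last"].strip().lower())


def candidate_pairs(a_records, b_records):
    """Return every (a_index, b_index) pair whose first and last names agree.

    This is a many-to-many join on name: a man in A with a common name is
    paired with every same-named man in B.
    """
    b_by_name = {}
    for j, rec in enumerate(b_records):
        b_by_name.setdefault(name_key(rec), []).append(j)
    pairs = []
    for i, rec in enumerate(a_records):
        for j in b_by_name.get(name_key(rec), []):
            pairs.append((i, j))
    return pairs


def review_table(a_records, b_records, pairs):
    """Attach the gap in days between visit dates, closest visits first."""
    rows = []
    for i, j in pairs:
        gap = abs((date.fromisoformat(a_records[i]["visit"])
                   - date.fromisoformat(b_records[j]["visit"])).days)
        rows.append((i, j, gap))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical records; "visit" holds an ISO-format visit date.
A = [
    {"first": "John", "last": "Smith", "visit": "2005-03-01"},
    {"first": "Mary", "last": "Lee", "visit": "2006-07-15"},
]
B = [
    {"first": "John", "last": "Smith", "visit": "1998-11-20"},
    {"first": "john", "last": "smith ", "visit": "2005-03-02"},
    {"first": "Ann", "last": "Park", "visit": "2001-01-01"},
]

pairs = candidate_pairs(A, B)
review = review_table(A, B, pairs)
for i, j, gap in review:
    print(f"A[{i}] vs B[{j}]: visits {gap} days apart")
```

Both B records named John Smith become candidates for A's John Smith; the one whose visit falls a day apart is listed first for manual review.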
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
*


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index