Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: How to merge datasets when there are missing values in the matching variables
Date: Sat, 21 Jan 2012 20:03:30 +0000

It sounds as if you need to clean up afterwards. I don't see that you
can expect -merge- to do the right thing in this circumstance.
-duplicates- offers handles for dealing with duplicate observations.

Nick

On Sat, Jan 21, 2012 at 7:55 PM, shihying yao <berkeley.yao@gmail.com> wrote:
> Hi there,
>
> I am trying to merge two data files using two unique ID variables, ID1
> and ID2. Note that not all of the subjects have both ID1 and ID2
> information in both files. Suppose the names of the data files are
> "master" and "subset." Below resembles the code I used:
>
> use subset, clear
> sort ID1 ID2
> save subset,replace
>
> use master, clear
> sort ID1 ID2
> merge ID1 ID2 using subset
>
> The problem occurs for subjects whose ID1 information is missing in
> one of the data files (either one). Although these subjects can be
> uniquely identified using ID2 in both files, their records are not
> merged and there are duplicate records (i.e., one record has both ID1
> and ID2 information, while the other record has ID2 information and
> ID1 missing) in the merged file. It doesn't help whether I sort ID1 or
> ID2 first, since some subjects have ID2 information in only one file.
>
> The version I am using is STATA 10. Any help is appreciated.
>
> Best,
> Shihying
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
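
As a rough illustration of the clean-up with -duplicates- suggested in the reply, the sketch below only locates the suspect records after the merge; it does not decide how to combine them. It assumes that ID2 identifies a subject whenever it is nonmissing (the thread says so only for the problem cases), and the variable name dup is made up for this example. ID1, ID2 and _merge are the variables from the posted code.

```stata
* Sketch only: run after the -merge- in the original post.
* Assumes a nonmissing ID2 identifies one subject; dup is a hypothetical name.

* How many observations share an ID2 value with another observation?
duplicates report ID2 if !missing(ID2)

* dup = number of other observations with the same ID2
* (dup is missing for observations excluded by the if condition)
duplicates tag ID2 if !missing(ID2), generate(dup)

* Inspect the pairs, e.g. one record with ID1 and one with ID1 missing
sort ID2 ID1
list ID1 ID2 _merge if dup > 0 & !missing(dup), sepby(ID2)
```

The extra !missing(dup) condition matters because Stata treats missing values as larger than any number, so dup > 0 alone would also list the observations that were never tagged. How the tagged records are then combined depends on which file should take precedence, which the thread does not settle.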