
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: RE: additional observations when merging data sets


From   Joe Canner <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: RE: additional observations when merging data sets
Date   Mon, 2 Dec 2013 15:16:49 +0000

Katie,

That helps; I had assumed from your description that you were using the -merge- command.  Is there a reason you chose -joinby- instead of -merge-?  It's hard to say without knowing more about the two data sets, but from your description so far I think -merge- would be more appropriate.  -joinby- forms all pairwise combinations of observations within each group, so it is more useful if you are trying to do a many-to-many merge, whereas it sounds like you are doing a many-to-one merge (many households matched to one community), which can be done using -merge-.

One advantage of -merge- is that it gives you a complete rundown of how the two data sets match up: how many records matched, how many are only in the master, and how many are only in the using.  It also gives you a new variable (_merge by default) that indicates where each record came from, which will allow you to drop communities that are not represented in your household survey (or vice versa).
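As a rough illustration of the many-to-one case described above, here is a minimal sketch meant to be run as a do-file. The toy data and the variable names (community_id, rainfall) are made up for the example and are not taken from the actual survey files; only case_id appears in the thread.

```stata
* Hypothetical community-level file: one row per community_id.
clear
input community_id rainfall
1 10
2 20
3 30
end
tempfile community
save `community'

* Hypothetical household-level file: several households per community.
clear
input case_id community_id
101 1
102 1
103 2
104 4
end

* Many households to one community. -merge m:1- stops with an error
* if community_id is not unique in the using data.
merge m:1 community_id using `community'

* _merge: 1 = master only, 2 = using only, 3 = matched.
tabulate _merge

* Keep every household; drop communities that have no households.
drop if _merge == 2
```

In this toy example the household count stays at four after the merge, since a many-to-one merge cannot add rows to the master other than using-only records, and those are dropped in the last step.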

Regards,
Joe

P.S. Were you involved in the data collection for this survey?  Where was it done?  I lived for 4+ years in Blantyre in the mid-90s.  I have fond memories!


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Katie Farrin
Sent: Monday, December 02, 2013 10:01 AM
To: [email protected]
Subject: Re: st: RE: additional observations when merging data sets

Thanks, Joe.  What I mean is that I'm keeping unmatched data from memory (the set I'm working with), but dropping observations from the new file that don't match to observations in my working data set.  So I am keeping the matches from the two data sets but also the ones from the primary data set that don't have matches in the secondary (but not vice versa).

So the code is something like:

use "C:\Users\kmfarrin\Desktop\Malawi Data STATA\data_11_21_13.dta", clear

joinby case_id using "C:\Users\kmfarrin\Desktop\Malawi Data STATA\smooth_ln_dd.dta", unmatched(master)

I have 2056 observations in my working data set.  When I run the command I have 2106. This code is matching household identifiers, but I get the same problem when I'm matching using community identifiers, which are not unique to households.  I can see when I look at the data that I'm getting multiple observations with the same household ID, when I have already cleaned the data so that I only have one observation per household.
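One possible explanation, offered as an assumption rather than a diagnosis of the actual files: if the working data really has one row per case_id, extra rows after -joinby- would arise when the using file contains repeated case_id values, because -joinby- pairs each master row with every matching using row. The sketch below, meant to be run as a do-file, shows the effect on made-up data and one way to check for it. The variable names smooth_ln and hhsize are hypothetical.

```stata
* Hypothetical using file in which case_id 102 appears twice.
clear
input case_id smooth_ln
101 1.5
102 2.0
102 2.1
end

* Check whether case_id uniquely identifies rows in the using file.
duplicates report case_id
capture isid case_id
display "isid return code: " _rc
tempfile using_data
save `using_data'

* Hypothetical master file: one row per household.
clear
input case_id hhsize
101 4
102 5
103 3
end

* Same form of command as in the message above, applied to the toy data.
joinby case_id using `using_data', unmatched(master)

* The three master rows have become four: 102 matched two using rows.
count

* Flag and list the household IDs that now appear more than once.
duplicates tag case_id, generate(dup)
list if dup > 0
```

Running -duplicates report- or -isid- on the real using file with the real key would show whether this is what produces the 50 additional observations.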

> On Mon, Dec 2, 2013 at 8:38 AM, Joe Canner <[email protected]> wrote:
>>
>> Katie,
>>
>> Please provide an example of the command(s) you are using, as your 
>> description is confusing and seemingly contradictory ("I select to 
>> keep unmatched data from memory" or "I'm choosing to drop unmatched data"?).
>>
>> Thanks,
>> Joe Canner
>> Johns Hopkins University School of Medicine
>>
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Katie 
>> Farrin
>> Sent: Monday, December 02, 2013 7:01 AM
>> To: [email protected]
>> Subject: st: additional observations when merging data sets
>>
>> Good morning,
>>
>> I have a general question I was hoping someone would be able to help with.
>> I have household data that I would like to merge with corresponding 
>> community level data, and am using a pairwise merge by community.  I 
>> select to keep unmatched data from memory, as my sample of households 
>> does not include all communities surveyed.  However, when I merge the 
>> data I get 50 additional observations and I am not sure why this is 
>> happening (my original data set has 2056 observations and the merged 
>> set has 2106).  If I'm choosing to drop unmatched data from the new 
>> file, what is going on to create additional observations?
>> Each household should have only one associated community ID.
>>
>> Thank you!
>>
>> Katie
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC