
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: additional observations when merging data sets


From   Katie Farrin <kfarrin@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: additional observations when merging data sets
Date   Mon, 2 Dec 2013 10:34:29 -0500

Thanks, Joe.  I guess I am just accustomed to using the -joinby-
command, so I will try to use -merge- instead.  I knew -joinby-
created the _merge variable but didn't know -merge- had a better
breakdown (oddly, when I look at the _merge variable created from the
-joinby- command, all of them say that they are just from the master
even though additional observations appear).
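
A minimal sketch of the switch to -merge-, assuming the file paths and the case_id key from the commands quoted further down in this thread. The community-level step uses hypothetical names (community_id, community_data.dta, _merge_comm) that do not appear anywhere in the thread. Since -joinby- forms every pairwise combination within each key value, extra observations with a master file that has one row per household would suggest that some case_id values are repeated in the using file, so the sketch checks for that first.

```stata
* Step 1: check whether case_id uniquely identifies rows in the using file
use "C:\Users\kmfarrin\Desktop\Malawi Data STATA\smooth_ln_dd.dta", clear
duplicates report case_id        // lists how many surplus copies exist
capture noisily isid case_id     // prints an error message if case_id is not unique

* Step 2: one-to-one household merge, keeping matches and master-only rows
* (-merge 1:1- stops with an error if case_id is repeated in either file)
use "C:\Users\kmfarrin\Desktop\Malawi Data STATA\data_11_21_13.dta", clear
merge 1:1 case_id using "C:\Users\kmfarrin\Desktop\Malawi Data STATA\smooth_ln_dd.dta", keep(master match)
tabulate _merge                  // 1 = master only, 3 = matched

* Step 3 (hypothetical names): many households to one community
* community_id and community_data.dta are placeholders, not from the thread
merge m:1 community_id using "community_data.dta", keep(master match) generate(_merge_comm)
tabulate _merge_comm
```

If step 2 stops with an error about case_id not uniquely identifying observations, the duplicates reported in step 1 are the likely source of the extra rows, and they would need to be resolved in the using file before merging.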

I was (unfortunately) not involved in the data collection - I know
someone who was (it's LSMS data from the World Bank and is nationally
representative), but I don't know where he was during survey
collection.

On Mon, Dec 2, 2013 at 10:16 AM, Joe Canner <jcanner1@jhmi.edu> wrote:
> Katie,
>
> That helps; I assumed from your description that you were using the -merge- command.  Is there a reason you chose -joinby- instead of -merge-?  It's hard to say without knowing more about the two data sets, but from your description so far I think -merge- would be more appropriate. -joinby- is more useful if you are trying to do a many-to-many merge, whereas it sounds like you are doing a many-to-one merge (many households matched to one community), which can be done using -merge-.  One advantage of -merge- is that it gives you a complete rundown of how the two data sets match up: how many records matched, how many in just the master, how many in just the using.  It also gives you a new variable (_merge by default) that indicates where the record came from, which will allow you to drop communities that are not represented in your household survey (or vice versa).
>
> Regards,
> Joe
>
> P.S. Were you involved in the data collection for this survey?  Where was it done?  I lived for 4+ years in Blantyre in the mid-90s.  I have fond memories!
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Katie Farrin
> Sent: Monday, December 02, 2013 10:01 AM
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: RE: additional observations when merging data sets
>
> Thanks, Joe.  What I mean is that I'm keeping unmatched data from memory (the set I'm working with), but dropping observations from the new file that don't match to observations in my working data set.  So I am keeping the matches from the two data sets but also the ones from the primary data set that don't have matches in the secondary (but not vice versa).
>
> So the code is something like:
>
> use "C:\Users\kmfarrin\Desktop\Malawi Data STATA\data_11_21_13.dta", clear
>
> joinby case_id using "C:\Users\kmfarrin\Desktop\Malawi Data STATA\smooth_ln_dd.dta", unmatched(master)
>
> I have 2056 observations in my working data set.  When I run the command I have 2106. This code is matching household identifiers, but I get the same problem when I'm matching using community identifiers, which are not unique to households.  I can see when I look at the data that I'm getting multiple observations with the same household ID, when I have already cleaned the data so that I only have one observation per household.
>
>> On Mon, Dec 2, 2013 at 8:38 AM, Joe Canner <jcanner1@jhmi.edu> wrote:
>>>
>>> Katie,
>>>
>>> Please provide an example of the command(s) you are using, as your
>>> description is confusing and seemingly contradictory ("I select to
>>> keep unmatched data from memory" or "I'm choosing to drop unmatched data"?).
>>>
>>> Thanks,
>>> Joe Canner
>>> Johns Hopkins University School of Medicine
>>>
>>> -----Original Message-----
>>> From: owner-statalist@hsphsun2.harvard.edu
>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Katie
>>> Farrin
>>> Sent: Monday, December 02, 2013 7:01 AM
>>> To: statalist@hsphsun2.harvard.edu
>>> Subject: st: additional observations when merging data sets
>>>
>>> Good morning,
>>>
>>> I have a general question I was hoping someone would be able to help with.
>>> I have household data that I would like to merge with corresponding
>>> community level data, and am using a pairwise merge by community.  I
>>> select to keep unmatched data from memory, as my sample of households
>>> does not include all communities surveyed.  However, when I merge the
>>> data I get 50 additional observations and I am not sure why this is
>>> happening (my original data set has 2056 observations and the merged
>>> set has 2106).  If I'm choosing to drop unmatched data from the new
>>> file, what is going on to create additional observations?
>>> Each household should have only one associated community ID.
>>>
>>> Thank you!
>>>
>>> Katie
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/

