
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: RE: additional observations when merging data sets

From   Katie Farrin <[email protected]>
To   [email protected]
Subject   Re: st: RE: additional observations when merging data sets
Date   Mon, 2 Dec 2013 10:01:19 -0500

Thanks, Joe.  What I mean is that I'm keeping unmatched data from
memory (the set I'm working with), but dropping observations from the
new file that don't match to observations in my working data set.  So
I am keeping the matches from the two data sets but also the ones from
the primary data set that don't have matches in the secondary (but not
vice versa).

So the code is something like:

use "C:\Users\kmfarrin\Desktop\Malawi Data STATA\data_11_21_13.dta", clear

joinby case_id using "C:\Users\kmfarrin\Desktop\Malawi Data STATA\smooth_ln_dd.dta", unmatched(master)

I have 2056 observations in my working data set.  When I run the
command I have 2106. This code is matching household identifiers, but
I get the same problem when I'm matching using community identifiers,
which are not unique to households.  I can see when I look at the data
that I'm getting multiple observations with the same household ID,
when I have already cleaned the data so that I only have one
observation per household.
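
One possible explanation, offered as a sketch rather than a diagnosis of the actual files: joinby forms all pairwise combinations within each value of the key, so any key value that appears more than once in the using file turns one observation in memory into several. The toy data below are hypothetical (the variables x and y and all values are made up for illustration); only case_id, joinby, and unmatched(master) come from the commands above.

```stata
* Toy master data: one observation per household (hypothetical values)
clear
input case_id x
1 10
2 20
3 30
end
tempfile masterdata
save `masterdata'

* Toy using data: case_id 1 appears twice, case_id 3 is absent
clear
input case_id y
1 100
1 101
2 200
end
tempfile usingdata
save `usingdata'

* Check the using data for repeated keys before joining
duplicates report case_id

* joinby pairs every master row with every using row sharing the key
use `masterdata', clear
joinby case_id using `usingdata', unmatched(master)

* Household 1 now appears twice, so 3 observations have become 4
count
list, sepby(case_id)
```

Running duplicates report case_id on the real using file would show whether the same thing is happening there. If each household is meant to match at most one record, merge 1:1 case_id using ..., keep(master match) would stop with an error instead of silently adding observations; for community-level data, merge m:1 on the community identifier plays the same role.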

On Mon, Dec 2, 2013 at 9:49 AM, Katie Farrin <[email protected]> wrote:
> Thanks, Joe.  What I mean is that I'm keeping unmatched data from memory
> (the set I'm working with), but dropping observations from the new file that
> don't match to observations in my working data set.  So I am keeping the
> matches from the two data sets but also the ones from the primary data set
> that don't have matches in the secondary (but not vice versa).
> So the code is something like:
> use "C:\Users\kmfarrin\Desktop\Malawi Data STATA\data_11_21_13.dta", clear
> joinby case_id using "C:\Users\kmfarrin\Desktop\Malawi Data STATA\smooth_ln_dd.dta", unmatched(master)
> I have 2056 observations in my working data set.  When I run the command I
> have 2106. This code is matching household identifiers, but I get the same
> problem when I'm matching using community identifiers, which are not unique
> to households.
> On Mon, Dec 2, 2013 at 8:38 AM, Joe Canner <[email protected]> wrote:
>> Katie,
>> Please provide an example of the command(s) you are using, as your
>> description is confusing and seemingly contradictory ("I select to keep
>> unmatched data from memory" or " I'm choosing to drop unmatched data"?).
>> Thanks,
>> Joe Canner
>> Johns Hopkins University School of Medicine
>> -----Original Message-----
>> From: [email protected]
>> [mailto:[email protected]] On Behalf Of Katie Farrin
>> Sent: Monday, December 02, 2013 7:01 AM
>> To: [email protected]
>> Subject: st: additional observations when merging data sets
>> Good morning,
>> I have a general question I was hoping someone would be able to help with.
>> I have household data that I would like to merge with corresponding
>> community level data, and am using a pairwise merge by community.  I select
>> to keep unmatched data from memory, as my sample of households does not
>> include all communities surveyed.  However, when I merge the data I get 50
>> additional observations and I am not sure why this is happening (my original
>> data set has 2056 observations and the merged set has 2106).  If I'm
>> choosing to drop unmatched data from the new file, what is going on to
>> create additional observations?
>> Each household should have only one associated community ID.
>> Thank you!
>> Katie
*   For searches and help try:
