Stata The Stata listserver

st: RE: Merging problem

From   [email protected]
To   [email protected]
Subject   st: RE: Merging problem
Date   Tue, 23 Jul 2002 12:13:27 -0400

Masanja, suppose the master data set is called individual.dta and the using
set, family.dta.  The merge will work if you make sure that there is only
one observation per family_id in your using data (family.dta).  You may
delete duplicate observations on family_id and index as follows:

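   use family, clear
   sort family_id index
   * keep one observation per (family_id, index) pair
   by family_id index : keep if _n==1
   * fails if any family_id still has more than one observation
   by family_id: assert _n==1
   save family, replace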

If Stata returns

   assertion is false

after the -by ...: assert ...- line, this means you have more than one value
of index per family_id and must decide which value to retain before
proceeding any further.  If assert doesn't return any output, you may issue

   u individual
   sort family_id
   merge family_id using family, nokeep

This will fetch your index variable and match it based solely on family_id.
(Option -nokeep- states that Stata is not to retain observations in the
using data for which no matching family_id is found in the master data.)
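
For readers on Stata 11 or later, where the -merge- syntax changed, a rough
modern equivalent of the commands above might look like the sketch below.
The toy data sets and their values are hypothetical stand-ins for
individual.dta and family.dta; -keep(master match)- is used here to play
the role of -nokeep-.

```stata
* Hypothetical toy data standing in for family.dta (one row per family_id)
clear
input str9 family_id float index
"BUA000001" 6.163015
"BUA000004" 8.866956
end
tempfile family
save `family'

* Hypothetical toy data standing in for individual.dta
clear
input str9 family_id str12 perm_id
"BUA000001" "BUA000001002"
"BUA000001" "BUA000001003"
"BUA000004" "BUA000004002"
end

* m:1 = many individuals per family_id in the master data,
*       exactly one row per family_id in the using data
* keep(master match) drops observations found only in the using data
merge m:1 family_id using `family', keep(master match)
tabulate _merge
```

With this syntax neither data set needs to be sorted beforehand, and
-merge- stops with an error if family_id is not unique in the using data.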

Or you can use -mmerge- and issue:

   u individual
   mmerge family_id using family, type(n:1)

where the option type(n:1) states that the merge variable, family_id, does
not form a key in the master data set but does form a key in the using set.
To form a key, the data must contain a single observation per group of merge
variables.
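
A quick way to check whether family_id forms a key is sketched below on
hypothetical toy data standing in for family.dta. It assumes a recent
version of Stata in which -isid- and -duplicates- are built in.

```stata
* Hypothetical toy data standing in for family.dta
clear
input str9 family_id float index
"BUA000001" 6.163015
"BUA000004" 8.866956
end

* isid exits with an error if family_id does not uniquely
* identify the observations, and is silent otherwise
isid family_id

* duplicates report tabulates how many copies of each family_id exist
duplicates report family_id
```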

Patrick Joly
[email protected]
[email protected]

-----Original Message-----
From: [email protected] [mailto:[email protected]]
Sent: July 23, 2002 9:44 AM
To: [email protected]
Subject: st: Merging problem

Dear statalister

I have 2 files, one containing information at household level and another
one at individual level. I would like each individual in a household to
have the value of an index which is at household level. When I merged the
two files on perm_id, here is what I get. Some individuals are missing the
value of the index. When I do it on family_id, it gives me unanticipated
results. What is the trick here?

        family_id       perm_id      index
     1. BUA000001  BUA000001002          .
     2. BUA000001  BUA000001003          .
     3. BUA000001  BUA000001004          .
     4. BUA000001  BUA000001005          .
     5. BUA000001  BUA000001006          .
     6. BUA000001  BUA000001007   6.163015
    10. BUA000004  BUA000004002          .
    11. BUA000004  BUA000004003          .
    12. BUA000004  BUA000004004   8.866956

