Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Conor Hughes <cbhughes@uchicago.edu>
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: Combining multiple observations into one observation with multiple variables
Date: Wed, 30 Jun 2010 14:06:32 +0700
Sorry, my tables got smushed:

Dataset 1
---------------------------
household id | individual id
---------------------------
1            | 1
1            | 2
1            | 3
2            | 1
2            | 2
3            | 1
3            | 2

Dataset 2
-------------------------------------------
household id | household characteristic id
-------------------------------------------
1            | 1
1            | 3
1            | 7
1            | 11
2            | 1
2            | 8
3            | 2
3            | 7
3            | 13

On Wed, Jun 30, 2010 at 1:40 PM, Conor Hughes <cbhughes@uchicago.edu> wrote:
> Hi All,
> I have a couple of survey datasets that I need to merge, but they're
> organized in an inconvenient way. The first is organized by
> household and by individual within the household. The second is
> organized only by household. I'd like to do a many-to-one merge on
> household, so as to preserve the individual IDs. However, in the
> second dataset, rather than adding household characteristics as
> variables, it adds them as observations, e.g.:
>
> Dataset 1
> ---------------------------
> household id | individual id
> ---------------------------
> 1            | 1
> 1            | 2
> 1            | 3
> 2            | 1
> 2            | 2
> 3            | 1
> 3            | 2
>
> Dataset 2
> -------------------------------------------
> household id | household characteristic id
> -------------------------------------------
> 1            | 1
> 1            | 3
> 1            | 7
> 1            | 11
> 2            | 1
> 2            | 8
> 3            | 2
> 3            | 7
> 3            | 13
>
> I'd prefer, in the second dataset, to have one observation for each
> household, with household characteristics as dummy variables.
> As it is, the only way to get them together is a many-to-many merge,
> which is foolish and doesn't work well, giving output like:
>
> ------------------------------------------------------------
> household id | individual id | household characteristic id
> ------------------------------------------------------------
> 1            | 1             | 1
> 1            | 2             | 3
> 1            | 3             | 7
> 1            | 3             | 11
> 2            | 1             | 1
> 2            | 2             | 8
> 3            | 1             | 2
> 3            | 2             | 7
> 3            | 2             | 13
>
> This messes up the first dataset, since it creates repeat
> observations of individuals. Is there a graceful way of changing
> the multiple observations per household in the second dataset to one
> observation per household, with characteristics represented as dummy
> variables? Any help would be greatly appreciated. And please let me
> know if I've described the situation poorly and you'd like
> clarification.
>
> Cheers,
> Conor

--
Conor Hughes
Mathematics and Economics
University of Chicago 2011

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
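A minimal sketch of the transformation the question asks for, written in Python with only the standard library. This is an assumption for illustration, not Stata code and not the method given in any reply. In Stata the analogous steps are commonly `tabulate ..., generate()` to create the dummies, `collapse (max) ..., by()` to reduce to one observation per household, and `merge m:1` to attach them to individuals. The sketch collapses Dataset 2 to one record per household with a 0/1 dummy for each characteristic id, then attaches that record to every individual in Dataset 1. The helper names `to_household_dummies` and `merge_many_to_one` and the `char_<id>` column naming are hypothetical.

```python
from collections import defaultdict

# Dataset 1: one row per individual, as (household_id, individual_id)
individuals = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 2)]

# Dataset 2: one row per household characteristic, as (household_id, characteristic_id)
characteristics = [(1, 1), (1, 3), (1, 7), (1, 11),
                   (2, 1), (2, 8),
                   (3, 2), (3, 7), (3, 13)]


def to_household_dummies(rows):
    """Collapse long (household, characteristic) rows into one record per
    household, with a 0/1 dummy for every characteristic id seen anywhere."""
    all_ids = sorted({c for _, c in rows})
    present = defaultdict(set)
    for hh, c in rows:
        present[hh].add(c)
    return {
        hh: {f"char_{c}": int(c in chars) for c in all_ids}
        for hh, chars in present.items()
    }


def merge_many_to_one(people, households):
    """Attach each household's dummy record to every individual in it.
    Each individual appears exactly once; a household absent from
    Dataset 2 simply gets no char_* keys (like missing values after a merge)."""
    merged = []
    for hh, ind in people:
        row = {"household_id": hh, "individual_id": ind}
        row.update(households.get(hh, {}))
        merged.append(row)
    return merged


households = to_household_dummies(characteristics)
merged = merge_many_to_one(individuals, households)
for row in merged:
    print(row)
```

Because the household file now has exactly one record per household, the merge is many-to-one: the seven individuals come out as seven rows, rather than the nine rows (with repeated individuals) produced by the many-to-many merge.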