



st: Combining multiple observations into one observation with multiple variables


From   Conor Hughes <cbhughes@uchicago.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Combining multiple observations into one observation with multiple variables
Date   Wed, 30 Jun 2010 13:40:14 +0700

Hi All,
I have a couple of survey datasets that I need to merge, but they're
organized in an inconvenient way.  The first is organized by
household, and by individuals within the household.  The second is
organized only by household.  I'd like to do a many-to-one merge on
household, so as to preserve the individual IDs.  However, the second
dataset records household characteristics as separate observations
rather than as variables, e.g.:

Dataset 1
-------------------------------------
household id | individual id
-------------------------------------
         1        |        1
         1        |        2
         1        |        3
         2        |        1
         2        |        2
         3        |        1
         3        |        2

Dataset 2
-----------------------------------------------------------
household id | household characteristic id
-----------------------------------------------------------
         1        |            1
         1        |            3
         1        |            7
         1        |            11
         2        |            1
         2        |            8
         3        |            2
         3        |            7
         3        |            13

I'd prefer, in the second dataset, to have one observation for each
household, with the household characteristics as dummy variables.  As
it is, the only way to get them together is via a many-to-many merge,
which is foolish and doesn't work well, giving output like:
-------------------------------------------------------------------------------
household id | individual id | household characteristic id
-------------------------------------------------------------------------------
         1        |        1         |            1
         1        |        2         |            3
         1        |        3         |            7
         1        |        3         |            11
         2        |        1         |             1
         2        |        2         |             8
         3        |        1         |             2
         3        |        2         |             7
         3        |        2         |            13
This messes up the first dataset, since it creates repeat
observations of individuals.  Is there a graceful way of changing
the multiple observations per household in the second dataset to one
observation per household, with characteristics represented as dummy
variables?  Any help would be greatly appreciated.  And please let me
know if I've described the situation poorly and you'd like
clarification.

Cheers,
Conor
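
To make the request concrete, here is a minimal sketch of one possible approach, not a definitive answer: reshape the second dataset from long to wide so that each characteristic becomes a dummy, then do the many-to-one merge on household.  The variable names hhid, indid and charid and the stub has_char are hypothetical (they are not in the datasets described above), the data are typed in from the example tables, and the merge m:1 syntax assumes Stata 11 or later.  Run it as a single do-file so the temporary file macro stays defined.

```stata
* Hypothetical names throughout: hhid, indid, charid, has_char.

* Dataset 2: one observation per household characteristic.
clear
input hhid charid
1 1
1 3
1 7
1 11
2 1
2 8
3 2
3 7
3 13
end

* Flag every recorded characteristic, then reshape long -> wide,
* which yields has_char1, has_char3, ... with one row per household.
generate byte has_char = 1
reshape wide has_char, i(hhid) j(charid)

* reshape leaves missing where a household lacks a characteristic;
* recode those to 0 so the variables are proper 0/1 dummies.
foreach v of varlist has_char* {
    replace `v' = 0 if missing(`v')
}

tempfile hhwide
save `hhwide'

* Dataset 1: one observation per individual within household.
clear
input hhid indid
1 1
1 2
1 3
2 1
2 2
3 1
3 2
end

* Many-to-one merge: each individual picks up the household dummies.
merge m:1 hhid using `hhwide'
```

reshape wide requires each household-characteristic pair to appear only once; if the real data contain repeats, something like duplicates drop hhid charid, force beforehand would be needed.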

