
st: Combining multiple observations into one observation with multiple variables


From   Conor Hughes <[email protected]>
To   [email protected]
Subject   st: Combining multiple observations into one observation with multiple variables
Date   Wed, 30 Jun 2010 13:40:14 +0700

Hi All,
I have a couple of survey datasets that I need to merge, but they're
organized in an inconvenient way.  The first has one observation per
individual within each household.  The second is organized only by
household.  I'd like to do a many-to-one merge on household id, so as
to preserve the individual ids.  However, rather than storing
household characteristics as variables, the second dataset stores
them as observations, e.g.:

Dataset 1
----------------------------
household id | individual id
----------------------------
         1   |        1
         1   |        2
         1   |        3
         2   |        1
         2   |        2
         3   |        1
         3   |        2

Dataset 2
------------------------------------------
household id | household characteristic id
------------------------------------------
         1   |            1
         1   |            3
         1   |            7
         1   |            11
         2   |            1
         2   |            8
         3   |            2
         3   |            7
         3   |            13

I'd prefer, in the second dataset, to have one observation for each
household, with household characteristics as dummy variables.  As
it is, the only way to get them together is via a many-to-many merge,
which is foolish and doesn't work well, giving output like:
-------------------------------------------------------------------------------
household id | individual id | household characteristic id
-------------------------------------------------------------------------------
         1        |        1         |            1
         1        |        2         |            3
         1        |        3         |            7
         1        |        3         |            11
         2        |        1         |             1
         2        |        2         |             8
         3        |        1         |             2
         3        |        2         |             7
         3        |        2         |            13
This messes up the first dataset, since it creates repeated
observations of individuals (e.g. individual 3 in household 1 appears
twice).  Is there a graceful way of changing the multiple observations
per household in the second dataset to one observation per household,
with characteristics represented as dummy variables?  Any help would
be greatly appreciated.  And please let me know if I've described the
situation poorly and you'd like clarification.
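
To make the target layout concrete, here is a minimal sketch of the
transformation described above, written in Python rather than Stata
purely for illustration.  The names hhid, indid, char_<k>, to_wide and
merge_many_to_one are hypothetical and not taken from the actual
datasets; the data are the example rows shown above.  In Stata the
analogous route would presumably be reshape wide on the second dataset
followed by merge m:1 on household id, but that is an assumption, not
a tested solution.

```python
# Example rows from the post: (household id, individual id)
dataset1 = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 2)]
# Example rows from the post: (household id, household characteristic id)
dataset2 = [(1, 1), (1, 3), (1, 7), (1, 11), (2, 1), (2, 8),
            (3, 2), (3, 7), (3, 13)]


def to_wide(pairs):
    """Collapse (household, characteristic) rows into one row of
    0/1 dummy variables per household."""
    char_ids = sorted({c for _, c in pairs})
    wide = {}
    for hh, c in pairs:
        # First time a household is seen, start with every dummy at 0.
        row = wide.setdefault(hh, {f"char_{k}": 0 for k in char_ids})
        row[f"char_{c}"] = 1
    return wide


def merge_many_to_one(individuals, wide):
    """Attach the household-level dummies to each individual row.
    Households absent from `wide` get no dummy columns at all."""
    return [{"hhid": hh, "indid": ind, **wide.get(hh, {})}
            for hh, ind in individuals]


wide = to_wide(dataset2)
merged = merge_many_to_one(dataset1, wide)
for row in merged:
    print(row)
```

The merged result keeps exactly one row per individual (seven rows for
the example data), each carrying its household's full set of dummies.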

Cheers,
Conor


