



st: Re: Combining multiple observations into one observation with multiple variables


From   Conor Hughes <cbhughes@uchicago.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: Combining multiple observations into one observation with multiple variables
Date   Wed, 30 Jun 2010 14:06:32 +0700

Sorry, my tables got smushed:
Dataset 1
------------------------------
household id | individual id
------------------------------
      1      |       1
      1      |       2
      1      |       3
      2      |       1
      2      |       2
      3      |       1
      3      |       2

Dataset 2
-------------------------------------------
household id | household characteristic id
-------------------------------------------
      1      |              1
      1      |              3
      1      |              7
      1      |             11
      2      |              1
      2      |              8
      3      |              2
      3      |              7
      3      |             13


On Wed, Jun 30, 2010 at 1:40 PM, Conor Hughes <cbhughes@uchicago.edu> wrote:
> Hi All,
> I have a couple of survey datasets that I need to merge, but they're
> organized in an inconvenient way.  The first is organized by
> household, and individuals within the household.  The second is only
> organized by household.  I'd like to do a many-to-one merge on
> household, so as to preserve the individual id's.  However, in the
> second dataset, rather than adding household characteristics as
> variables, it adds them as observations, e.g.:
>
> Dataset 1
> ------------------------------
> household id | individual id
> ------------------------------
>       1      |       1
>       1      |       2
>       1      |       3
>       2      |       1
>       2      |       2
>       3      |       1
>       3      |       2
>
> Dataset 2
> -------------------------------------------
> household id | household characteristic id
> -------------------------------------------
>       1      |              1
>       1      |              3
>       1      |              7
>       1      |             11
>       2      |              1
>       2      |              8
>       3      |              2
>       3      |              7
>       3      |             13
>
> I'd prefer, in the second dataset, to have one observation for each
> household, including household characteristics as dummy variables.  As
> it is, the only way to get them together is via many-to-many merge,
> which is foolish and doesn't work well, giving an output like
> -------------------------------------------------------------------------------
> household id | individual id | household characteristic id
> -------------------------------------------------------------------------------
>          1        |        1         |            1
>          1        |        2         |            3
>          1        |        3         |            7
>          1        |        3         |            11
>          2        |        1         |             1
>          2        |        2         |             8
>          3        |        1         |             2
>          3        |        2         |             7
>          3        |        2         |            13
> This messes up the first dataset, since it creates repeat
> observations of individuals.  Is there a graceful way of changing
> the multiple observations per household in the second dataset to one
> observation per household with characteristics represented as dummy
> variables?  Any help would be greatly appreciated.  And please let me
> know if I've described the situation poorly and you'd like
> clarification.
>
> Cheers,
> Conor
>
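
The reshaping asked about in the quoted message can be sketched outside Stata to make the logic concrete. This is only a minimal illustration in plain Python, not a Stata solution: the names `to_dummies`, `merge_many_to_one`, and the `char_` prefix are made up for the example, and the data are the two small tables from this post. The idea is to collapse Dataset 2 to one row per household with a 0/1 indicator per characteristic, then attach that row to every individual in Dataset 1.

```python
from collections import defaultdict

# Dataset 1: one row per individual, as (household id, individual id).
individuals = [(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (3, 1), (3, 2)]

# Dataset 2: one row per (household id, household characteristic id).
characteristics = [(1, 1), (1, 3), (1, 7), (1, 11),
                   (2, 1), (2, 8),
                   (3, 2), (3, 7), (3, 13)]


def to_dummies(rows):
    """Collapse long (household, characteristic) rows into one dict of
    0/1 dummies per household. Hypothetical helper for illustration."""
    codes = sorted({code for _, code in rows})
    held = defaultdict(set)
    for household, code in rows:
        held[household].add(code)
    return {
        household: {f"char_{code}": int(code in owned) for code in codes}
        for household, owned in held.items()
    }


def merge_many_to_one(people, wide):
    """Attach the single household row to each individual in that household.
    Households missing from `wide` simply get no dummy columns."""
    merged = []
    for household, individual in people:
        row = {"household_id": household, "individual_id": individual}
        row.update(wide.get(household, {}))
        merged.append(row)
    return merged


household_wide = to_dummies(characteristics)
merged = merge_many_to_one(individuals, household_wide)

for row in merged:
    print(row)
```

Because the household table has exactly one row per household after the reshape, the merge keeps one row per individual, so Dataset 1 is not duplicated the way it is in the many-to-many output above.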



-- 
Conor Hughes
Mathematics and Economics
University of Chicago 2011


