Statalist



Re: st: collapse?


From   "Neil Shephard" <[email protected]>
To   [email protected]
Subject   Re: st: collapse?
Date   Sun, 29 Jul 2007 09:12:55 +0100

On 7/29/07, dnurhan <[email protected]> wrote:
>    Dear All:
>
>  I am working with a data set that looks like:
>
>   HouseHold ID   Member No   Income    Age   EdLevel
>   1001           1           $20,000   40    1
>   1001           2            40,000   32    3
>   1001           3                 .    5    .
>   1001           4                 .    3    .
>   1001           5            12,000   68    4
>   1002           1            45,000   45    5
>   1002           2            50,000   55    5
>   1003           1            34,000   39    2
>   ...
>
>  I would like to create one record per household, where each household's
>  Age and EdLevel are those of its highest income earner. Namely:
>
>   HouseHold ID   Age   EdLevel
>   1001           32    3
>   1002           55    5
>   etc.
>
>  I do not know how to trick the "collapse by MemberNo" command into doing
>  the job. Any help would be greatly appreciated. Thanks in advance.
>  nurhan

The online help for -collapse- will show you the way forward. If you're
not familiar with the syntax, then I'd recommend reading the relevant
section of the User's Guide (which I don't have to hand; it might be
chapter 14, but don't quote me on that).

Anyway, how to get the dataset you want...

<------Start------->
/* Sort the data by household and earnings. (Note that you have a space */
/* in "HouseHold ID", which is illegal in Stata variable names, so I'm  */
/* assuming that this is all one word.)                                 */
tempfile _t
sort HouseHoldID Income
save `_t'

/* Collapse the data by HouseHoldID to get the maximum earnings         */
collapse (max) Income, by(HouseHoldID)

/* Sort the data for merging so that you can get the other variables    */
/* you want for the maximum Income                                      */
sort HouseHoldID Income

/* Merge with the complete data set and retain only those that match    */
/* the maximum Income                                                   */
merge HouseHoldID Income using `_t'
tab _merge
keep if _merge == 3

/* List the variables of interest (Stata names are case-sensitive, so   */
/* this must match EdLevel exactly)                                     */
list HouseHoldID Age EdLevel
<------End------->
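For readers on Stata 11 or later, the same collapse-then-merge idea can be written with the newer `merge 1:m` syntax, which needs no manual sorting and can keep only the matches directly. The sketch below is self-contained: it rebuilds the example data from the question with -input-. It assumes Income is stored as a number (no "$" or thousands separators) and that missing incomes are system missing ("."). If Income was read in as text such as "$20,000", something like `destring Income, replace ignore("$,")` would be needed first. The tempfile name `members` is only illustrative.

```stata
* Sketch only: assumes Stata 11+ merge syntax and numeric Income.
clear
input HouseHoldID MemberNo Income Age EdLevel
1001 1 20000 40 1
1001 2 40000 32 3
1001 3 .      5  .
1001 4 .      3  .
1001 5 12000 68 4
1002 1 45000 45 5
1002 2 50000 55 5
1003 1 34000 39 2
end

* Keep a copy of the full member-level data (hypothetical tempfile name)
tempfile members
save `members'

* One row per household holding its maximum non-missing income
collapse (max) Income, by(HouseHoldID)

* Bring back every member whose income equals the household maximum.
* 1:m because a tie leaves several members with the same maximum.
merge 1:m HouseHoldID Income using `members', keep(match) nogenerate

list HouseHoldID Age EdLevel, noobs
```

Like the version above, this keeps every member tied at the household maximum.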

IMPORTANT: the above doesn't handle ties in Income. If two people within
a HouseHoldID both earn the same maximum amount, then both will be
retained.
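If exactly one record per household is required, a different technique (sort within household, then keep the first record) avoids both the tempfile and the tie problem. This is a minimal sketch, not the approach used above. Breaking ties by the lowest MemberNo is an arbitrary, assumed rule, and the helper variable `negIncome` is hypothetical. Because Stata treats missing values as larger than any number, sorting on Income itself would push members with no income to the end; sorting on the negated income puts the highest earner first while still leaving missing values last. One behavioural difference: a household in which nobody reports an income still yields one row (its lowest MemberNo).

```stata
* Sketch only: one-pass alternative, ties broken by lowest MemberNo.
clear
input HouseHoldID MemberNo Income Age EdLevel
1001 1 20000 40 1
1001 2 40000 32 3
1001 3 .      5  .
1001 4 .      3  .
1001 5 12000 68 4
1002 1 45000 45 5
1002 2 50000 55 5
1003 1 34000 39 2
end

* Negate income so an ascending sort puts the highest earner first;
* -Income is still missing when Income is missing, so those sort last.
generate double negIncome = -Income

* Within each household, order by descending income, then MemberNo,
* and keep only the first record.
bysort HouseHoldID (negIncome MemberNo): keep if _n == 1

drop negIncome
list HouseHoldID MemberNo Age EdLevel, noobs
```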

Neil
-- 
"In mathematics you don't understand things. You just get used to
them."  - Johann von Neumann

Email - [email protected] / [email protected]
Website - http://slack.ser.man.ac.uk/
Photos - http://www.flickr.com/photos/slackline/


