
Re: st: compute variables referring to other observations

From	"Stephen P. Jenkins" <[email protected]>
To	[email protected]
Subject	Re: st: compute variables referring to other observations
Date	Mon, 6 Oct 2003 09:56:08 +0100 (GMT Daylight Time)

On Mon, 6 Oct 2003 08:32:43 +0200 "Bornhorst, Fabian" 
<[email protected]> wrote:

> Dear list,
>  
> Quite often I want to compute a variable for each observation that refers to values from other observations. I found a way of doing it, but there must be a faster way of programming this! For example, in a dataset with the variables household (HH), member, father and education:
>  
> HH member  father  education
> 1    1      .       5
> 1    2      .       6
> 1    3      2       1
>  
> 2    1      3       2
> 2    2      .       4
> 2    3      .       5
> 2    4      3       1
>  
> The variable father indicates that in HH 1 the observation with member==2 is the father of member==3. Similarly, in HH 2 members 1 and 4 have member 3 as their father. Suppose I want to create a variable containing the education of the father, i.e.:
>  
> HH member  father  education edu_father
> 1    1      .        5          .
> 1    2      .        6          .
> 1    3      2        1          6
>  
> 2    1      3        2          5
> 2    2      .        4          .
> 2    3      .        5          .
> 2    4      3        1          5
>  
> What is the easiest way of doing so? I did this with a loop, which looks like this:
>  
> gen edu_father=.
> gen mysample = father!=.
> gsort -mysample
> quietly count if mysample
> local end=r(N)
> forv i=1/`end'{
>     su father in `i'/`i', mean
>     local father=r(mean)
>     su HH in `i'/`i', mean
>     local HH =r(mean)
>     su education if HH == `HH' & member==`father', mean
>     replace edu_father= r(mean) in `i'/`i'
> }
>  
> This works, but it is very time-consuming in big datasets (on a P4 I estimate 4 hours for the problem I have), and certainly not very elegant.
> 
> Does anyone know a shortcut for this?  Any suggestions are greatly appreciated, many thanks,
>  
> Fabian

Avoid looping across observations wherever possible. Use Stata's
great facilities for handling observations in groups (which often means
-by:- or -bysort- combined with -egen-): see the Manual (and no doubt the FAQs).

Example follows

. clist, noobs

      HH    member    father  educat~n  edu_fa~r
       1         1         .         5         .
       1         2         .         6         .
       1         3         2         1         6
       2         1         3         2         5
       2         2         .         4         .
       2         3         .         5         .
       2         4         3         1         5

. bys HH: egen dad = mean(father)

. bys HH: egen daded = max( (dad==member)*education)

. replace daded = . if father == .
(4 real changes made, 4 to missing)

. clist, noobs

      HH    member    father  educat~n  edu_fa~r        dad      daded
       1         1         .         5         .          2          .
       1         2         .         6         .          2          .
       1         3         2         1         6          2          6
       2         1         3         2         5          3          5
       2         2         .         4         .          3          .
       2         3         .         5         .          3          .
       2         4         3         1         5          3          5

"daded" is the same as the variable you were trying to create.

Note the use of the logical condition in the second -egen-: (dad==member)
is 1 only in the father's own observation and 0 elsewhere, so the father's
education is picked out and then "spread" by max() to the other HH members.
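A minimal do-file sketch that reproduces the example above from scratch: it rebuilds the data with -input- (including the edu_father column from the question, so the result can be checked) and runs the same two -egen- lines. The -assert- checks and the cond() variant are additions, not part of the reply. The variant matters in two edge cases: with (dad==member)*education, a father whose education is missing yields 0 rather than missing (max() ignores missing values), and a negative education code would lose to the zeros; max(cond(dad==member, education, .)) avoids both.

```stata
* Rebuild the example data from the question, including the target column
clear
input HH member father education edu_father
1 1 . 5 .
1 2 . 6 .
1 3 2 1 6
2 1 3 2 5
2 2 . 4 .
2 3 . 5 .
2 4 3 1 5
end

* The two -egen- lines from the reply (bys is short for bysort)
bysort HH: egen dad = mean(father)
assert father == dad if !missing(father)   // at most one distinct father per HH
bysort HH: egen daded = max((dad == member) * education)
replace daded = . if missing(father)

* Variant (not in the reply): cond() gives missing rather than 0 when the
* father's education is missing, and does not rely on education >= 0
bysort HH: egen daded2 = max(cond(dad == member, education, .))
replace daded2 = . if missing(father)

* Both versions should match the edu_father column from the question
assert daded == edu_father
assert daded2 == edu_father
```

The first -assert- stops the do-file if any household records two different fathers, which is exactly the situation in which the mean() step would silently give a wrong "dad".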

You need to check things like whether a household can contain more than
one "father" (the mean() step assumes it cannot), and so on.
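If different members of the same household can have different fathers, the mean() step no longer identifies the right person. One more general approach, sketched here and not taken from the reply, is to merge the data with a copy of itself in which each member plays the role of a potential father. The sketch assumes Stata 11 or later (for the -merge m:1- syntax) and that (HH, member) uniquely identifies each person, which -isid- checks. The names fathers (a temporary file) and edu_father are illustrative choices.

```stata
* Rebuild the example data from the question
clear
input HH member father education
1 1 . 5
1 2 . 6
1 3 2 1
2 1 3 2
2 2 . 4
2 3 . 5
2 4 3 1
end

isid HH member                      // stop if (HH, member) is not unique

* Lookup file: one row per potential father, keyed on (HH, father)
preserve
keep HH member education
rename member father
rename education edu_father
tempfile fathers
save `fathers'
restore

* Each member picks up the education of their own father; members with a
* missing father find no match and keep edu_father == .
merge m:1 HH father using `fathers', keep(master match) nogenerate
sort HH member
list, noobs sepby(HH)
```

Because each observation is matched on its own father, nothing assumes a single father per household, and a father with missing education gives a missing edu_father rather than 0.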


Stephen
----------------------
Professor Stephen P. Jenkins <[email protected]>
Institute for Social and Economic Research (ISER)
University of Essex, Colchester, CO4 3SQ, UK
Tel: +44 (0)1206 873374. Fax: +44 (0)1206 873151.
http://www.iser.essex.ac.uk


Follow-Ups:
- RE: st: compute variables referring to other observations
  - From: "Nick Cox" <[email protected]>

References:
- st: compute variables referring to other observations
  - From: "Bornhorst, Fabian" <[email protected]>
