
RE: st: compute variables referring to other observations


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: compute variables referring to other observations
Date   Mon, 6 Oct 2003 11:10:11 +0100

Fabian Bornhorst

> > Quite often I want to compute a variable for each
> > observation making reference to values from other
> > observations. I found a way of doing it but there must be a
> > faster way of programming this! For example, in a dataset
> > with the variables household (HH), member, father and education
> >
> > HH member  father  education
> > 1    1      .       5
> > 1    2      .       6
> > 1    3      2       1
> >
> > 2    1      3       2
> > 2    2      .       4
> > 2    3      .       5
> > 2    4      3       1
> >
> > The variable father indicates that in HH 1 the
> > observation with member==2 is the father of member==3.
> > Similarly, in HH 2 members 1 and 4 have member 3 as father.
> > Suppose I want to create a variable containing the
> > education of the father, i.e.
> >
> > HH member  father  education edu_father
> > 1    1      .        5          .
> > 1    2      .        6          .
> > 1    3      2        1          6
> >
> > 2    1      3        2          5
> > 2    2      .        4          .
> > 2    3      .        5          .
> > 2    4      3        1          5
> >
> > What is the easiest way of doing so? I did this with a
> > loop, which looks like this:
> >
> > gen edu_father=.
> > gen mysample = father!=.
> > gsort -mysample
> > count if mysample
> > local end=r(N)
> > forv i=1/`end'{
> >     su father in `i'/`i', mean
> >     local father=r(mean)
> >     su HH in `i'/`i', mean
> >     local HH =r(mean)
> >     su education if HH == `HH' & member==`father', mean
> >     replace edu_father= r(mean) in `i'/`i'
> > }
> >
> > This works, but it is very time consuming in big datasets
> > (on a P4 I estimate 4 hours for the problem I have), and
> > certainly not very elegant.

Stephen Jenkins

> Avoid looping across observations wherever possible. Use Stata's
> great facilities for handling observations in groups (which
> often means -by(sort)- combined with -egen-): see the Manual
> (and no doubt FAQs).
>
> Example follows
>
> . clist, noobs
>
>       HH    member    father  educat~n  edu_fa~r
>        1         1         .         5         .
>        1         2         .         6         .
>        1         3         2         1         6
>        2         1         3         2         5
>        2         2         .         4         .
>        2         3         .         5         .
>        2         4         3         1         5
>
> . bys HH: egen dad = mean(father)
>
> . bys HH: egen daded = max( (dad==member)*education)
>
> . replace daded = . if father == .
> (4 real changes made, 4 to missing)
>
> . clist, noobs
>
>       HH    member    father  educat~n  edu_fa~r       dad     daded
>        1         1         .         5         .         2         .
>        1         2         .         6         .         2         .
>        1         3         2         1         6         2         6
>        2         1         3         2         5         3         5
>        2         2         .         4         .         3         .
>        2         3         .         5         .         3         .
>        2         4         3         1         5         3         5
>
> "daded" is the same as the variable you were trying to create.
>
> Note the use of the logical condition in the second -egen- so that
> values for the father are picked out, and then "spread" to
> the other HH members.
>
> You need to check things like whether it is possible to
> have more than one "father" per household, and so on.
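
If a household can contain more than one father, one possible alternative is to look up each father's education with a merge instead of a household-level mean. The following is only a sketch under stated assumptions: it uses the -merge m:1- syntax of Stata 11 and later (not available when this thread was written), it assumes that HH and member together uniquely identify observations, and the names lookup and edu_father are illustrative.

```stata
* Sketch only: assumes Stata 11+ and that HH and member
* jointly identify each observation.
clear
input HH member father education
1 1 . 5
1 2 . 6
1 3 2 1
2 1 3 2
2 2 . 4
2 3 . 5
2 4 3 1
end

* Stop with an error if HH and member do not uniquely identify observations.
isid HH member

* Build a lookup table of every member's education,
* renamed so that it can be matched on the father identifier.
preserve
keep HH member education
rename member father
rename education edu_father
tempfile lookup
save `lookup'
restore

* Attach the father's education. Observations with father missing
* find no match in the lookup and keep edu_father missing.
merge m:1 HH father using `lookup', keep(master match) nogenerate

sort HH member
list, noobs sepby(HH)
```

Because each observation is matched on its own father value, this does not depend on there being a single father per household. On the example data it should give the same values as edu_father in the original question.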

I support Stephen's advice strongly. In addition,
the worked examples in FAQs on the Stata Corp website
should be of some help here.

How do I create variables summarizing for each individual
properties of the other members of a group?
http://www.stata.com/support/faqs/data/members.html

How do I create a variable recording whether any members
of a group (or all members of a group) possess some
characteristic?
http://www.stata.com/support/faqs/data/anyall.html
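
As a rough illustration of the general idea (not code taken from those FAQs), one can loop over the possible values of member rather than over observations, so that the number of passes depends on household size and not on the number of observations. This sketch assumes that member takes small positive integer values; tmp and edu_father are illustrative names.

```stata
* Sketch only: assumes member is a small positive integer within each HH.
clear
input HH member father education
1 1 . 5
1 2 . 6
1 3 2 1
2 1 3 2
2 2 . 4
2 3 . 5
2 4 3 1
end

gen edu_father = .

* Find the largest member number in the data.
summarize member, meanonly
local maxmem = r(max)

forvalues j = 1/`maxmem' {
    * Education of member j, spread to everyone in the same household.
    bysort HH: egen tmp = max(cond(member == `j', education, .))
    * Keep it only for those whose father is member j.
    replace edu_father = tmp if father == `j'
    drop tmp
}

list, noobs sepby(HH)
```

With four as the largest member number, this makes four passes over the data regardless of how many households there are.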

Nick
[email protected]



