Fabian Bornhorst
> > quite often I want to compute a variable for each
> observation making reference to values from other
> observations. I found a way of doing it but there must be a
> faster way of programming this! For example, in a dataset
> with the variables household (HH), member, father and education
> >
> > HH member father education
> > 1 1 . 5
> > 1 2 . 6
> > 1 3 2 1
> >
> > 2 1 3 2
> > 2 2 . 4
> > 2 3 . 5
> > 2 4 3 1
> >
> > The variable father indicates that in HH 1 the
> observation with member==2 is the father of member==3.
> Similarly, in HH 2, members 1 and 4 have member 3 as father.
> Suppose I want to create a variable containing the
> education of the father, ie
> >
> > HH member father education edu_father
> > 1 1 . 5 .
> > 1 2 . 6 .
> > 1 3 2 1 6
> >
> > 2 1 3 2 5
> > 2 2 . 4 .
> > 2 3 . 5 .
> > 2 4 3 1 5
> >
> > What is the easiest way of doing so? I did this with a
> loop, which looks like this:
> >
> > gen edu_father=.
> > gen mysample = father!=.
> > gsort -mysample
> > local end=r(N)
> > forv i=1/`end'{
> > su father in `i'/`i', mean
> > local father=r(mean)
> > su HH in `i'/`i', mean
> > local HH =r(mean)
> > su education if HH == `HH' & member==`father', mean
> > replace edu_father= r(mean) in `i'/`i'
> > }
> >
> > This works, but it is very time consuming in big datasets
> (on a P4 I estimate 4 hours for the problem I have), and
> certainly not very elegant.
Stephen Jenkins
> Avoid looping across observations wherever possible. Use Stata's
> great facilities for handling observations in groups (which
> often means
> -by(sort)- combined with -egen-): see the Manual (and no doubt FAQs)
>
> Example follows
>
> . clist, noobs
>
> HH member father educat~n edu_fa~r
> 1 1 . 5 .
> 1 2 . 6 .
> 1 3 2 1 6
> 2 1 3 2 5
> 2 2 . 4 .
> 2 3 . 5 .
> 2 4 3 1 5
>
> . bys HH: egen dad = mean(father)
>
> . bys HH: egen daded = max( (dad==member)*education)
>
> . replace daded = . if father == .
> (4 real changes made, 4 to missing)
>
> . clist, noobs
>
> HH member father educat~n edu_fa~r dad daded
> 1 1 . 5 . 2 .
> 1 2 . 6 . 2 .
> 1 3 2 1 6 2 6
> 2 1 3 2 5 3 5
> 2 2 . 4 . 3 .
> 2 3 . 5 . 3 .
> 2 4 3 1 5 3 5
>
> "daded" is the same as the variable you were trying to create.
>
> Note the use of the logical condition in the second -egen- so that
> values for the father are picked out, and then "spread" to
> the other HH
> members
>
> You need to check things like whether it is possible to
> have more than
> one "father" per household, and so on
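On the last point in the quoted message: -mean(father)- recovers the father's member number only when every nonmissing value of father within a household is the same. A minimal sketch of a check follows, together with one alternative that does not depend on there being a single father per household. The names min_dad, max_dad and edu_dad are made up for illustration, and the alternative assumes that member runs 1, 2, ..., _N within each household with no gaps, which the -assert- line tests first.

```stata
* Check: does any household list more than one distinct father?
* -egen- min() and max() ignore missing values.
bysort HH: egen min_dad = min(father)
bysort HH: egen max_dad = max(father)
count if min_dad != max_dad      // 0 means at most one father per HH

* Alternative sketch using explicit subscripts within by-groups.
* ASSUMPTION: member equals the observation number within HH
* once the data are sorted by HH and member.
bysort HH (member): assert member == _n
bysort HH (member): gen edu_dad = education[father]
```

Under -by:-, a subscript such as education[father] refers to the observation number within the current household, so each child picks up the education of the member whose number is stored in father. Where father is missing, edu_dad is missing too. With the seven observations posted above this reproduces edu_father.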
I support Stephen's advice strongly. In addition,
the worked examples in FAQs on the Stata Corp website
should be of some help here.
How do I create variables summarizing for each individual
properties of the other members of a group?
http://www.stata.com/support/faqs/data/members.html
How do I create a variable recording whether any members
of a group (or all members of a group) possess some
characteristic?
http://www.stata.com/support/faqs/data/anyall.html
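If member numbers have gaps, or several fathers can occur in one household, another route is to treat the problem as a lookup and merge the dataset with a copy of itself. What follows is only a sketch under stated assumptions: it uses the -merge m:1- syntax of Stata 11 or later, it assumes that HH and member together identify observations uniquely, and the names fathers and edu_dad are hypothetical. Run it from a do-file so that the temporary file name stays defined throughout.

```stata
* Build a lookup file: one row per person, keyed so that
* "member" can be matched against the children's "father".
preserve
keep HH member education
rename member father
rename education edu_dad
tempfile fathers
save `fathers'
restore

* Attach the father's education to each child.
* Observations with missing father find no match and get edu_dad == .
merge m:1 HH father using `fathers', keep(master match) nogenerate
```

-merge- re-sorts the data by HH and father, so sort again afterwards if the original order matters.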
Nick
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/