On Mon, 6 Oct 2003 08:32:43 +0200 "Bornhorst, Fabian"
<[email protected]> wrote:
> Dear list,
>
> Quite often I want to compute a variable for each observation that refers to values from other observations. I have found a way of doing it, but there must be a faster way of programming this. For example, take a dataset with the variables household (HH), member, father and education:
>
> HH  member  father  education
>  1       1       .          5
>  1       2       .          6
>  1       3       2          1
>
>  2       1       3          2
>  2       2       .          4
>  2       3       .          5
>  2       4       3          1
>
> The variable father indicates that in HH 1 the observation with member==2 is the father of member==3. Similarly, in HH 2 members 1 and 4 have member 3 as their father. Suppose I want to create a variable containing the education of the father, i.e.
>
> HH  member  father  education  edu_father
>  1       1       .          5           .
>  1       2       .          6           .
>  1       3       2          1           6
>
>  2       1       3          2           5
>  2       2       .          4           .
>  2       3       .          5           .
>  2       4       3          1           5
>
> What is the easiest way of doing so? I did this with a loop, which looks like this:
>
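> gen edu_father = .
> gen mysample = father != .
> * put the observations that have a recorded father first
> gsort -mysample
> * gsort does not leave r(N) behind, so count the cases to process
> count if mysample
> local end = r(N)
> forv i = 1/`end' {
>     * household and father id of the current observation
>     su father in `i'/`i', mean
>     local father = r(mean)
>     su HH in `i'/`i', mean
>     local HH = r(mean)
>     * look up the father's education within the same household
>     su education if HH == `HH' & member == `father', mean
>     replace edu_father = r(mean) in `i'/`i'
> }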
>
> This works, but it is very time-consuming on big datasets (on a P4 I estimate 4 hours for the problem I have), and certainly not very elegant.
>
> Does anyone know a shortcut for this? Any suggestions are greatly appreciated, many thanks,
>
> Fabian
Avoid looping across observations wherever possible. Use Stata's
great facilities for handling observations in groups (which often means
-by(sort)- combined with -egen-): see the Manual (and no doubt the FAQs).
An example follows:
. clist, noobs
  HH  member  father  educat~n  edu_fa~r
   1       1       .         5         .
   1       2       .         6         .
   1       3       2         1         6
   2       1       3         2         5
   2       2       .         4         .
   2       3       .         5         .
   2       4       3         1         5
. bys HH: egen dad = mean(father)
. bys HH: egen daded = max( (dad==member)*education)
. replace daded = . if father == .
(4 real changes made, 4 to missing)
. clist, noobs
  HH  member  father  educat~n  edu_fa~r  dad  daded
   1       1       .         5         .    2      .
   1       2       .         6         .    2      .
   1       3       2         1         6    2      6
   2       1       3         2         5    3      5
   2       2       .         4         .    3      .
   2       3       .         5         .    3      .
   2       4       3         1         5    3      5
"daded" is the same as the variable you were trying to create.
Note the use of the logical condition in the second -egen-: (dad==member)
is 1 on the father's own row and 0 elsewhere, so the father's education
is picked out and then "spread" by max() to the other HH members.
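For anyone who wants to reproduce this from scratch, here is a sketch as a self-contained do-file: it types in Fabian's example data and then runs the same three commands. One caveat worth knowing: because -egen- max() ignores missing values, a father whose education is missing would give his children a value of 0 rather than missing, so the trick assumes the father's education is recorded (and non-negative).

```stata
* Self-contained sketch: enter the example data by hand,
* then apply the by/egen approach.
clear
input HH member father education
1 1 . 5
1 2 . 6
1 3 2 1
2 1 3 2
2 2 . 4
2 3 . 5
2 4 3 1
end

* dad: the father's member number, copied to every member of the household
* (assumes at most one distinct father per household)
bysort HH: egen dad = mean(father)

* (dad == member) is 1 on the father's own row and 0 elsewhere, so the
* product is his education on his row and 0 on the others; max()
* then spreads that value across the whole household
bysort HH: egen daded = max((dad == member) * education)

* only members who have a recorded father should get a value
replace daded = . if father == .

list, noobs sepby(HH)
```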
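You need to check things like whether it is possible to have more than
one "father" per household, and so on: the -egen mean()- step assumes
that everyone with a recorded father in a household points to the same member.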
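If a household can contain more than one father, a different route is needed, because mean(father) would average different member numbers. One option, sketched below under stated assumptions, is to treat the data as a lookup table and -merge- each person's father id against the member ids of the same household. This uses the -merge m:1- syntax of Stata 11 and later (not the syntax available when this thread was written), and assumes HH and member together identify observations uniquely. The file name held in the tempfile `dads' is purely illustrative.

```stata
* Sketch: look up each person's father within the household by merging
* the data with a copy of itself (Stata 11+ merge syntax assumed).
clear
input HH member father education
1 1 . 5
1 2 . 6
1 3 2 1
2 1 3 2
2 2 . 4
2 3 . 5
2 4 3 1
end

* HH and member must identify observations uniquely for m:1 to work
isid HH member

* Lookup file: each member's education, keyed as a potential father
preserve
keep HH member education
rename member father
rename education edu_father
tempfile dads
save `dads'
restore

* Attach the father's education; people with father == . find no match,
* so their edu_father stays missing
merge m:1 HH father using `dads', keep(master match) nogenerate

sort HH member
list, noobs sepby(HH)
```

Each observation carries its own father pointer, so several fathers per household pose no problem here. When members are numbered 1, 2, 3, ... without gaps within every household, a shorter route should be `bysort HH (member): gen edu_father = education[father]`, since under -by- an explicit subscript refers to the position within the group and a missing subscript yields missing; the merge version does not rely on that numbering.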
Stephen
----------------------
Professor Stephen P. Jenkins <[email protected]>
Institute for Social and Economic Research (ISER)
University of Essex, Colchester, CO4 3SQ, UK
Tel: +44 (0)1206 873374. Fax: +44 (0)1206 873151.
http://www.iser.essex.ac.uk