
From: "Nick Cox" <[email protected]>

To: <[email protected]>

Subject: RE: st: compute variables referring to other observations

Date: Mon, 6 Oct 2003 11:10:11 +0100

Fabian Bornhorst

> > quite often I want to compute a variable for each observation
> > making reference to values from other observations. I found a way
> > of doing it but there must be a faster way of programming this!
> > For example, in a dataset with the variables household (HH),
> > member, father and education
> >
> >   HH  member  father  education
> >    1       1       .          5
> >    1       2       .          6
> >    1       3       2          1
> >
> >    2       1       3          2
> >    2       2       .          4
> >    2       3       .          5
> >    2       4       3          1
> >
> > The variable father indicates that in HH 1 the observation with
> > member==2 is the father of member==3. Similarly, in HH 2 members 1
> > and 4 have member 3 as father. Suppose I want to create a variable
> > containing the education of the father, i.e.
> >
> >   HH  member  father  education  edu_father
> >    1       1       .          5           .
> >    1       2       .          6           .
> >    1       3       2          1           6
> >
> >    2       1       3          2           5
> >    2       2       .          4           .
> >    2       3       .          5           .
> >    2       4       3          1           5
> >
> > What is the easiest way of doing so? I did this with a loop,
> > which looks like this:
> >
> > gen edu_father=.
> > gen mysample = father!=.
> > gsort -mysample
> > local end=r(N)
> > forv i=1/`end'{
> >     su father in `i'/`i', mean
> >     local father=r(mean)
> >     su HH in `i'/`i', mean
> >     local HH =r(mean)
> >     su education if HH == `HH' & member==`father', mean
> >     replace edu_father= r(mean) in `i'/`i'
> > }
> >
> > This works, but it is very time consuming in big datasets (on a P4
> > I estimate 4 hours for the problem I have), and certainly not very
> > elegant.

Stephen Jenkins

> Avoid looping across observations, wherever possible. Use Stata's
> great facilities for handling observations in groups (which often
> means -by(sort)- combined with -egen-): see the Manual (and no doubt
> FAQs)
>
> Example follows
>
> . clist, noobs
>
>   HH  member  father  educat~n  edu_fa~r
>    1       1       .         5         .
>    1       2       .         6         .
>    1       3       2         1         6
>    2       1       3         2         5
>    2       2       .         4         .
>    2       3       .         5         .
>    2       4       3         1         5
>
> . bys HH: egen dad = mean(father)
>
> . bys HH: egen daded = max( (dad==member)*education)
>
> . replace daded = . if father == .
> (4 real changes made, 4 to missing)
>
> . clist, noobs
>
>   HH  member  father  educat~n  edu_fa~r  dad  daded
>    1       1       .         5         .    2      .
>    1       2       .         6         .    2      .
>    1       3       2         1         6    2      6
>    2       1       3         2         5    3      5
>    2       2       .         4         .    3      .
>    2       3       .         5         .    3      .
>    2       4       3         1         5    3      5
>
> "daded" is the same as the variable you were trying to create.
>
> Note the use of the logical condition in the second -egen- so that
> values for the father are picked out, and then "spread" to the other
> HH members
>
> You need to check things like whether it is possible to have more
> than one "father" per household, and so on

I support Stephen's advice strongly. In addition, the worked
examples in FAQs on the Stata Corp website should be of some
help here.

How do I create variables summarizing for each individual
properties of the other members of a group?
http://www.stata.com/support/faqs/data/members.html

How do I create a variable recording whether any members
of a group (or all members of a group) possess some
characteristic?
http://www.stata.com/support/faqs/data/anyall.html

Nick
[email protected]

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
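
Stephen's caveat about households with more than one "father" matters because -egen dad = mean(father)- averages the father identifiers within a household, so it only points at a real member when every child in the household has the same father. The sketch below is one possible alternative, not something proposed in the thread: it looks up each father's education by matching on (HH, father). It assumes that member is unique within HH and that a Stata version with the -merge m:1- syntax (Stata 11 or later, which postdates this thread) is available. The tempfile name lookup is hypothetical, and the block should be run as a whole from a do-file so the tempfile macro stays defined.

```stata
* Example data copied from the question in this thread
clear
input HH member father education
1 1 . 5
1 2 . 6
1 3 2 1
2 1 3 2
2 2 . 4
2 3 . 5
2 4 3 1
end

* Assumption: each member appears only once per household
isid HH member

* Build a lookup file with one row per (HH, member), renamed so that
* the member identifier can serve as the father key in the merge
preserve
keep HH member education
rename member father
rename education edu_father
tempfile lookup
save `lookup'
restore

* Attach the father's education to each child. Observations with a
* missing father find no match and keep edu_father missing.
merge m:1 HH father using `lookup', keep(master match) nogenerate

* Check against the result requested in the question
assert edu_father == 6 if HH == 1 & member == 3
assert edu_father == 5 if HH == 2 & inlist(member, 1, 4)
assert missing(edu_father) if missing(father)

sort HH member
list, sepby(HH) noobs
```

Because the match is made per observation, two children in the same household with different fathers each get their own father's education, and no loop over observations is needed.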

**References**: **Re: st: compute variables referring to other observations**, *From:* "Stephen P. Jenkins" <[email protected]>

