
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Extensions to: Creating variables recording properties of the other members of a group
Date: Thu, 29 Aug 2002 14:28:33 +0100

Guillermo Cruces

[ ... ]

> In my example, I have a household survey where I don't have
> direct information about the number of kids of each individual,
> but I have something like this:
>
> hhid and member are just the household id and number of member.
> Variables fatherm and motherm tell you the number of the member
> of the father and the mother, if in the household:

[ ... ]

> I want to create the variable ownkids that gives me the number
> of own kids living in the house:

[ ... ]

I replied to Guillermo's posting with a proffered solution,
but I didn't answer one of his questions.

> My force brute solution, which makes a lot of unnecessary
> comparisons and takes very long (because I generate and drop
> many variables) is of the form: with maxmem being the number of
> members of each household (group i, max is the number of groups),
>
> forvalues i = 1/`max' {
>     qui sum member if group==`i'
>     local maxmem=r(max)
>     forvalues j = 1/`maxmem' {
>         di "-----------Household number `i', number of members: `maxmem'"
>         forvalues k = 1/`maxmem' {
>             di "Household `i', member `j', comparing with `k'"
>             qui gen a=motherm==`j' if member==`k'&group==`i'
>             qui egen b=max(a)
>             qui replace mkids=mkids+b if member==`j'&group==`i'
>             drop a b
>             qui gen a=fatherm==`j' if member==`k'&group==`i'
>             qui egen b=max(a)
>             qui replace fkids=fkids+b if member==`j'&group==`i'
>             drop a b
>         }
>     }
> }
>
> This creates two variables, mkids and fkids, which are the
> number of kids for mothers and fathers. For each member of the
> household, I compare if . The egen, replace, drop, takes very
> long, and even longer if the dataset in memory is large (I had
> to partition the dataset in 25 parts to make this run faster).
> The main problem (the main awkwardness in this program) is that
> I gen, egen, etc. because I could not just create a scalar that
> reflects the value of a variable for one precise observation,
> something of the form (which of course doesn't work):
>
> local a=mother==`j' if member==`k'&group==`i'
>
> (meaning: mother etc. should refer to the observation:
> member==`k'&group==`i')
>
> I couldn't use something like motherm[_...] because I was not
> using by: ... .
>
> What I would like to know if there are more efficient ways of
> doing this (I'm sure there are!).

As indicated separately, this code is a triple loop
which can be reduced to at most one loop. For the details,
see my earlier posting.

But the steps

. egen b = max(a)
...
. drop b

could have been cut in a way that is of much wider
interest and applicability.

Guillermo wants just one number, the maximum.
A good way to get it is, in general,

. summarize a, meanonly

followed by

. scalar b = r(max)

or

. local b = r(max)

or just by using r(max) or `r(max)' directly
after the -summarize-

. qui replace fkids=fkids + r(max) if member==`j'&group==`i'

If you try this out for yourself, say with the auto data

. su mpg, meanonly

you will see nothing! The point, however, is what
-summarize- leaves in its own wake. Type

. ret li

and you will see results which can be picked up
for subsequent use.

Note in particular that

. su mpg, meanonly

is faster than

. su mpg

because the second also calculates the sd and the
variance. If you don't need either, you should
use the speedier command.

A separate point is that -egen- is an ado which
calls another ado, and so there is an overhead
for Stata, which is obliged to interpret a few dozen
command lines. Done once, that is less than a blink,
but done repeatedly, it doesn't help any process
which is already too slow.

Some of these points were mentioned in the recently
posted -stylerules- package on SSC:

    Use -summarize, meanonly- for speed when its returned
    results are sufficient.

    Avoid -egen- within programs: it is usually slower than
    a direct attack.

    Never use a variable to hold a constant: a macro or a
    scalar is all that is needed.

Nick
n.j.cox@durham.ac.uk

P.S. On the last rule, I just found an exception.
For a graphical purpose, I need a variable which
is a constant. The variable defines a horizontal
line, on which I show the information from another
variable, something like this

. gen bar = 0
. gra foo bar bazz, sy(o[anothervariable])

That's the trouble with style "rules": style is
a subject on which there are exceptions to every rule
you can think of, even this one.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
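The contrast drawn in this message between -egen- and -summarize, meanonly- can be tried end to end. What follows is only a minimal sketch, assuming the auto dataset shipped with Stata can be loaded with -sysuse-; the names maxvar and maxmpg are hypothetical and chosen purely for illustration.

```stata
* Load the example dataset shipped with Stata (assumed to be installed)
sysuse auto, clear

* Route 1: -egen- builds a whole variable just to hold one number
egen maxvar = max(mpg)
display maxvar[1]
drop maxvar

* Route 2: -summarize, meanonly- prints nothing but leaves results behind
summarize mpg, meanonly
return list

* Pick up the maximum for later use; no variable is created or dropped
local maxmpg = r(max)
display "maximum of mpg: `maxmpg'"
```

Both routes display the same maximum, but the second never creates or drops a variable, which is what matters when the step sits inside a loop.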

**References**: **st: Extensions to: Creating variables recording properties of the other members of a group** *From:* gcruces@worldbank.org


© Copyright 1996–2014 StataCorp LP