
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Re: Counts of different values in one variable by another variable
Date: Thu, 25 Mar 2004 16:47:34 -0000

A few extra comments:

-egen- can be used in conjunction with -by:- whenever it makes sense to do that. (If not, StataCorp would no doubt like to know of specific exceptions.) What's more, although it's not now documented, the same functionality is typically available through -by()- options, as Michael's code exemplifies.

The flagging technique used by Michael here is also available (although under the label "tagging") through -egen, tag()-.

The extra command -groups- from SSC could also be useful here.

Showing these alternatives, Michael's code can be translated as below. (This is not better code, just different. The logic is exactly equivalent. This is the standard "first principles" or "canned functions" issue.)

    * flag each person just once
    egen perperson = tag(hhid persid)
    * calculate number of persons per household
    egen totpeople = sum(perperson), by(hhid)
    * flag each household once, to avoid duplicates in list commands
    egen taghh = tag(hhid)
    groups totpeople hhid if taghh

Nick
n.j.cox@durham.ac.uk

Michael Blasnik replied to Donnell Butler:

> * flag each person just once
> bysort hhid persid: gen byte perperson=(_n==1)
> * calculate number of persons per household
> egen totpeople=sum(perperson), by(hhid)
> * flag each household once, to avoid duplicates in list commands
> bysort hhid: gen byte taghh=(_n==1)
> l hhid if taghh==1 & totpeople==1
> l hhid if taghh==1 & totpeople==2
> .. etc..

> > Here is a simplified version of my dilemma:
> >
> > I have a data set with multiple id numbers. There is always one
> > primary id (hhid), but sometimes there is more than one subsidiary
> > id (persid). The persid is simply two digits more than the hhid.
> > For example hhid=12345 and persid=1234501 (or in the cases where
> > there is more than one, persid=1234501, 1234502, 1234503, etc.).
> > The records are structured such that for every action on a given
> > date, there is a record. For example:
> >
> > HHID   PERSID   ACTION    DATE
> > 12345  1234501  EAT       1/1/2003
> > 12345  1234501  DRINK     1/2/2003
> > 12345  1234501  DRINK     1/3/2003
> > 12345  1234501  BE MERRY  1/4/2003
> > 12345  1234502  DRINK     1/1/2003  <-Note new person id, but same hhid
> > 12345  1234502  EAT       1/3/2003
> > 12345  1234503  BE MERRY  1/2/2003  <-Note new person id, but same hhid
> > 12346  1234601  BE MERRY  1/1/2003  <-Note new hhid
> >
> > ... and so on.
> >
> > So, here is my dilemma. I am trying to find a command or commands
> > that will do two things:
> > (1) For the entire data set, across all households, how many times
> > are there 1,2,3,...N unique PERSIDs within a household? That is,
> > how many households have 1,2,3,...N persons?
> > (2) Display the HHID for households that have X persons. That is,
> > for households with X unique PERSIDs within a household, list the
> > HHIDs.
> >
> > It seems so simple, but the count command can't count within
> > variables. The egen command can't work with by commands.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
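For anyone wanting to try the tagging approach from this thread on the sample rows in the original question, the following is a minimal sketch, not a definitive solution. It makes some assumptions that are not in the thread: the storage types in -input- (ACTION and DATE are kept as strings for simplicity), the use of -egen, total()- (the name used by later Stata releases for what was -egen, sum()- in 2004), and the use of the built-in -tabulate- in place of -groups- so that nothing needs to be installed from SSC.

```stata
* Sample rows from the question; storage types are an assumption
clear
input long hhid long persid str8 action str10 date
12345 1234501 "EAT"      "1/1/2003"
12345 1234501 "DRINK"    "1/2/2003"
12345 1234501 "DRINK"    "1/3/2003"
12345 1234501 "BE MERRY" "1/4/2003"
12345 1234502 "DRINK"    "1/1/2003"
12345 1234502 "EAT"      "1/3/2003"
12345 1234503 "BE MERRY" "1/2/2003"
12346 1234601 "BE MERRY" "1/1/2003"
end

* tag one record per person (1 for the first record of each hhid-persid pair)
egen perperson = tag(hhid persid)

* number of distinct persons per household
* (in Stata releases of 2004, use sum() instead of total())
egen totpeople = total(perperson), by(hhid)

* tag one record per household, so each household is counted once
egen taghh = tag(hhid)

* (1) how many households have 1, 2, 3, ... persons
tabulate totpeople if taghh

* (2) list the households with a given number of persons, here 3
list hhid if taghh & totpeople == 3
```

In these sample rows household 12345 has three distinct persons and household 12346 has one, so the last command should pick out 12345 only.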
