
st: RE: Re: Counts of different values in one variable by another variable


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Re: Counts of different values in one variable by another variable
Date   Thu, 25 Mar 2004 16:47:34 -0000

A few extra comments: 

-egen- can be used in conjunction with -by:- whenever 
it makes sense to do that. (If not, StataCorp would 
no doubt like to know of specific exceptions.) What's 
more, although it's not now documented, the same functionality
is typically available through -by()- options, as Michael's
code exemplifies. 
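
As a minimal sketch of the two forms (the four-line data set and the
names tot1 and tot2 are made up purely for illustration; -sum()- is the
-egen- function used elsewhere in this thread, which more recent
versions of Stata document as -total()-):

```stata
clear
input long hhid byte perperson
12345 1
12345 0
12345 1
12346 1
end

* prefix form: -bysort- sorts by hhid first, as -by:- requires
bysort hhid: egen tot1 = sum(perperson)

* option form: same calculation, grouping requested through by()
egen tot2 = sum(perperson), by(hhid)

* the two variables should agree in every observation
assert tot1 == tot2
```

If the -assert- passes silently, the prefix and the option have
produced the same household totals.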

The flagging technique used by Michael here is 
also available (although under the label "tagging") 
through -egen, tag()-. 

The extra command -groups- from SSC could also be
useful here. 

Showing these alternatives, Michael's code 
can be translated as below. (This is not better 
code, just different. The logic is exactly 
equivalent. This is the standard "first principles"
or "canned functions" issue.) 

* flag each person just once
egen perperson = tag(hhid persid)

* calculate number of persons per household
egen totpeople = sum(perperson), by(hhid)

* flag each household once, to avoid duplicates in list commands
egen taghh = tag(hhid) 

groups totpeople hhid if taghh 
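
For anyone without -groups- installed, here is a sketch of the same
idea using only official commands, run on the example records from the
original question. Assumptions: the ids are typed in as long integers,
the date is left as a plain string, and -sum()- is used as elsewhere in
this thread (more recent versions of Stata document it as -total()-).

```stata
clear
input long hhid long persid str8 action str8 date
12345 1234501 "EAT"      "1/1/2003"
12345 1234501 "DRINK"    "1/2/2003"
12345 1234501 "DRINK"    "1/3/2003"
12345 1234501 "BE MERRY" "1/4/2003"
12345 1234502 "DRINK"    "1/1/2003"
12345 1234502 "EAT"      "1/3/2003"
12345 1234503 "BE MERRY" "1/2/2003"
12346 1234601 "BE MERRY" "1/1/2003"
end

* flag each person just once
egen perperson = tag(hhid persid)

* calculate number of distinct persons per household
egen totpeople = sum(perperson), by(hhid)

* flag each household once, so each household is counted once
egen taghh = tag(hhid)

* (1) how many households have 1, 2, 3, ... persons
tabulate totpeople if taghh

* (2) list the households with a given number of persons, here 3
list hhid if taghh & totpeople == 3
```

With these eight records the table should report one household with
one person and one household with three persons, and the listing
should show only hhid 12345.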

Nick 
n.j.cox@durham.ac.uk 

Michael Blasnik replied to Donnell Butler: 
 
> *flag each person just once
> bysort hhid persid: gen byte perperson=(_n==1)
> * calculate number of persons per household
> egen totpeople=sum(perperson), by(hhid)
> * flag each household once, to avoid duplicates in list commands
> bysort hhid: gen byte taghh=(_n==1)
> l hhid if taghh==1 & totpeople==1
> l hhid if taghh==1 & totpeople==2
> .. etc..

> > Here is a simplified version of my dilemma:
> >
> > I have a data set with multiple id numbers. There is always one
> > primary id (hhid), but sometimes there is more than one subsidiary
> > id (persid). The persid is simply two digits more than the hhid. For
> > example, hhid=12345 and persid=1234501 (or, in the cases where there
> > is more than one, persid=1234501, 1234502, 1234503, etc.). The
> > records are structured such that for every action on a given date,
> > there is a record. For example:
> >
> > HHID    PERSID    ACTION    DATE
> > 12345   1234501   EAT       1/1/2003
> > 12345   1234501   DRINK     1/2/2003
> > 12345   1234501   DRINK     1/3/2003
> > 12345   1234501   BE MERRY  1/4/2003
> > 12345   1234502   DRINK     1/1/2003  <-Note new person id, but same hhid
> > 12345   1234502   EAT       1/3/2003
> > 12345   1234503   BE MERRY  1/2/2003  <-Note new person id, but same hhid
> > 12346   1234601   BE MERRY  1/1/2003  <-Note new hhid
> >
> > ... and so on.
> >
> > So, here is my dilemma. I am trying to find a command or commands
> > that will do two things:
> > (1) For the entire data set, across all households, how many times
> > are there 1,2,3,...N numbers of unique PERSIDs within a household?
> > That is, how many households have 1,2,3,... N persons?
> > (2) Display the HHID for households that have X number of persons.
> > That is, for households with X number of unique PERSIDs within a
> > household, list the HHIDs.
> >
> > It seems so simple, but the count command can't count within
> > variables. The egen command can't work with by commands.
