The Stata listserver

st: Re: Counts of different values in one variable by another variable


From   "Michael Blasnik" <michael.blasnik@verizon.net>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Re: Counts of different values in one variable by another variable
Date   Thu, 25 Mar 2004 08:57:51 -0500

* flag each person just once
bysort hhid persid: gen byte perperson = (_n == 1)
* calculate the number of persons per household
* (egen's sum() function is named total() in Stata 9 and later)
egen totpeople = sum(perperson), by(hhid)
* flag each household once, to avoid duplicates in list commands
bysort hhid: gen byte taghh = (_n == 1)
list hhid if taghh == 1 & totpeople == 1
list hhid if taghh == 1 & totpeople == 2
* ... and so on for larger household sizes
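
The commands above cover part (2) of the question. For part (1), the number of households at each household size, one possible approach is to tabulate the per-household count over the one flagged record per household. The following is only a sketch, not part of the original reply: it re-enters the sample records from the question so that it runs on its own, assumes Stata 9 or later (for egen's total() function), and uses the illustrative variable names tagperson, totpeople, and taghh.

```stata
* Sketch only: sample records copied from the question below
clear
input long hhid long persid str8 action str8 date
12345 1234501 "EAT"      "1/1/2003"
12345 1234501 "DRINK"    "1/2/2003"
12345 1234501 "DRINK"    "1/3/2003"
12345 1234501 "BE MERRY" "1/4/2003"
12345 1234502 "DRINK"    "1/1/2003"
12345 1234502 "EAT"      "1/3/2003"
12345 1234503 "BE MERRY" "1/2/2003"
12346 1234601 "BE MERRY" "1/1/2003"
end

* flag one record per distinct hhid-persid pair
egen byte tagperson = tag(hhid persid)
* number of distinct persons in each household
bysort hhid: egen totpeople = total(tagperson)
* flag one record per household
egen byte taghh = tag(hhid)

* part (1): how many households have 1, 2, 3, ... persons
tabulate totpeople if taghh

* part (2): list the households with a given number of persons, here 3
list hhid if taghh & totpeople == 3
```

With these sample records, household 12345 has three distinct persons and household 12346 has one, so the tabulation should report one household of size 1 and one of size 3.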

Michael Blasnik
michael.blasnik@verizon.net

----- Original Message ----- 
From: "Donnell Butler" <djbutler@princeton.edu>
To: <statalist@hsphsun2.harvard.edu>
Sent: Thursday, March 25, 2004 6:10 AM
Subject: st: Counts of different values in one variable by another variable


> Good Day,
>
> I am trying to do something which I imagine must be easy to do in Stata, but
> I can't find the solution in the manuals, help books, or FAQ online.
> Clearly, I am not thinking clearly, because this seems like a simple
> request. Perhaps I just don't know how to phrase the question correctly in
> my search for the answer. Nevertheless, I am hoping that someone can help or
> direct me to an existing response that may answer my question, with the
> Statalist archive number or month/year.
>
>
> Here is a simplified version of my dilemma:
>
> I have a data set with multiple id numbers. There is always one
> primary id (hhid), but sometimes there is more than one subsidiary id
> (persid). The persid is simply the hhid with two digits appended. For example,
> hhid=12345 and persid=1234501 (or, in cases where there is more than
> one, persid=1234501, 1234502, 1234503, etc.). The records are structured
> such that for every action on a given date, there is a record. For
> example:
>
> HHID    PERSID    ACTION    DATE
> 12345   1234501   EAT       1/1/2003
> 12345   1234501   DRINK     1/2/2003
> 12345   1234501   DRINK     1/3/2003
> 12345   1234501   BE MERRY  1/4/2003
> 12345   1234502   DRINK     1/1/2003  <-Note new person id, but same hhid
> 12345   1234502   EAT       1/3/2003
> 12345   1234503   BE MERRY  1/2/2003  <-Note new person id, but same hhid
> 12346   1234601   BE MERRY  1/1/2003  <-Note new hhid
>
> ... and so on.
>
> So, here is my dilemma: I am trying to find a command or commands that
> will do two things:
> (1) For the entire data set, across all households, how many times are
> there 1,2,3,...N numbers of unique PERSIDs within a household? That is,
> how many households have 1,2,3,... N persons.
> (2) Display the HHID for households that have X number of persons? That
> is, for households with X number of unique PERSIDS within a household,
> list the HHIDS.
>
> It seems so simple, but the count command can't count within variables.
> The egen command can't work with by commands. Clearly, there is an obvious
> answer but I can't seem to figure it out. Please help.
>
> Thanks,
> Donnell
>
> Donnell Butler
> Ph.D. Candidate
> Princeton University
> 125 Wallace Hall
> Princeton, NJ 08540
> 609-419-1311


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


