The Stata listserver

st: Re: Counts of different values in one variable by another variable

From   "Michael Blasnik" <[email protected]>
To   <[email protected]>
Subject   st: Re: Counts of different values in one variable by another variable
Date   Thu, 25 Mar 2004 08:57:51 -0500

* flag each person just once
bysort hhid persid: gen byte perperson = (_n==1)
* calculate the number of persons per household
* (total() is the current name of egen's older sum() function)
egen totpeople = total(perperson), by(hhid)
* flag each household once, to avoid duplicates in list commands
bysort hhid: gen byte taghh = (_n==1)
list hhid if taghh==1 & totpeople==1
list hhid if taghh==1 & totpeople==2
* ...and so on for larger households
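Here is a self-contained sketch of the commands above, run on the sample records from the original question. The records are typed in with input. The question does not name the action and date columns, so the names action and datestr are assumptions. The sketch also adds a tabulate line, which is one way to answer question (1), the number of households of each size, from the same variables:

```stata
clear
* sample records from the question; action and datestr are assumed names
input long hhid long persid str8 action str10 datestr
12345 1234501 "EAT"      "1/1/2003"
12345 1234501 "DRINK"    "1/2/2003"
12345 1234501 "DRINK"    "1/3/2003"
12345 1234501 "BE MERRY" "1/4/2003"
12345 1234502 "DRINK"    "1/1/2003"
12345 1234502 "EAT"      "1/3/2003"
12345 1234503 "BE MERRY" "1/2/2003"
12346 1234601 "BE MERRY" "1/1/2003"
end

* flag the first record of each person
bysort hhid persid: gen byte perperson = (_n==1)
* number of distinct persons per household, copied to every record
egen totpeople = total(perperson), by(hhid)
* flag the first record of each household
bysort hhid: gen byte taghh = (_n==1)

* (1) how many households have 1, 2, 3, ... persons
tabulate totpeople if taghh==1
* (2) which households have a given number of persons
list hhid if taghh==1 & totpeople==1
list hhid if taghh==1 & totpeople==3
```

On these eight records, household 12345 should get totpeople = 3 and household 12346 should get totpeople = 1. The tabulation therefore shows one household in each of those two rows.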

Michael Blasnik
[email protected]

----- Original Message ----- 
From: "Donnell Butler" <[email protected]>
To: <[email protected]>
Sent: Thursday, March 25, 2004 6:10 AM
Subject: st: Counts of different values in one variable by another variable

> Good Day,
> I am trying to do something which I imagine must be easy to do in Stata,
> but I can't find the solution in the manuals, help books, or FAQ online.
> Evidently I am not thinking clearly, because this seems like a simple
> request. Perhaps I just don't know how to phrase the question correctly
> for my search. Nevertheless, I am hoping that someone can direct me to
> an existing response that may answer my question, with the Statalist
> archive number or month/year.
> Here is a simplified version of my dilemma:
> I have a data set with multiple id numbers. There is always one
> primary id (hhid), but sometimes there is more than one subsidiary id
> (persid). The persid is simply the hhid with two more digits appended.
> For example, hhid=12345 and persid=1234501 (or, where there is more than
> one, persid=1234501, 1234502, 1234503, etc.). The records are structured
> so that there is one record for every action on a given date. For
> example:
> 12345   1234501   EAT       1/1/2003
> 12345   1234501   DRINK     1/2/2003
> 12345   1234501   DRINK     1/3/2003
> 12345   1234501   BE MERRY  1/4/2003
> 12345   1234502   DRINK     1/1/2003  <-Note new person id, but same hhid
> 12345   1234502   EAT       1/3/2003
> 12345   1234503   BE MERRY  1/2/2003  <-Note new person id, but same hhid
> 12346   1234601   BE MERRY  1/1/2003  <-Note new hhid
> ... and so on.
> So, here is my dilemma: I am trying to find a command or commands that
> will do two things:
> (1) For the entire data set, across all households, how many times are
> there 1, 2, 3, ..., N unique PERSIDs within a household? That is,
> how many households have 1, 2, 3, ..., N persons?
> (2) Display the HHIDs of households that have X persons. That is, for
> households with X unique PERSIDs, list the HHIDs.
> It seems so simple, but the count command can't count within variables.
> The egen command can't work with by commands. Clearly, there is an obvious
> answer but I can't seem to figure it out. Please help.
> Thanks,
> Donnell
> Donnell Butler
> Ph.D. Candidate
> Princeton University
> 125 Wallace Hall
> Princeton, NJ 08540
> 609-419-1311
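Many egen functions can in fact be combined with by. total() is one of them. tag() is documented as not combinable with by; instead, it takes the grouping variables directly.

A minimal sketch along those lines follows. It assumes persid is stored as a long integer equal to hhid*100 plus a two-digit person number, as described in the question. The name personno for that suffix is hypothetical:

```stata
clear
* a few of the sample ids; repeated persids stand for repeated actions
input long hhid long persid
12345 1234501
12345 1234501
12345 1234502
12345 1234503
12346 1234601
end

* assumes persid = hhid*100 + two-digit person number; check it
assert hhid == floor(persid/100)
gen byte personno = mod(persid, 100)

* tag() marks one observation per distinct hhid-persid pair (no by prefix)
egen byte pertag = tag(hhid persid)
* total() does accept by: count distinct persons within each household
bysort hhid: egen npersons = total(pertag)
* one flag per household, so each household is counted once
egen byte hhtag = tag(hhid)

tabulate npersons if hhtag
list hhid npersons if hhtag & npersons == 3
```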


© Copyright 1996–2024 StataCorp LLC