The Stata listserver

st: RE: Counts of different values in one variable by another variable


From   "Ilya Beylin" <ilya.beylin@bateswhite.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Counts of different values in one variable by another variable
Date   Thu, 25 Mar 2004 09:43:02 -0500

Donnell,

Perhaps your question has already been answered.  If not, these lines will do what you're looking for:

// after this command, dup_flag stores the number of other
// observations with the same HHID.  Where there is only
// one unique entry per household ID, dup_flag is set to 0.  Where 
// there are two (e.g. a married couple has been sampled) dup_flag = 1
// and so on.

duplicates tag HHID, gen(dup_flag)

// to see how many are in each "bin":
tab dup_flag

// if you want to list or browse by bin, just type li/br if
// dup_flag == X, where X is the bin you wish to list or browse
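One caveat, judging from the example data in the original message quoted below: there is one record per action, so duplicates tag HHID counts records per household rather than persons (household 12345 would get dup_flag = 6, not 2). A minimal sketch of the same idea that first reduces the data to one record per person follows. The example data is typed in with input purely for illustration, and because duplicates drop discards observations, this should be run on a copy of the data (for example between preserve and restore).

```stata
* Hypothetical example data, copied from the question quoted below;
* replace this with your own data set.
clear
input long HHID long PERSID str10 ACTION
12345 1234501 "EAT"
12345 1234501 "DRINK"
12345 1234501 "DRINK"
12345 1234501 "BE MERRY"
12345 1234502 "DRINK"
12345 1234502 "EAT"
12345 1234503 "BE MERRY"
12346 1234601 "BE MERRY"
end

* Keep one observation per person (this DROPS data; work on a copy)
duplicates drop HHID PERSID, force

* dup_flag = number of other persons in the same household,
* so household size is dup_flag + 1
duplicates tag HHID, gen(dup_flag)
generate npers = dup_flag + 1

* Keep one observation per household, then count households by size
duplicates drop HHID, force
tabulate npers

* List the households of a given size (3 persons here, an arbitrary choice)
list HHID if npers == 3
```

After the last duplicates drop there is one row per household, so each household is counted once in the tabulation.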


I hope this helps,
Ilya

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of Donnell Butler
Sent: Thursday, March 25, 2004 6:10 AM
To: statalist@hsphsun2.harvard.edu
Subject: Counts of different values in one variable by another variable


Good Day,

I am trying to do something which I imagine must be easy to do in Stata, but
I can't find the solution in the manuals, help books, or online FAQ.
Clearly, I am not thinking clearly, because this seems like a simple
request. Perhaps I just don't know how to phrase the question correctly in
my search for the answer. Nevertheless, I am hoping that someone can help or
direct me to an existing response that may answer my question, with the
Statalist archive number or month/year.


Here is a simplified version of my dilemma:

I have a data set with multiple id numbers. There is always one
primary id (hhid), but sometimes there is more than one subsidiary id
(persid). The persid is simply the hhid with two more digits appended. For
example, hhid=12345 and persid=1234501 (or, in cases where there is more than
one, persid=1234501, 1234502, 1234503, etc.). The records are structured
such that there is one record for every action on a given date. For
example:

HHID    PERSID    ACTION    DATE
12345   1234501   EAT       1/1/2003
12345   1234501   DRINK     1/2/2003
12345   1234501   DRINK     1/3/2003
12345   1234501   BE MERRY  1/4/2003
12345   1234502   DRINK     1/1/2003  <-Note new person id, but same hhid
12345   1234502   EAT       1/3/2003
12345   1234503   BE MERRY  1/2/2003  <-Note new person id, but same hhid
12346   1234601   BE MERRY  1/1/2003  <-Note new hhid

... and so on.
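For anyone who wants to experiment, the example records above can be typed into Stata with input. This is a sketch under assumptions: the storage types (long for the ids, str10 for ACTION and DATE) are my choice, and converting DATE with date(..., "MDY") assumes a reasonably recent Stata release.

```stata
* Enter the example records (DATE kept as the string shown above)
clear
input long HHID long PERSID str10 ACTION str10 DATE
12345 1234501 "EAT"      "1/1/2003"
12345 1234501 "DRINK"    "1/2/2003"
12345 1234501 "DRINK"    "1/3/2003"
12345 1234501 "BE MERRY" "1/4/2003"
12345 1234502 "DRINK"    "1/1/2003"
12345 1234502 "EAT"      "1/3/2003"
12345 1234503 "BE MERRY" "1/2/2003"
12346 1234601 "BE MERRY" "1/1/2003"
end

* Convert the month/day/year string into a Stata daily date
generate date = date(DATE, "MDY")
format date %td

* Show the records, with a separator line between households
list, sepby(HHID)
```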

So, here is my dilemma: I am trying to find a command or commands that
will do two things:
(1) For the entire data set, across all households, how many times are
there 1, 2, 3, ..., N unique PERSIDs within a household? That is,
how many households have 1, 2, 3, ..., N persons?
(2) Display the HHIDs of households that have X persons. That is, for
households with X unique PERSIDs, list the HHIDs.

It seems so simple, but the count command can't count within variables.
The egen command can't work with by commands. Clearly, there is an obvious
answer but I can't seem to figure it out. Please help.
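For what it's worth, egen does accept a by prefix for many of its functions, including total(). A minimal sketch answering both (1) and (2) without dropping any records follows, assuming a reasonably recent Stata. egen's tag() marks one record per distinct (HHID, PERSID) pair and one per distinct HHID; the variable names ptag, hhtag, npers and the local macro X are hypothetical, and the example data is typed in only for illustration.

```stata
* Hypothetical example data, copied from the table above
clear
input long HHID long PERSID str10 ACTION
12345 1234501 "EAT"
12345 1234501 "DRINK"
12345 1234501 "DRINK"
12345 1234501 "BE MERRY"
12345 1234502 "DRINK"
12345 1234502 "EAT"
12345 1234503 "BE MERRY"
12346 1234601 "BE MERRY"
end

* tag() sets 1 on exactly one record per distinct group, 0 elsewhere;
* it is given the full varlist rather than a by prefix
egen ptag  = tag(HHID PERSID)
egen hhtag = tag(HHID)

* (1) Distinct persons per household, copied to every record of that
*     household: egen total() combined with a by prefix
bysort HHID: egen npers = total(ptag)

* Household-size distribution, counting each household once
tabulate npers if hhtag

* (2) HHIDs of households with X persons (X = 3 is an arbitrary choice)
local X 3
list HHID if hhtag & npers == `X', noobs
```

Restricting to hhtag is what turns record counts into household counts; without it, household 12345 would be counted once per record.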

Thanks,
Donnell

Donnell Butler
Ph.D. Candidate
Princeton University
125 Wallace Hall
Princeton, NJ 08540
609-419-1311

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2014 StataCorp LP