The Stata listserver

st: RE: Re: Counts of different values in one variable by another variable

From   "Donnell Butler" <[email protected]>
To   <[email protected]>
Subject   st: RE: Re: Counts of different values in one variable by another variable
Date   Fri, 26 Mar 2004 08:29:06 -0500

Hello All,

I wanted to thank you for your suggestions. I received a number of
suggestions from folks, and I wanted to respond to everyone with the

After trying a number of suggestions, I found that a combination of Nick
Cox's and Michael Blasnik's suggestions was the most elegant (based on coding
efficiency). In fact, one of Nick's commands helped me find a situation in
the data that I was unaware of as a possibility. After finding slightly
different results, I isolated the cause and compared the two ways of flagging
each person just once:

Nick on flagging each person just once: egen perpersonN = tag(hhid persid)
Michael on flagging each person just once: bys hhid persid: gen byte

After counting the differences with "count if perpersonN~=perpersonM", I
found, thanks to Nick's method, that some households had no person ids.
That is, no interview occurred. Now, I would have found this anyway, since
these households also had missing data for the person responses.
Nevertheless, using "egen perpersonN = tag(hhid persid)" captured this
information while accomplishing my other goals as well. So, if one is
unsure whether every Y will have a corresponding X, then "egen perpersonN =
tag(hhid persid)" seems to provide an extra information benefit.
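
For anyone who wants to see the difference on a small scale, here is a
minimal sketch on made-up data. The toy values are invented for
illustration, and the definition of perpersonM is my assumption of the usual
"first observation in each group" idiom rather than a quotation of Michael's
exact line. As far as I understand it, -egen, tag()- leaves observations with
a missing argument untagged unless its -missing- option is given, whereas the
-by:- idiom flags the first observation of every group, including a group
whose person id is missing.

```stata
* toy data (hypothetical): household 3 has no interviewed person,
* so its persid is missing
clear
input hhid persid
1 1
1 1
1 2
2 1
3 .
end

* egen's tag(): observations with a missing persid stay at 0
egen perpersonN = tag(hhid persid)

* by: idiom (assumed form): first observation of each hhid-persid
* group is flagged, even when persid is missing
bysort hhid persid: gen byte perpersonM = _n == 1

* count and list the observations where the two flags disagree
count if perpersonN ~= perpersonM
list hhid persid perpersonN perpersonM if perpersonN ~= perpersonM
```

On this toy data the two flags should disagree only for the household with
the missing person id, which is how the uninterviewed households show up.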

Thanks to everyone.

And, particular thanks to Nick and Michael. I don't know what it took for
each of you to become so proficient with Stata, but if you teach an online
course or a workshop in an effort to share that expertise, please let me
know because I will be the first to sign up.

Be Well,

Donnell Butler
Ph.D. Candidate
Princeton University
125 Wallace Hall
Princeton, NJ 08540

FYI: Nick and Michael's emails:

From:  "Nick Cox" <n.j.cox@d...> (48797)
Date:  Thu Mar 25, 2004  11:47 am
Subject:  st: RE: Re: Counts of different values in one variable by another

A few extra comments:

-egen- can be used in conjunction with -by:- whenever
it makes sense to do that. (If not, StataCorp would
no doubt like to know of specific exceptions.) What's
more, although it's not now documented, the same functionality
is typically available through -by()- options, as Michael's
code exemplifies.

The flagging technique used by Michael here is
also available (although under the label "tagging")
through -egen, tag()-.

The extra command -groups- from SSC could also be
useful here.

Showing these alternatives, Michael's code
can be translated as below. (This is not better
code, just different. The logic is exactly
equivalent. This is the standard "first principles"
or "canned functions" issue.)

*flag each person just once
egen perperson = tag(hhid persid)

* calculate number of persons per household
egen totpeople = sum(perperson), by(hhid)

* flag each household once, to avoid duplicates in list commands
egen taghh = tag(hhid)

groups totpeople hhid if taghh
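
For comparison, the "first principles" route mentioned above might look
roughly like the sketch below. This is an illustration built on -by:- and
-generate-, not a quotation of Michael's code; the variable names simply
mirror those in the -egen- version, and -list- stands in for -groups- so
that nothing needs to be installed from SSC.

```stata
* flag each person just once: first observation within hhid persid
bysort hhid persid: gen byte perperson = _n == 1

* number of persons per household: running sum within household,
* then copy the household's final (total) value to every observation
bysort hhid: gen totpeople = sum(perperson)
by hhid: replace totpeople = totpeople[_N]

* flag each household once, to avoid duplicates in list commands
by hhid: gen byte taghh = _n == 1

list hhid totpeople if taghh
```

One difference to keep in mind: this version counts a missing persid as a
person, while -egen, tag()- does not unless its -missing- option is used.
Also, in later Stata releases the -egen- function sum() was renamed total().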

