Re: st: Counting observations within groups


From   Austin Nichols <austinnichols@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Counting observations within groups
Date   Thu, 29 Nov 2012 18:03:07 -0500

Daniel Escher <descher@nd.edu>:

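I sent my prior post a bit prematurely. I meant to go on to say that
one does not need a loop for this particular problem.

Make a dummy for the mines that meet both conditions, then sum it
within county. Note the parentheses around the sic conditions: & binds
more tightly than |, so without them every mine with sic==11110 is
counted regardless of production.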

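su totprod, meanonly
loc m=r(mean)
* dummy: above the overall mean, nonmissing, and one of the two sic codes
g big=(totprod>`m'&totprod<.)&(sic==12110|sic==11110)
* running sum within county; its last value is the county total
bys fips: g sbig=sum(big)
by fips: replace sbig=sbig[_N]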
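
A sketch of the same idea using -egen, total()- instead of a running
sum, which also gives the below-mean count the original question asked
for. The toy data, the helper variable insic, and the name smallmines
are assumptions for illustration only; they are not from the original
dataset.

```stata
* Toy data; all values are made up for illustration
clear
input fips sic totprod
1001 12110 300
1001 11110 10
1001 12110 .
1003 12110 250
1003 14000 400
1003 11110 5
1003 12110 220
end

* Overall mean of production; save it before another command overwrites r()
summarize totprod, meanonly
local m = r(mean)

* Indicator for the two sic codes of interest (hypothetical helper variable)
generate byte insic = inlist(sic, 12110, 11110)

* Per-county counts; a mine with missing totprod falls in neither group
bysort fips: egen bigmines   = total(insic & totprod > `m' & totprod < .)
bysort fips: egen smallmines = total(insic & totprod <= `m')

list, sepby(fips)
```

With these toy values the overall mean is 197.5, so county 1001 gets
bigmines = 1 and smallmines = 1, and county 1003 gets bigmines = 2 and
smallmines = 1. -total()- is used rather than -count()- because
-count()- counts nonmissing values of the expression, and a true/false
expression is never missing.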

On Thu, Nov 29, 2012 at 5:48 PM, Daniel Escher <descher@nd.edu> wrote:
> Hello,
>
> I am trying to count the number of mines in a county by production.
> I.e., I'd like the number of mines in each county that are above the
> overall mean of production, and the number that are below. There are
> multiple mines per county, which is identified by its FIPS code.
> Missing data are marked by . The data are in long format.
>
> Here's what I have so far:
> . *bigmines = # of mines in a county above the overall mean
> . *totprod = total production per mine
> . *sic = type of mine
>
> . *ATTEMPT ONE
> . sort fips
> . su totprod // to get mean
> . by fips: egen bigmines = count(inrange(totprod, r(mean), .) &
> sic==12110 | sic==11110)  // This gives me total number of mines per
> FIPS code - not those that meet the criteria
> . drop bigmines
>
> . *ATTEMPT TWO
> . su totprod // to get mean
> . by fips: egen bigmines = total(mshahrs > r(mean) & sic==12110 |
> sic==11110) // This gives me the total number of mines per FIPS code
> if any mine exceeds the mean
> . drop bigmines
>
> . *ATTEMPT THREE
> . *Then I read Nick Cox's helpful article
> (http://www.stata-journal.com/sjpdf.html?articlenum=pr0029) which
> clued me in to -count-:
> . gen bigmines = 0
> . su totprod
> . count if inrange(totprod, r(mean), .) & sic==12110 | sic==11110
> . replace bigmines = r(N)
>
> The last attempt is what I want, and it "works." However, I don't know
> how to -count- and then store r(N) for each FIPS code. Using -by- does
> not seem to work. This probably requires a loop like...
>
> forvalues j = all values of fips {
>         count if inrange(mshahrs, r(mean), .) & sic==12110 | sic==11110
>         replace bigmines_hrs = r(N)
> }
>
> Is this close? Thank you so much for your help and time.
>
> Gratefully,
> Daniel

