
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Counting observations within groups

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: Counting observations within groups
Date	Fri, 30 Nov 2012 15:07:27 +0000

totprod > `m'

won't work unless the local macro `m' is defined. Two lines in
Austin's code not quoted here showed how to do that:

su totprod, mean
loc m=r(mean)

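I can't test this on your data, but

su totprod, mean
local m = r(mean)
egen big = total(totprod > `m' & totprod < . & (sic==12110|sic==11110)), by(fips)

is, I think, equivalent. As in Austin's code, the totprod < . condition
stops mines with missing production, which Stata treats as larger than
any number, from being counted as big.

Also,

su totprod, mean
egen big = total(totprod > `r(mean)' & totprod < . & (sic==12110|sic==11110)), by(fips)

is equivalent to that, because `r(mean)' is replaced by its value
before -egen- even starts.

su totprod, mean
egen big = total(totprod > r(mean) & totprod < . & (sic==12110|sic==11110)), by(fips)

is living more dangerously, as interpretation of r(mean) is postponed
until within -egen-, by which time something else may have replaced r().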
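To see why postponing r(mean) is risky, here is a minimal sketch on hypothetical toy data (the values are invented; only the variable names come from this thread). It relies on -count-, like any r-class command, replacing the r() results left by -summarize-, so a later reference to r(mean) finds nothing and evaluates to missing. The names big_late and big_local are illustrative only.

```stata
* Hypothetical toy data; only the variable names come from this thread
clear
input fips totprod sic
1 10 12110
1 50 11110
1  . 12110
1 80 99999
2 60 12110
2 20 11110
3  5 12110
end

summarize totprod, meanonly
local m = r(mean)    // copy the mean into a local while r() still holds it

* Any intervening r-class command replaces r(); -count- leaves only r(N)
quietly count if sic == 12110
display "local m = `m';  r(mean) now = " r(mean)

* Deferred reference: r(mean) is now missing, and totprod > . is false
* for every observation, so every county count is 0
egen big_late = total(totprod > r(mean) & (sic == 12110 | sic == 11110)), by(fips)

* Value captured early: gives the intended per-county counts
egen big_local = total(totprod > `m' & totprod < . & (sic == 12110 | sic == 11110)), by(fips)
```

The intervening -count- is deliberate here; the point is only that anything which replaces r() between -summarize- and the moment the expression is evaluated changes the answer, whereas a local holds its value until you change it.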

The -egen- route is unlikely to be computationally faster, because
-egen- executes several lines of interpreted code; all the important
ones, and none of the unimportant ones, are in Austin's code. However,
it may be easier to see at a glance that this code should work.
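A minimal sketch comparing the two routes on hypothetical toy data: Austin's indicator plus running sum within county, and a one-line -egen, total()-. The data values and the names sbig and sbig2 are invented for illustration; the final -assert- checks that both routes give identical per-county counts.

```stata
* Hypothetical toy data; only the variable names come from this thread
clear
input fips totprod sic
1 10 12110
1 50 11110
1  . 12110
1 80 99999
2 60 12110
2 20 11110
3  5 12110
end

summarize totprod, meanonly
local m = r(mean)

* Route 1 (Austin): 0/1 indicator, running sum within county,
* then the county's last running sum copied to all its observations
gen byte big = totprod > `m' & totprod < . & (sic == 12110 | sic == 11110)
bysort fips: gen sbig = sum(big)
by fips: replace sbig = sbig[_N]

* Route 2 (-egen-): the same total in one line
egen sbig2 = total(big), by(fips)

* Stops with an error if the two routes ever disagree
assert sbig == sbig2
```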

I attempted a survey of little methods in similar territory in

SJ-11-2 dm0055: N. J. Cox. 2011. Speaking Stata: Compared with ...
        Stata Journal 11(2): 305-314 (Q2/11; no commands).
        Reviews techniques for relating values to values in other
        observations.

The common ground is realised when you see that the argument of (in
this case) -egen, total()- can be an _expression_, which can be much
more complicated than a variable name.
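Because the argument is an expression, the same one-liner also answers the other half of the original question, the number of mines below the mean. A sketch on hypothetical toy data: inlist() is a standard Stata function used here as shorthand for sic == 12110 | sic == 11110, and treating a mine exactly at the mean as neither above nor below is an assumption of this sketch. The names nabove and nbelow are illustrative.

```stata
* Hypothetical toy data; only the variable names come from this thread
clear
input fips totprod sic
1 10 12110
1 50 11110
1  . 12110
1 80 99999
2 60 12110
2 20 11110
3  5 12110
end

summarize totprod, meanonly
local m = r(mean)

* Each condition evaluates to 1 (true) or 0 (false) per observation,
* and total() adds those up within county
egen nabove = total(totprod > `m' & totprod < . & inlist(sic, 12110, 11110)), by(fips)

* No extra guard needed here: missing is never less than the mean
egen nbelow = total(totprod < `m' & inlist(sic, 12110, 11110)), by(fips)
```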

On Fri, Nov 30, 2012 at 1:12 PM, Daniel Escher <[email protected]> wrote:
> Austin,
>
> Thank you so much! I had forgotten about using levelsof to create a
> local of all values in a variable. In this case, your third option was
> computationally quickest, but I'll keep the first two options in my
> head for later situations. For some reason, totprod>`m' needed to be
> changed to totprod>r(mean). Thus,
>
> su totprod, mean
> g big=(totprod>r(mean)&totprod<.)&(sic==12110|sic==11110)
> by fips: g sbig=sum(big)
> by fips: replace sbig=sbig[_N]
>
>
> On Thu, Nov 29, 2012 at 6:03 PM, Austin Nichols <[email protected]> wrote:
>> Daniel Escher <[email protected]>:
>>
>> I sent my prior post a bit prematurely... I meant to go on to say--
>> but one does not need a loop for this particular problem.
>>
>> Make a dummy, sum within county:
>>
>> su totprod, mean
>> g big=(totprod>`m'&totprod<.)&(sic==12110|sic==11110)
>> bys fips: g sbig=sum(big)
>> by fips: replace sbig=sbig[_N]
>>
>> On Thu, Nov 29, 2012 at 5:48 PM, Daniel Escher <[email protected]> wrote:
>>> Hello,
>>>
>>> I am trying to count the number of mines in a county by production.
>>> I.e., I'd like the number of mines in each county that are above the
>>> overall mean of production, and the number that are below. There are
>>> multiple mines per county, which is identified by its FIPS code.
>>> Missing data are marked by . The data are in long format.
>>>
>>> Here's what I have so far:
>>> . *bigmines = # of mines in a county above the overall mean
>>> . *totprod = total production per mine
>>> . *sic = type of mine
>>>
>>> . *ATTEMPT ONE
>>> . sort fips
>>> . su totprod // to get mean
>>> . by fips: egen bigmines = count(inrange(totprod, r(mean), .) & sic==12110 | sic==11110)  // This gives me total number of mines per FIPS code - not those that meet the criteria
>>> . drop bigmines
>>>
>>> . *ATTEMPT TWO
>>> . su totprod // to get mean
>>> . by fips: egen bigmines = total(mshahrs > r(mean) & sic==12110 | sic==11110) // This gives me the total number of mines per FIPS code if any mine exceeds the mean
>>> . drop bigmines
>>>
>>> . *ATTEMPT THREE
>>> . *Then I read Nick Cox's helpful article (http://www.stata-journal.com/sjpdf.html?articlenum=pr0029) which clued me in to -count-:
>>> . gen bigmines = 0
>>> . su totprod
>>> . count if inrange(totprod, r(mean), .) & sic==12110 | sic==11110
>>> . replace bigmines = r(N)
>>>
>>> The last attempt is what I want, and it "works." However, I don't know
>>> how to -count- and then store r(N) for each FIPS code. Using -by- does
>>> not seem to work. This probably requires a loop like...
>>>
>>> forvalues j = all values of fips {
>>>         count if inrange(mshahrs, r(mean), .) & sic==12110 | sic==11110
>>>         replace bigmines_hrs = r(N)
>>> }
>>>
>>> Is this close? Thank you so much for your help and time.
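For completeness, the loop Daniel sketched can be made to run, using -levelsof- to collect the distinct FIPS codes. This is a sketch on hypothetical toy data, and it is likely slower than the -egen- one-liners earlier in this message on large datasets, since it passes through the data once per county. Two points matter: the mean must be stored in a local before the loop, because each -count- replaces r(); and the sic conditions need parentheses, because & binds more tightly than |, so in the attempts above any mine with sic==11110 qualified regardless of production.

```stata
* Hypothetical toy data; only the variable names come from this thread
clear
input fips totprod sic
1 10 12110
1 50 11110
1  . 12110
1 80 99999
2 60 12110
2 20 11110
3  5 12110
end

* Store the mean first: each -count- inside the loop replaces r()
summarize totprod, meanonly
local m = r(mean)

gen bigmines = .
levelsof fips, local(counties)
foreach f of local counties {
    * Parentheses around the sic conditions: & binds more tightly than |
    quietly count if fips == `f' & totprod > `m' & totprod < . & (sic == 12110 | sic == 11110)
    quietly replace bigmines = r(N) if fips == `f'
}
```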
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: Counting observations within groups
  - From: Daniel Escher <[email protected]>
- Re: st: Counting observations within groups
  - From: Austin Nichols <[email protected]>
- Re: st: Counting observations within groups
  - From: Daniel Escher <[email protected]>
