
# Re: st: Counting observations within groups

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Counting observations within groups
Date: Sat, 1 Dec 2012 14:17:29 +0000

```
You are naturally correct about missings for your problem. Another technique is

mark touse
markout touse totprod
replace touse = 0 if !inlist(sic, 12110, 11110)

su totprod, mean
local m = r(mean)
egen big= total(touse * (totprod>`m' )), by(fips)

On Sat, Dec 1, 2012 at 1:57 PM, Daniel Escher <descher@nd.edu> wrote:
> Nick, thank you for your insights and for pointing out that it is
> safer to specifically store the mean as a local rather than rely on
> Stata's temporary memory of scalars. I tried your code below with the
> addition of a condition about missing data, and it worked well
> (roughly as fast as Austin's code):
>
> su totprod, mean
> local m = r(mean)
> egen big= total(totprod>`m' & totprod<. & (sic==12110|sic==11110)), by(fips)
>
> I had tried something similar (my Attempt 2) but without the necessary
> parentheses. Those make such a difference in this case.
>
> On Fri, Nov 30, 2012 at 10:07 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> totprod > `m'
>>
>> won't work unless the local macro `m' is defined. Two lines in
>> Austin's code not cited here showed how to do that
>>
>> su totprod, mean
>> loc m=r(mean)
>>
>> I can't test for your data, but
>>
>> su totprod, mean
>> local m = r(mean)
>> egen big= total(totprod>`m' & (sic==12110|sic==11110)), by(fips)
>>
>> is, I think, equivalent.
>>
>> Also,
>>
>> su totprod, mean
>> egen big= total(totprod>`r(mean)' & (sic==12110|sic==11110)), by(fips)
>>
>> is equivalent to that.
>>
>> su totprod, mean
>> egen big= total(totprod>r(mean) & (sic==12110|sic==11110)), by(fips)
>>
>> is living more dangerously as interpretation of r(mean) is postponed
>> until within -egen-.
>>
>> The -egen- route is unlikely to be faster computationally because
>> -egen- includes several lines of interpreted code;
>> all the important ones and none of the unimportant ones are in
>> Austin's code. However, it might be easier to work out in real time
>> that this is code that should work.
>>
>> I attempted a survey of little methods in similar territory in
>>
>> SJ-11-2 dm0055  . . . . . . . . . . . . . .  Speaking Stata: Compared with ...
>>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>>         Q2/11   SJ 11(2):305--314                                (no commands)
>>         reviews techniques for relating values to values in other
>>         observations
>>
>> The common ground is realised when you see that the argument of (in
>> this case) -egen, total()-  can be an _expression_ (which can be
>> (much) more complicated than a variable name).
>>
>> On Fri, Nov 30, 2012 at 1:12 PM, Daniel Escher <descher@nd.edu> wrote:
>>> Austin,
>>>
>>> Thank you so much! I had forgotten about using levelsof to create a
>>> local of all values in a variable. In this case, your third option was
>>> computationally quickest, but I'll keep the first two options in my
>>> head for later situations. For some reason, totprod>`m' needed to be
>>> changed to totprod>r(mean). Thus,
>>>
>>> su totprod, mean
>>> g big=(totprod>r(mean)&totprod<.)&(sic==12110|sic==11110)
>>> by fips: g sbig=sum(big)
>>> by fips: replace sbig=sbig[_N]
>>>
>>>
>>> On Thu, Nov 29, 2012 at 6:03 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>> Daniel Escher <descher@nd.edu>:
>>>>
>>>> I sent my prior post a bit prematurely... I meant to go on to say--
>>>> but one does not need a loop for this particular problem.
>>>>
>>>> Make a dummy, sum within county:
>>>>
>>>> su totprod, mean
>>>> g big=(totprod>`m'&totprod<.)&(sic==12110|sic==11110)
>>>> bys fips: g sbig=sum(big)
>>>> by fips: replace sbig=sbig[_N]
>>>>
>>>> On Thu, Nov 29, 2012 at 5:48 PM, Daniel Escher <descher@nd.edu> wrote:
>>>>> Hello,
>>>>>
>>>>> I am trying to count the number of mines in a county by production.
>>>>> I.e., I'd like the number of mines in each county that are above the
>>>>> overall mean of production, and the number that are below. There are
>>>>> multiple mines per county, which is identified by its FIPS code.
>>>>> Missing data are marked by . The data are in long format.
>>>>>
>>>>> Here's what I have so far:
>>>>> . *bigmines = # of mines in a county above the overall mean
>>>>> . *totprod = total production per mine
>>>>> . *sic = type of mine
>>>>>
>>>>> . *ATTEMPT ONE
>>>>> . sort fips
>>>>> . su totprod // to get mean
>>>>> . by fips: egen bigmines = count(inrange(totprod, r(mean), .) &
>>>>> sic==12110 | sic==11110)  // This gives me total number of mines per
>>>>> FIPS code - not those that meet the criteria
>>>>> . drop bigmines
>>>>>
>>>>> . *ATTEMPT TWO
>>>>> . su totprod // to get mean
>>>>> . by fips: egen bigmines = total(mshahrs > r(mean) & sic==12110 |
>>>>> sic==11110) // This gives me the total number of mines per FIPS code
>>>>> if any mine exceeds the mean
>>>>> . drop bigmines
>>>>>
>>>>> . *ATTEMPT THREE
>>>>> (http://www.stata-journal.com/sjpdf.html?articlenum=pr0029) which
>>>>> clued me in to -count-:
>>>>> . gen bigmines = 0
>>>>> . su totprod
>>>>> . count if inrange(totprod, r(mean), .) & sic==12110 | sic==11110
>>>>> . replace bigmines = r(N)
>>>>>
>>>>> The last attempt is what I want, and it "works." However, I don't know
>>>>> how to -count- and then store r(N) for each FIPS code. Using -by- does
>>>>> not seem to work. This probably requires a loop like...
>>>>>
>>>>> forvalues j = all values of fips {
>>>>>         count if inrange(mshahrs, r(mean), .) & sic==12110 | sic==11110
>>>>>         replace bigmines_hrs = r(N)
>>>>> }
>>>>>
>>>>> Is this close? Thank you so much for your help and time.
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
```
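
The approaches discussed in the thread can be tried on a small made-up dataset. What follows is only a sketch: the seven observations are hypothetical and not taken from the original poster's data, and the names `wrong`, `big`, and `big2` are chosen here for illustration. It shows the unparenthesised condition overcounting, the parenthesised -egen, total()- call, and the -mark-/-markout- route giving the same answer.

```stata
* Hypothetical toy data: two counties, three mine types, one missing value
clear
input fips sic totprod
1001 12110 10
1001 12110 50
1001 11110 .
1001 14000 90
1002 11110 60
1002 12110 70
1002 14000 5
end

* Store the overall mean in a local macro immediately after -summarize-
summarize totprod, meanonly
local m = r(mean)

* Without parentheses, & binds more tightly than |, so every mine with
* sic == 11110 is counted regardless of its production
egen wrong = total(totprod > `m' & sic == 12110 | sic == 11110), by(fips)

* Parenthesised version, excluding missings (missing counts as > any number)
egen big = total(totprod > `m' & totprod < . & inlist(sic, 12110, 11110)), by(fips)

* Same count via an indicator built with -mark- and -markout-
mark touse
markout touse totprod
replace touse = 0 if !inlist(sic, 12110, 11110)
egen big2 = total(touse * (totprod > `m')), by(fips)

* The two correct routes should agree on every observation
assert big == big2
list fips sic totprod wrong big, sepby(fips)
```

With these values the overall mean is 47.5, so `big` is 1 in county 1001 and 2 in county 1002, while `wrong` is 2 in county 1001 because the mine with missing production and `sic == 11110` is counted.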