Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Counting observations within groups
Date: Sat, 1 Dec 2012 14:17:29 +0000

You are naturally correct about missings for your problem.

Another technique is

mark touse
markout touse totprod
replace touse = 0 if !inlist(sic, 12110, 11110)
su totprod, mean
local m = r(mean)
egen big= total(touse * (totprod>`m' )), by(fips)

On Sat, Dec 1, 2012 at 1:57 PM, Daniel Escher <descher@nd.edu> wrote:
> Nick, thank you for your insights and for pointing out that it is
> safer to specifically store the mean as a local rather than rely on
> Stata's temporary memory of scalars. I tried your code below with the
> addition of a condition about missing data, and it worked well
> (roughly as fast as Austin's code):
>
> su totprod, mean
> loc m=r(mean)
> su totprod, mean
> local m = r(mean)
> egen big= total(totprod>`m' & totprod<. & (sic==12110|sic==11110)), by(fips)
>
> I had tried something similar (my Attempt 2) but without the necessary
> parentheses. Those make such a difference in this case.
>
> On Fri, Nov 30, 2012 at 10:07 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> totprod > `m'
>>
>> won't work unless the local macro `m' is defined. Two lines in
>> Austin's code not cited here showed how to do that
>>
>> su totprod, mean
>> loc m=r(mean)
>>
>> I can't test for your data, but
>>
>> su totprod, mean
>> local m = r(mean)
>> egen big= total(totprod>`m' & (sic==12110|sic==11110)), by(fips)
>>
>> is, I think, equivalent.
>>
>> Also,
>>
>> su totprod, mean
>> egen big= total(totprod>`r(mean)' & (sic==12110|sic==11110)), by(fips)
>>
>> is equivalent to that.
>>
>> su totprod, mean
>> egen big= total(totprod>r(mean) & (sic==12110|sic==11110)), by(fips)
>>
>> is living more dangerously as interpretation of r(mean) is postponed
>> until within -egen-.
>>
>> The -egen- route is unlikely to be faster computationally because
>> -egen- includes several lines of interpreted code;
>> all the important ones and none of the unimportant ones are in
>> Austin's code. However, it might be easier to work out in real time
>> that this is code that should work.
>>
>> I attempted a survey of little methods in similar territory in
>>
>> SJ-11-2 dm0055 . . . . . . . . . . . . . . Speaking Stata: Compared with ...
>>         . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . N. J. Cox
>>         Q2/11   SJ 11(2):305--314                              (no commands)
>>         reviews techniques for relating values to values in other
>>         observations
>>
>> The common ground is realised when you see that the argument of (in
>> this case) -egen, total()- can be an _expression_ (which can be
>> (much) more complicated than a variable name).
>>
>> On Fri, Nov 30, 2012 at 1:12 PM, Daniel Escher <descher@nd.edu> wrote:
>>> Austin,
>>>
>>> Thank you so much! I had forgotten about using levelsof to create a
>>> local of all values in a variable. In this case, your third option was
>>> computationally quickest, but I'll keep the first two options in my
>>> head for later situations. For some reason, totprod>`m' needed to be
>>> changed to totprod>r(mean). Thus,
>>>
>>> su totprod, mean
>>> g big=(totprod>r(mean)&totprod<.)&(sic==12110|sic==11110)
>>> by fips: g sbig=sum(big)
>>> by fips: replace sbig=sbig[_N]
>>>
>>>
>>> On Thu, Nov 29, 2012 at 6:03 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>>>> Daniel Escher <descher@nd.edu>:
>>>>
>>>> I sent my prior post a bit prematurely... I meant to go on to say--
>>>> but one does not need a loop for this particular problem.
>>>>
>>>> Make a dummy, sum within county:
>>>>
>>>> su totprod, mean
>>>> g big=(totprod>`m'&totprod<.)&(sic==12110|sic==11110)
>>>> bys fips: g sbig=sum(big)
>>>> by fips: replace sbig=sbig[_N]
>>>>
>>>> On Thu, Nov 29, 2012 at 5:48 PM, Daniel Escher <descher@nd.edu> wrote:
>>>>> Hello,
>>>>>
>>>>> I am trying to count the number of mines in a county by production.
>>>>> I.e., I'd like the number of mines in each county that are above the
>>>>> overall mean of production, and the number that are below. There are
>>>>> multiple mines per county, which is identified by its FIPS code.
>>>>> Missing data are marked by . The data are in long format.
>>>>>
>>>>> Here's what I have so far:
>>>>> . *bigmines = # of mines in a county above the overall mean
>>>>> . *totprod = total production per mine
>>>>> . *sic = type of mine
>>>>>
>>>>> . *ATTEMPT ONE
>>>>> . sort fips
>>>>> . su totprod // to get mean
>>>>> . by fips: egen bigmines = count(inrange(totprod, r(mean), .) &
>>>>> sic==12110 | sic==11110) // This gives me total number of mines per
>>>>> FIPS code - not those that meet the criteria
>>>>> . drop bigmines
>>>>>
>>>>> . *ATTEMPT TWO
>>>>> . su totprod // to get mean
>>>>> . by fips: egen bigmines = total(mshahrs > r(mean) & sic==12110 |
>>>>> sic==11110) // This gives me the total number of mines per FIPS code
>>>>> if any mine exceeds the mean
>>>>> . drop bigmines
>>>>>
>>>>> . *ATTEMPT THREE
>>>>> . *Then I read Nick Cox's helpful article
>>>>> (http://www.stata-journal.com/sjpdf.html?articlenum=pr0029) which
>>>>> clued me in to -count-:
>>>>> . gen bigmines = 0
>>>>> . su totprod
>>>>> . count if inrange(totprod, r(mean), .) & sic==12110 | sic==11110
>>>>> . replace bigmines = r(N)
>>>>>
>>>>> The last attempt is what I want, and it "works." However, I don't know
>>>>> how to -count- and then store r(N) for each FIPS code. Using -by- does
>>>>> not seem to work. This probably requires a loop like...
>>>>>
>>>>> forvalues j = all values of fips {
>>>>> count if inrange(mshahrs, r(mean), .) & sic==12110 | sic==11110
>>>>> replace bigmines_hrs = r(N)
>>>>> }
>>>>>
>>>>> Is this close? Thank you so much for your help and time.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
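
Editorial note: the thread turns on two points, that & binds more tightly than | in Stata expressions and that a missing totprod counts as larger than any number. A minimal sketch on toy data may make both visible. The data values and the variable names wrong, big and big2 are hypothetical, chosen only for illustration; the variables fips, sic and totprod follow the thread. Run it as one do-file so that the local macro `m' stays defined.

```stata
* Toy data (hypothetical values, for illustration only)
clear
input fips sic totprod
1001 12110  50
1001 12110 500
1001 11110   .
1001 14000 900
1003 11110 700
1003 12110 800
1003 12110  10
end

* Overall mean of production; -summarize- ignores missing values
summarize totprod, meanonly
local m = r(mean)

* Without parentheses, & is evaluated before |, so this is read as
* (totprod > m & sic==12110) | sic==11110
egen wrong = total(totprod > `m' & sic==12110 | sic==11110), by(fips)

* With the sic test grouped and an explicit guard against missing totprod
egen big = total(totprod > `m' & totprod < . & inlist(sic, 12110, 11110)), by(fips)

* The same count via a to-use indicator, as in the reply at the top
mark touse
markout touse totprod
replace touse = 0 if !inlist(sic, 12110, 11110)
egen big2 = total(touse * (totprod > `m')), by(fips)

list, sepby(fips)
assert big == big2
```

With these toy values the overall mean is about 493.3, so big is 1 in county 1001 and 2 in county 1003, whereas wrong is 2 in county 1001 because the mine with sic 11110 and missing production is counted regardless of its output.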

**References**: **Re: st: Counting observations within groups** *From:* Daniel Escher <descher@nd.edu>
