From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Counting observations within groups
Date: Fri, 30 Nov 2012 15:07:27 +0000

totprod > `m'

won't work unless the local macro `m' is defined. Two lines in
Austin's code not cited here showed how to do that:

su totprod, mean
loc m=r(mean)

I can't test for your data, but

su totprod, mean
local m = r(mean)
egen big = total(totprod>`m' & (sic==12110|sic==11110)), by(fips)

is, I think, equivalent. Also,

su totprod, mean
egen big = total(totprod>`r(mean)' & (sic==12110|sic==11110)), by(fips)

is equivalent to that.

su totprod, mean
egen big = total(totprod>r(mean) & (sic==12110|sic==11110)), by(fips)

is living more dangerously, as interpretation of r(mean) is postponed
until within -egen-.

The -egen- route is unlikely to be faster computationally because
-egen- includes several lines of interpreted code; all the important
ones and none of the unimportant ones are in Austin's code. However,
it might be easier to work out in real time that this is code that
should work.

I attempted a survey of little methods in similar territory in

SJ-11-2 dm0055  . . . . . . . . . . . . . .  Speaking Stata: Compared with ...
        . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q2/11   SJ 11(2):305--314                                (no commands)
        reviews techniques for relating values to values in
        other observations

The common ground is realised when you see that the argument of (in
this case) -egen, total()- can be an expression (which can be (much)
more complicated than a variable name).

On Fri, Nov 30, 2012 at 1:12 PM, Daniel Escher <descher@nd.edu> wrote:
> Austin,
>
> Thank you so much! I had forgotten about using levelsof to create a
> local of all values in a variable. In this case, your third option was
> computationally quickest, but I'll keep the first two options in my
> head for later situations. For some reason, totprod>`m' needed to be
> changed to totprod>r(mean). Thus,
>
> su totprod, mean
> g big=(totprod>r(mean)&totprod<.)&(sic==12110|sic==11110)
> by fips: g sbig=sum(big)
> by fips: replace sbig=sbig[_N]
>
>
> On Thu, Nov 29, 2012 at 6:03 PM, Austin Nichols <austinnichols@gmail.com> wrote:
>> Daniel Escher <descher@nd.edu>:
>>
>> I sent my prior post a bit prematurely... I meant to go on to say--
>> but one does not need a loop for this particular problem.
>>
>> Make a dummy, sum within county:
>>
>> su totprod, mean
>> g big=(totprod>`m'&totprod<.)&(sic==12110|sic==11110)
>> bys fips: g sbig=sum(big)
>> by fips: replace sbig=sbig[_N]
>>
>> On Thu, Nov 29, 2012 at 5:48 PM, Daniel Escher <descher@nd.edu> wrote:
>>> Hello,
>>>
>>> I am trying to count the number of mines in a county by production.
>>> I.e., I'd like the number of mines in each county that are above the
>>> overall mean of production, and the number that are below. There are
>>> multiple mines per county, which is identified by its FIPS code.
>>> Missing data are marked by . The data are in long format.
>>>
>>> Here's what I have so far:
>>> . *bigmines = # of mines in a county above the overall mean
>>> . *totprod = total production per mine
>>> . *sic = type of mine
>>>
>>> . *ATTEMPT ONE
>>> . sort fips
>>> . su totprod // to get mean
>>> . by fips: egen bigmines = count(inrange(totprod, r(mean), .) &
>>> sic==12110 | sic==11110) // This gives me total number of mines per
>>> FIPS code - not those that meet the criteria
>>> . drop bigmines
>>>
>>> . *ATTEMPT TWO
>>> . su totprod // to get mean
>>> . by fips: egen bigmines = total(mshahrs > r(mean) & sic==12110 |
>>> sic==11110) // This gives me the total number of mines per FIPS code
>>> if any mine exceeds the mean
>>> . drop bigmines
>>>
>>> . *ATTEMPT THREE
>>> . *Then I read Nick Cox's helpful article
>>> (http://www.stata-journal.com/sjpdf.html?articlenum=pr0029) which
>>> clued me in to -count-:
>>> . gen bigmines = 0
>>> . su totprod
>>> . count if inrange(totprod, r(mean), .) & sic==12110 | sic==11110
>>> . replace bigmines = r(N)
>>>
>>> The last attempt is what I want, and it "works." However, I don't know
>>> how to -count- and then store r(N) for each FIPS code. Using -by- does
>>> not seem to work. This probably requires a loop like...
>>>
>>> forvalues j = all values of fips {
>>> count if inrange(mshahrs, r(mean), .) & sic==12110 | sic==11110
>>> replace bigmines_hrs = r(N)
>>> }
>>>
>>> Is this close? Thank you so much for your help and time.
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/