Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Tim <lists@timbp.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Programming Repetition for categories |
Date | Thu, 03 Oct 2013 13:06:30 +1000 |
I would probably make a variable for the categories, then use -by-.

. egen cat_SHARE_DEP = cut(SHARE_DEP), at(0, 10000000, 20000000, 50000000, 100000000, 250000000, 1000000000, 10000000000) label
. foreach avg in Q BRANCH A TYPE P MEMB_TOT {
.     bys cat_SHARE_DEP: egen avg_`avg' = mean(`avg')
. }

Then, if you really want separate variables for the different means, you can separate them later, but it's probably not necessary. It will probably be easier to work with -by- and/or -if- to select the category means you want.
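As a rough sketch of the -by- approach, here it is on a toy dataset. The values are made up for illustration, only Q is included, and the cutpoints stop at 100 million to keep it short; this is not the real credit union data.

```stata
* Toy data (hypothetical values) standing in for the real dataset
clear
input double SHARE_DEP double Q
  5000000  1
  8000000  3
 15000000 10
 30000000 20
 40000000 40
end

* Bin SHARE_DEP; without -label- or -icodes-, each observation gets
* the lower bound of its interval (0, 10000000, 20000000, ...)
egen cat_SHARE_DEP = cut(SHARE_DEP), at(0, 10000000, 20000000, 50000000, 100000000)

* Category mean, repeated on every observation in that category
bysort cat_SHARE_DEP: egen avg_Q = mean(Q)

* One row per category: mean of Q and number of observations
tabstat Q, by(cat_SHARE_DEP) statistics(mean n)
```

Note that -egen, cut()- leaves observations at or above the last cutpoint as missing, so the final value in at() needs to exceed the largest SHARE_DEP in the data.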
As for your code, -if SHARE_DEP ...- used as a command (rather than as a qualifier on another command) is evaluated once, using the value of SHARE_DEP in the first observation. So at most one of your if blocks will ever run, and when it runs it operates on the whole dataset, as you have not used a subsetting -if- in the -egen- command.
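To make the difference concrete, here is a minimal sketch contrasting the -if- command with the -if- qualifier. The data are toy values made up for illustration, and the variable name wrong_avg010_Q is hypothetical.

```stata
* Toy data (hypothetical values)
clear
input double SHARE_DEP double Q
  5000000  1
  8000000  3
 15000000 10
 30000000 20
end

* -if- command: the condition is checked once, using SHARE_DEP[1].
* SHARE_DEP[1] is 5000000 here, so the block runs, and -egen- then
* averages Q over ALL observations.
if SHARE_DEP < 10000000 {
    egen wrong_avg010_Q = mean(Q)
}

* -if- qualifier: the condition is checked for every observation, so
* only observations with SHARE_DEP < 10000000 enter the mean; the new
* variable is missing elsewhere.
egen avg010_Q = mean(Q) if SHARE_DEP < 10000000

list SHARE_DEP Q wrong_avg010_Q avg010_Q
```

If you go this route for the top category, remember that Stata treats missing values as larger than any number, so a condition such as SHARE_DEP >= 1000000000 should probably be combined with & !missing(SHARE_DEP).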
See [U] 11.1.3 if exp and [P] if

Tim BP

On 3/10/2013 12:45, Andrew Hovel wrote:
I am trying to program repeated calculation of means for a set of variables categorized in bins. I am using Stata 12 for Windows. I am new to Stata programming, so I'm guessing there is a better way to do this than I am attempting, but here goes:

I am calculating means of six variables (Q BRANCH A TYPE P MEMB_TOT) in my data across 7 different categories of another variable, SHARE_DEP (represents a value of total shares and deposits held by credit unions).

The categories I use are 0-10 million, 10-20 million, 20-50 million, 50-100 million, 100-250 million, 250 million-1 billion, and >1 billion.

The code I am using is:

***average <10m
if SHARE_DEP < 10000000 {
    foreach average in Q BRANCH A TYPE P MEMB_TOT {
        egen avg010_`average' = mean(`average')
    }
}

***average 10-20m
if SHARE_DEP >= 20000000 & SHARE_DEP < 50000000 {
    foreach avg in Q BRANCH A TYPE P MEMB_TOT{
        egen avg2050_`avg' = mean(`avg')
    }
}

*** and so forth through those >1billion.

The problem here is that the means generated for the first step are equivalent to the whole population mean, not the mean for observations where SHARE_DEP < 10000000. (I checked this separately using -sum- for the variables after dropping all observations where SHARE_DEP > 10000000.) The subsequent if programs don't even execute.

Any help or suggestions for resolving this would be great.

-AH

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/