
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Programming Repetition for categories


From   Andrew Hovel <hovel.andrew@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Programming Repetition for categories
Date   Wed, 2 Oct 2013 22:39:47 -0500

Thanks! That's very helpful. I should have realized the -if- was
evaluating only one case. Using -cut- and -by- together is a good
suggestion.

-Andrew



On Wed, Oct 2, 2013 at 10:06 PM, Tim <lists@timbp.com> wrote:
> I would probably make a variable for the categories, then use -by-
>
> . egen cat_SHARE_DEP = cut(SHARE_DEP), at(0, 10000000, 20000000, 50000000,
> 100000000, 250000000, 1000000000, 10000000000), label
> . foreach avg in Q BRANCH A TYPE P MEMB_TOT {
> .    bys cat_SHARE_DEP: egen avg_`avg' = mean(`avg')
> . }
>
> Then, if you really want separate variables for the different means, you can
> separate them later, but it's probably not necessary. It will probably be
> easier to work with -by- and/or -if- to select the category means you want.
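
A minimal do-file sketch of the -cut- plus -by- approach described above, on toy data. The values of SHARE_DEP and Q below are made up for illustration, and only one of the six variables is averaged. It assumes the fourth breakpoint was meant to be 50000000, to match the 20-50 million bin in the original question, since -cut()- needs its breakpoints in ascending order.

```stata
* Toy data: these SHARE_DEP and Q values are invented for illustration
clear
input double SHARE_DEP Q
  5000000  1
  8000000  3
 15000000 10
 30000000 20
 40000000 40
end

* Bin SHARE_DEP; each bin runs from one breakpoint up to (not including) the next
egen cat_SHARE_DEP = cut(SHARE_DEP), at(0, 10000000, 20000000, 50000000, ///
    100000000, 250000000, 1000000000, 10000000000) label

* One mean per bin, stored on every observation in that bin
bysort cat_SHARE_DEP: egen avg_Q = mean(Q)

list SHARE_DEP cat_SHARE_DEP Q avg_Q, sepby(cat_SHARE_DEP)
```

With these toy values the two observations under 10 million share avg_Q = 2, and the two in the 20-50 million bin share avg_Q = 30. To look at a single category afterwards, something like -summarize avg_Q if SHARE_DEP < 10000000- should be enough.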
>
> As for your code, the -if SHARE_DEP- command refers to the value of
> SHARE_DEP in the first observation, so at most one of your -if- blocks will
> ever run, and when it runs it will operate on the whole dataset, because you
> have not used a subsetting -if- qualifier in the -egen- command.
>
> See  [U] 11.1.3 if exp and  [P] if
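
To make the difference between the -if- command and the -if- qualifier concrete, here is a small do-file sketch on made-up data. The variable names avg_cmd and avg_qual are hypothetical, chosen only for this illustration.

```stata
* Toy data: the first observation is deliberately above 10 million
clear
input double SHARE_DEP Q
 50000000 100
  5000000   1
  8000000   3
end

* -if- command: evaluated once, using SHARE_DEP in the first observation
* (50000000 here), so the whole block is skipped
if SHARE_DEP < 10000000 {
    egen avg_cmd = mean(Q)
}

* -if- qualifier: evaluated for every observation, so the mean uses only
* the rows under 10 million and is missing elsewhere
egen avg_qual = mean(Q) if SHARE_DEP < 10000000

* avg_cmd was never created, so this returns a nonzero code
capture confirm variable avg_cmd
display _rc

list
```

If the first observation had been under 10 million instead, the -if- command block would have run, but -egen- inside it would still have averaged Q over all observations, which matches the whole-population mean reported in the original question.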
>
> Tim BP
>
>
> On 3/10/2013 12:45, Andrew Hovel wrote:
>>
>> I am trying to program repeated calculation of means for a set of
>> variables categorized in bins. I am using Stata 12 for Windows.
>>
>> I am new to Stata programming, so I'm guessing there is a better way
>> to do this than I am attempting, but here goes:
>>
>> I am calculating means of six variables (Q BRANCH A TYPE P MEMB_TOT)
>> in my data across 7 different categories of another variable,
>> SHARE_DEP  (represents a value of  total shares and deposits held by
>> credit unions)
>>
>> The categories I use are 0-10million, 10-20million, 20-50 million,
>> 50-100million, 100-250million, 250m-1billion, and >1billion
>>
>> The code I am using is:
>>   ***average <10m
>> if SHARE_DEP < 10000000 {
>> foreach average in Q BRANCH A TYPE P MEMB_TOT {
>>   egen avg010_`average' = mean(`average')
>>   }
>> }
>> ***average 20-50m
>>   if SHARE_DEP >= 20000000 & SHARE_DEP < 50000000 {
>> foreach avg in Q BRANCH A TYPE P MEMB_TOT {
>>   egen avg2050_`avg' = mean(`avg')
>>   }
>> }
>> ***
>> and so forth through those >1billion.
>>
>> The problem here is that the means generated for the first step are
>> equivalent to the whole population mean, not the mean for observations
>> where SHARE_DEP < 10000000. (I checked this separately using -sum- for
>> the variables after dropping all observations where SHARE_DEP >
>> 10000000.)
>> The subsequent if programs don't even execute.
>>
>> Any help or suggestions for resolving this would be great.
>>
>> -AH
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
>
>

