Re: st: Programming Repetition for categories

From   Tim
Subject   Re: st: Programming Repetition for categories
Date   Thu, 03 Oct 2013 13:06:30 +1000

I would probably make a variable for the categories, then use -by-:

. egen cat_SHARE_DEP = cut(SHARE_DEP), at(0, 10000000, 20000000, 50000000, 100000000, 250000000, 1000000000, 10000000000) label
. foreach avg in Q BRANCH A TYPE P MEMB_TOT {
.    bys cat_SHARE_DEP: egen avg_`avg' = mean(`avg')
. }

Then, if you really want separate variables for the different means, you can separate them later, but it's probably not necessary. It will probably be easier to work with -by- and/or -if- to select the category means you want.
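As a rough sketch of what that could look like, here is the same approach on Stata's bundled auto dataset. The variables price and mpg are stand-ins for SHARE_DEP and Q, and the cutpoints are made up for the illustration. I am assuming -separate- is the sort of thing meant by splitting the means out later.

```stata
* Illustration only: auto.dta stands in for the credit union data
sysuse auto, clear

* Bin price into three categories; -label- implies integer codes 0, 1, 2
egen cat_price = cut(price), at(0, 5000, 10000, 20000) label

* Category mean of mpg, stored on every observation in that category
bysort cat_price: egen avg_mpg = mean(mpg)

* One line per category
tabstat mpg, by(cat_price) statistics(mean)

* Select a single category's mean with an -if- qualifier
summarize avg_mpg if cat_price == 0

* Only if separate variables are really needed: one new variable per category
separate avg_mpg, by(cat_price)
```

Keeping a single avg_ variable per measure and selecting with -if- or -by- avoids creating seven variables for each of the six measures.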

As for your code, -if SHARE_DEP < 10000000 {- is the -if- programming command, not the -if- qualifier. It is evaluated once, using the value of SHARE_DEP in the first observation, so only one of your -if- blocks will ever run, and when it runs it operates on the whole dataset because you have not used a subsetting -if- qualifier in the -egen- command.
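A minimal sketch of the difference, again on the bundled auto dataset with price standing in for SHARE_DEP (the 5000 threshold and the variable names mean_all and mean_low are arbitrary choices for the example):

```stata
* Illustration only: auto.dta stands in for the credit union data
sysuse auto, clear

* -if- command: the condition is checked once, against price[1].
* If it is true, the block runs on ALL observations.
if price < 5000 {
    egen mean_all = mean(mpg)
}

* -if- qualifier: the condition is checked for every observation,
* so the mean uses only observations with price < 5000
egen mean_low = mean(mpg) if price < 5000

* Compare with the subset mean reported by -summarize-
summarize mpg if price < 5000
```

With the qualifier, mean_low is missing for observations outside the subset, which is the behaviour the original loop was after.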

See  [U] 11.1.3 if exp and  [P] if

Tim BP

On 3/10/2013 12:45, Andrew Hovel wrote:
I am trying to program repeated calculation of means for a set of
variables categorized in bins. I am using Stata 12 for Windows.

I am new to Stata programming, so I'm guessing there is a better way
to do this than I am attempting, but here goes:

I am calculating means of six variables (Q BRANCH A TYPE P MEMB_TOT)
in my data across 7 different categories of another variable,
SHARE_DEP  (represents a value of  total shares and deposits held by
credit unions)

The categories I use are 0-10million, 10-20million, 20-50 million,
50-100million, 100-250million, 250m-1billion, and >1billion

The code I am using is:
  ***average <10m
if SHARE_DEP < 10000000 {
foreach average in Q BRANCH A TYPE P MEMB_TOT {
  egen avg010_`average' = mean(`average')
***average 20-50m
  if SHARE_DEP >= 20000000 & SHARE_DEP < 50000000 {
foreach avg in Q BRANCH A TYPE P  MEMB_TOT{
  egen avg2050_`avg' = mean(`avg')
and so forth through those >1billion.

The problem here is that the means generated for the first step are
equivalent to the whole population mean, not the mean for observations
where SHARE_DEP < 10000000. (I checked this separately using -sum- for
the variables after dropping all observations where SHARE_DEP >
The subsequent if programs don't even execute.

Any help or suggestions for resolving this would be great.
