mikhail bontch-osmolovski

statalist@hsphsun2.harvard.edu

st: how to calculate a function (sum) of observations without creating a new variable?

Sat, 16 Aug 2003 21:33:26 -0400

Dear Statalist users,

This is a long question about functions in Stata. I would like to know the best way to calculate a function of the data without creating a variable, i.e. without using gen or egen.

The situation is simple: I have a big dataset of 8 million observations and I need to calculate weighted sums of observations under certain conditions. I need: sum(wt) if b

The obvious way would be to:

1. egen c=sum(wt) if b

2. su c, or di c

3. drop c

However, this is clumsy: I have to create an additional 8 million values of c, all equal to each other, which takes memory (very limited in my case) and extra time (it runs for a very long time). As I understand it, egen's sum() works by first running gen with sum() and then replacing all observations with the last one. I could not find anything online, so I wrote a simple program that calculates the sum of a variable called wt:

scalar a=0

local n=1

while `n'<=_N {

scalar a=a+wt[`n']

local n=`n'+1

}

display a

I could not believe my eyes: this simple program ran 3 times LONGER than egen a=sum(wt).

Later I used scalar n instead of local n to save interpretation time, but it still ran 3 times longer than egen.

So I was forced to go back to egen, but it is aesthetically unpleasant to create 8 million observations when you need just a constant, and often low memory does not even allow me to have an extra variable. I am using Windows XP, Stata 8.1.
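What I actually need is the conditional version, sum(wt) if b. A minimal sketch of the same loop with the condition added is below. The five-observation toy dataset and the use of forvalues are only for illustration here, and this is not the code I timed above:

```stata
* toy data, purely illustrative: wt = 1..5, b = 1 for odd observations
clear
set obs 5
generate wt = _n
generate b = mod(_n, 2)

* accumulate wt only where b is true (nonzero), without creating a variable
scalar a = 0
local N = _N
forvalues n = 1/`N' {
    if b[`n'] {
        scalar a = a + wt[`n']
    }
}
display a
```

On the toy data this adds 1 + 3 + 5. It still interprets one line per observation, so I would not expect it to be any faster than the loop above.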
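For concreteness, here is a sketch of what I assume the egen route amounts to internally, again on a made-up five-observation dataset. The generate/replace pair is my guess at the mechanism, not the actual egen source, and the scalar name s is arbitrary:

```stata
* toy data, purely illustrative: wt = 1..5, b = 1 for odd observations
clear
set obs 5
generate wt = _n
generate b = mod(_n, 2)

* assumed mechanism: running sum over the observations where b is true ...
generate double c = sum(cond(b, wt, 0))
* ... then every observation is overwritten with the final total
replace c = c[_N]

* keep the constant and throw the variable away again
scalar s = c[1]
display s
drop c
```

The point is that c holds the same number in every observation, which is exactly the waste of memory I would like to avoid.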

So I wonder if you know a good way to calculate a sum of a variable and, in general, any function of variables under certain conditions, as you can in Excel, without creating an extra variable. Why is egen faster than a plain loop that sums? Is this a good case for writing a plugin, which would be faster? It is hard to believe that Stata has no command for such a simple operation.

Misha Bonch-Osmolovski

UNC-Chapel Hill, Econ grad

P.S. display, which is also called the hand calculator, does not allow an if condition, so it does not work.

