st: RE: Re: how to calculate a function (sum) of observations without creating a new variable?

Sun, 17 Aug 2003 16:41:46 +0100

mikhail bontch-osmolovski

> > This is a long question about functions in Stata.
> >
> > I would like to know the best way to calculate a function of the data
> > without creating a variable, i.e. without using -gen- or -egen-.
> >
> > The situation is simple: I have a big dataset of 8 million observations,
> > and I need to calculate weighted sums of observations under certain
> > conditions. I need: sum(wt) if b
> >
> > The obvious way would be to:
> > 1. egen c=sum(wt) if b
> > 2. su c, or di c
> > 3. drop c.
> >
> > However, this way is clumsy, since I have to create 8 million additional
> > observations of c, all equal to each other. That takes memory, which is
> > very limited in my case, and extra time (it takes a very long time).
> > As I understand it, egen sum works by first running gen sum and then
> > replacing all observations with the last one.
> >
> > I could not find anything online, so I wrote a simple program which
> > calculates the sum of a variable called wt:
> >
> > scalar a=0
> > local n=1
> > while `n'<=_N {
> >     scalar a=a+wt[`n']
> >     local n=`n'+1
> > }
> > display a
> >
> > I could not believe my eyes: this simple program ran 3 times LONGER than
> > egen a=sum(wt).
> > Later I used scalar n instead of local n to save interpretation time,
> > but it still ran 3 times longer than egen.
> > So I was forced to go back to egen, but it is aesthetically unpleasant to
> > create 8 million observations when you need just a constant, and often low
> > memory does not even allow me to have an extra variable. I am using
> > Windows XP, Stata 8.1.
> >
> > So I wonder: do you know a good way to calculate the sum of a variable
> > and, in general, any function of variables under certain conditions, as
> > you can in Excel, without creating an extra variable? Why is egen faster
> > than a plain sum? Is this a good case for a plugin, which would be
> > faster? It is hard to believe that Stata has no command for such a
> > simple operation.
> >
> > PS: display, which is also called a hand calculator, does not allow an
> > if condition, so it does not work.

Scott Merryman

> -tabstat- is probably the easiest way to display a sum
>
> . use "C:\Stata8\auto.dta", clear
> (1978 Automobile Data)
>
> . tabstat price if mpg>20, stat(sum)
>
>     variable |       sum
> -------------+----------
>        price |    192611
> ------------------------
>
> Also, take a look at the saved results for -summarize-

To expand on Scott's comments, although I can't comment on Excel:

Your program is going to be very slow for the following reason, among
others: you are obliging Stata to interpret several million lines of
Stata code. Just to stress one point: Stata doesn't have a built-in
compiler, so your program, although much shorter to type than the code
behind -egen, sum()-, is in effect much longer, because the -while- loop
makes Stata interpret its body once for every observation.

I am a big fan of -egen- where it is appropriate, but it has no
advantages here over using -summarize-:

    su myvar ..., meanonly
    di r(sum)

I think you'll find that reasonably fast. Also, I doubt that you can
improve on that very much with a plug-in, but it would be an interesting
challenge.

By the way, you say you want weighted sums, but none of your examples
uses weights.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
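Putting the two replies together, here is a minimal sketch of getting conditional summaries without adding a variable. It assumes the auto data shipped with Stata, loaded with -sysuse- rather than from Scott's C:\Stata8 path. The -if- condition goes on -count- or -summarize-, which leave their answers in r(); -display- then just prints the stored results, so the missing -if- on -display- stops mattering.

```stata
* Conditional summaries without creating a new variable.
* Assumes the auto data that ship with Stata (as in Scott's example).
sysuse auto, clear

* -count- stores the number of observations meeting the condition in r(N)
count if mpg > 20
display "cars with mpg > 20: " r(N)

* -meanonly- suppresses the output and the variance calculation,
* but r(N), r(sum), r(mean), r(min) and r(max) are still stored
summarize price if mpg > 20, meanonly
display "sum   = " r(sum)
display "N     = " r(N)
display "mean  = " r(mean)
display "range = " r(min) " to " r(max)
```

The sum should agree with the -tabstat- output above (192611). With Mikhail's variables the same pattern is -su wt if b, meanonly- followed by -di r(sum)-, and nothing has to be dropped afterwards.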

