# st: RE: Re: how to calculate a function (sum ) of observations without creating a new variable ?

From: "Nick Cox"; Date: Sun, 17 Aug 2003 16:41:46 +0100

```
mikhail bontch-osmolovski

> > This is a long question about functions in Stata.
> > I would like to know the best way to calculate a function of
> > the data without creating a variable, i.e. without using the
> > -gen- or -egen- commands.
> > The situation is simple: I have a big dataset of 8 million
> > observations and I need to calculate weighted sums of
> > observations under certain conditions. I need: sum(wt) if b
> > The obvious way would be to:
> > 1.  egen c=sum(wt) if b
> > 2.  su c, or di c
> > 3.  drop c.
> > However, this way is clumsy, since I have to create 8 million
> > observations of c, all equal to each other,
> > so it takes memory, which is very limited in my case, and
> > extra time (it takes a very long time).
> > As I understand it, egen sum works by first running gen sum and
> > then replacing all observations with the last one.
> > I could not find anything online, so I wrote a simple program
> > that calculates the sum of a variable called wt:
> >
> > scalar a=0
> > local n=1
> > while `n'<=_N {
> >     scalar a=a+wt[`n']
> >     local n=`n'+1
> > }
> > display a
> >
> > I could not believe my eyes: this simple program ran 3 times
> > LONGER than egen a=sum(wt).
> > Later I used scalar n instead of local n to save interpretation
> > time, but it still ran 3 times longer than egen.
> > So I was forced to go back to egen, but it is aesthetically
> > unpleasant to create 8 million observations when you need just
> > a constant, and often low memory does not even allow me to have
> > an extra variable. I am using Windows XP, Stata 8.1.
> >
> > So I wonder if you know a good way to calculate a sum of
> > variables and, in general, any function of variables under
> > certain conditions, as you can in Excel, without creating an
> > extra variable. Why is egen faster than a plain summing loop?
> > Is this a good case for a plugin, which would be faster? It is
> > hard to believe that Stata has no command for such a simple
> > operation.
> >
> > P.S. display, which is also called a hand calculator, does not
> > allow an if condition, so it does not work.

Scott Merryman

> -tabstat- is probably the easiest way to display a sum
>
> . use "C:\Stata8\auto.dta", clear
> (1978 Automobile Data)
>
> . tabstat price if mpg>20, stat(sum)
>
>     variable |       sum
> -------------+----------
>        price |    192611
> ------------------------
>
> Also, take a look at the saved results for -summarize-

To expand on Scott's comments, although
I can't comment on Excel:

Your program is going to be very slow
for the following reason, among others:
you are obliging Stata to interpret
several million lines of Stata code. Just
to stress one point: Stata doesn't have a built-in
compiler, so your program, although much shorter
to type than -egen, sum()- would be, is really much
longer when run, because the body of the -while-
loop is interpreted afresh for every observation.
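
* A minimal sketch of the comparison. It assumes the auto data
* shipped with Stata, loaded with -sysuse-, which replaces the data
* in memory. Here price stands in for the poster's wt, and the
* scalar name loopsum is hypothetical. The loop makes Stata
* interpret its body once per observation. -summarize- computes
* the same total in a single command and creates no new variable.
sysuse auto, clear
scalar loopsum = 0
local N = _N
forvalues i = 1/`N' {
    * scalar() avoids clashing with any variable of the same name
    scalar loopsum = scalar(loopsum) + price[`i']
}
summarize price, meanonly
* both lines should display the same total
display scalar(loopsum)
display r(sum)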

I am a big fan of -egen- where it is appropriate,
but it has no advantages here over using
-summarize-.

su myvar ..., meanonly
di r(sum)
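
* A concrete instance, again assuming the auto data loaded with
* -sysuse-, which replaces the data in memory. Unlike -display-,
* -summarize- accepts an -if- condition. The -meanonly- option
* suppresses the display of results and the calculation of the
* variance. The sum of price for cars with mpg > 20 should match
* Scott's -tabstat- result above. The scalar name pricesum is
* hypothetical.
sysuse auto, clear
summarize price if mpg > 20, meanonly
display r(sum)
* keep the result for later use without creating a variable
scalar pricesum = r(sum)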

By the way, you say you want weighted sums,
but none of your examples uses weights.
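
* If a genuinely weighted sum is wanted, here is a sketch. It
* assumes the auto data, with car weight standing in for a
* hypothetical weight variable and price as the variable being
* summed. The scalar name wsum is also hypothetical. With
* aweights, r(mean) is the weighted mean sum(w*x)/sum(w) and
* r(sum_w) is sum(w), so their product should be the weighted sum
* sum(w*x), again with no new variable.
sysuse auto, clear
summarize price [aweight = weight] if mpg > 20, meanonly
scalar wsum = r(mean) * r(sum_w)
display scalar(wsum)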

I think you'll find that reasonably fast.
Also, I doubt that you can improve on that
very much with a plug-in, but it would be
an interesting challenge.

Nick
n.j.cox@durham.ac.uk

```