# st: egen versus gen / generate with by

 From Daniel Egan <[email protected]> To [email protected] Subject st: egen versus gen / generate with by Date Wed, 6 Oct 2004 17:07:03 -0400

```
This is a summary of a thread initiated under the heading:
"cumulative average moving through time". I have changed the heading
in the hope that future generations may learn faster than I did.

The discussion moved to my ignorance of how -egen-, -generate- and
-by- work, with multiple voices explaining exactly why

bysort pid (ob): gen cave = sum(calc)/ob
is not the same as
egen cave=sum(calc)/ob, by(pid ob)

Thanks to Nick Cox, Michael Blasnik, Scott Merryman, and David Kantor
for their explanations.
***************************************************************
Nick Cox (as usual) wrote the bible on it:
<quote>
Be very careful here. You're confusing
some quite different beasts.

-egen-
======

-egen, sum()- fires up an -egen- function which
produces totals. Under -by:- or with
a -by()- option it produces group totals.
You can find the code in -_gsum.ado- (-which
_gsum- will find where on your machine).

In essence, -egen- only takes -egen- functions,
either as documented under -[R] egen-, or as
user-defined -egen- functions _always_
flagged as such.

Also, -egen- functions are _never, ever_
allowed anywhere else. They require -egen-
absolutely.

-egen- is really rather limited. There are
perhaps of the order of 100 -egen- functions written,
and that's a fixed menu, except insofar as
if you don't like them, you can indeed write your own.

-sum()- and other functions
===========================

-sum()- anywhere else it is legal fires up
the -sum()- function which produces
cumulative sums. This is part of the
executable and has been so for a very
long time, perhaps even since Stata 1.0.

-generate- (and -replace-) can in effect
take very complicated expressions
as arguments, making use of constants,
variables, operators and functions
such as -sum()-. The scope of -generate-
is in no way indicated by the few token
examples in the help. By combining constants,
variables, operators and functions,
you have _much_ more flexibility than with
-egen-.

Why then bother with -egen-? Just
for convenience: some often
repeated sets of operations have been
rolled into -egen- functions.

by:
===

On -by:-, see the article

Speaking Stata: How to move step by: step. Stata Journal 2(1): 86-102
(2002)

which gathers the main ideas in one place. The obvious
alternative is to look up -by- in the Manual index and read
the several sections thus indicated. The article
just mentioned was written because the coverage
of -by:- in the manuals is a bit fragmented.
<end quote>
*****************************************************************
On this note, Scott Merryman said:
<quote>
-bysort pid (ob)- sorts by pid and then by ob within pid, but it performs
-gen cave = sum(calc)/ob- separately for each pid only.
-bysort pid ob- would not work because
it would perform the calculation separately for each pid and ob pair.

I don't believe the -by()- option in -egen- is flexible enough to interpret
-egen cave=sum(calc)/ob, by(pid ob)- correctly.  Also, -egen, sum()- does not
allow expressions such as sum(calc)/ob.

You might find Nick Cox's article "Speaking Stata:  How to move step by: step"
helpful.
<end quote>
**********************************************
Dave Kantor noted:
<quote>
See -help mathfun- for details.
<end quote>

Dan

```
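
The distinction drawn in the thread can be seen on a small example. What follows is only a sketch on made-up data: the variable names `pid`, `ob` and `calc` come from the thread, but the values and the new variable `tot` are hypothetical, and `ob` is assumed to count observations 1, 2, 3, ... within each `pid`. Note also that the `-egen-` function for group totals is documented as `total()` in current Stata; at the time of the thread it was called `sum()`.

```stata
* Hypothetical toy data: two persons (pid), three observations each
clear
input pid ob calc
1 1 2
1 2 4
1 3 6
2 1 10
2 2 20
2 3 30
end

* Under -generate-, sum() is a running (cumulative) sum.
* Run separately for each pid, in order of ob, and divided by ob,
* it gives the cumulative average up to each observation.
bysort pid (ob): gen cave = sum(calc)/ob

* Under -egen-, total() (formerly sum()) is a group total:
* the same value is repeated on every observation of a pid.
bysort pid: egen tot = total(calc)

list, sepby(pid)
```

With these values `cave` should run 2, 3, 4 for the first person and 10, 15, 20 for the second, while `tot` is constant at 12 and 60 respectively. The `(ob)` in parentheses controls only the sort order within `pid`; it does not define a finer group, which is why the running sum accumulates across all observations of a person.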