Stata The Stata listserver

st: egen versus gen / generate with by

From   Daniel Egan <[email protected]>
To   [email protected]
Subject   st: egen versus gen / generate with by
Date   Wed, 6 Oct 2004 17:07:03 -0400

This is a summary of a thread initiated under the heading
"cumulative average moving through time". I have changed the heading
in the hope that future generations may learn faster than I did.

The discussion moved to my ignorance of how -egen-, -generate- and
-by- work, with multiple voices explaining exactly why

bysort pid (ob): gen cave = sum(calc)/ob
is not the same as
egen cave=sum(calc)/ob, by(pid ob)

Thanks to Nick Cox, Michael Blasnik, Scott Merryman, and David Kantor
for their explanations.
Nick Cox (as usual) wrote the bible on it:
Be very careful here. You're confusing
some quite different beasts.


-egen, sum()- fires up an -egen- function which
produces totals. Under -by:- or with
a -by()- option it produces group totals.
You can find the code in -_gsum.ado- (-which
_gsum- will find where on your machine).

In essence, -egen- only takes -egen- functions,
either as documented under -[R] egen-, or as
user-defined -egen- functions _always_
flagged as such.

Also, -egen- functions are _never, ever_
allowed anywhere else. They require -egen-.

-egen- is really rather limited. There are
perhaps of the order of 100 -egen- functions written,
and that's a fixed menu, except insofar as
if you don't like them, you can indeed
write your own.

-sum()- and other functions

-sum()- anywhere else it is legal fires up
the -sum()- function which produces
cumulative sums. This is part of the
executable and has been so for a very
long time, perhaps even since Stata 1.0.
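
The contrast between the two meanings of "sum" can be sketched on a toy dataset. This is only an illustration: the data values and the variable names grptotal and runsum are made up, pid, ob and calc are borrowed from the thread, and -egen, total()- is used because that is the name Stata has given the -egen, sum()- function since Stata 9.

```stata
clear
input pid ob calc
1 1 10
1 2 20
1 3 30
2 1 5
2 2 15
end

* -egen- function: one total per pid, repeated on every row of the group
* (called -egen, sum()- in the Stata of this thread; -total()- from Stata 9 on)
egen grptotal = total(calc), by(pid)

* -sum()- function inside -generate-: running (cumulative) sum within pid,
* taken in order of ob
bysort pid (ob): gen runsum = sum(calc)

* cumulative average; dividing by ob assumes ob runs 1, 2, 3, ... within pid
bysort pid (ob): gen cave = sum(calc)/ob

list, sepby(pid)
```

For pid 1 the group total is 60 on every row, while the running sum goes 10, 30, 60 and the cumulative average 10, 15, 20.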

-generate- (and -replace-) can in effect
take very complicated expressions
as arguments, making use of constants,
variables, operators and functions
such as -sum()-. The scope of -generate-
is in no way indicated by the few token
examples in the help. By combining constants,
variables, operators and functions,
you have _much_ more flexibility than with -egen-.

Why then bother with -egen-? Just
for convenience: some often-repeated
sets of operations have been
rolled into -egen- functions.
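
As a sketch of what "rolled into" means, a group mean can be had either from an -egen- function or from -generate- and -replace- under -by:-. The toy data and the names m1 and m2 are assumptions for illustration only, and the by-hand version is just one way to do it, not a statement of how the -egen- function is coded.

```stata
clear
input pid ob calc
1 1 10
1 2 20
1 3 30
2 1 5
2 2 15
end

* the convenient way: an -egen- function
egen m1 = mean(calc), by(pid)

* by hand: running sum over running count of nonmissing values ...
bysort pid (ob): gen m2 = sum(calc)/sum(calc < .)
* ... then copy the last value in each group to every row of that group
by pid: replace m2 = m2[_N]

list, sepby(pid)
```

Both variables hold 20 for pid 1 and 10 for pid 2.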


On -by:-, see the article
Speaking Stata: How to move step by: step. Stata Journal 2(1): 86-102

which gathers the main ideas in one place. The obvious
alternative is to look up -by- in the Manual index and read
the several sections thus indicated. The article
just mentioned was written because the coverage
of -by:- in the manuals is a bit fragmented.
<end quote>
On this note, Scott Merryman said:
-bysort pid (ob)- sorts on pid and then on ob within pid, but it performs
-gen cave = sum(calc)/ob- separately for each pid only.
-bysort pid ob- would not work because
it would perform the calculation on each pid and ob pair.

I don't believe the -by()- option in -egen- is flexible enough to interpret
-egen cave=sum(calc)/ob, by(pid ob)- correctly. Also, -egen, sum()- does not
allow expressions such as sum(calc)/ob.

You might find Nick Cox's article "Speaking Stata:  How to move step by: step"
SJ 2(1) helpful.
<end quote>
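
Scott's point about -bysort pid (ob)- versus -bysort pid ob- can be checked on a toy dataset. This is a sketch only: the data values and the variable name wrong are made up, and there is one row per pid and ob pair, as the thread's cumulative average assumes.

```stata
clear
input pid ob calc
1 1 10
1 2 20
1 3 30
2 1 5
2 2 15
end

* groups are defined by pid alone; ob only fixes the order within each group
bysort pid (ob): gen cave = sum(calc)/ob

* groups are defined by each (pid, ob) pair; with one row per pair the
* running sum restarts on every row, so the result is just calc/ob
bysort pid ob: gen wrong = sum(calc)/ob

list, sepby(pid)
```

Here cave is 10, 15, 20 for pid 1, whereas wrong is 10 on all three rows.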
Dave Kantor noted:
See -help mathfun- for details.
<end quote>

