The Stata listserver

RE: st: cumulative average moving through time

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: cumulative average moving through time
Date   Wed, 6 Oct 2004 21:28:50 +0100

Be very careful here. You're confusing 
some quite different beasts. 


-egen, sum()- fires up an -egen- function which 
produces totals. Under -by:- or with 
a -by()- option it produces group totals. 
You can find the code in -_gsum.ado- (-which 
_gsum- will find where on your machine). 
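
As a minimal sketch of the two cases, assuming a made-up dataset 
with variables pid, ob and calc (the names come from the question 
quoted below; the values are invented). Note that from Stata 9 on 
this -egen- function is documented as -total()-, which is what the 
sketch uses; in earlier releases it was -sum()-. 

```stata
* Invented toy data: two persons (pid), three observations each
clear
input pid ob calc
1 1 10
1 2 20
1 3 30
2 1 5
2 2 15
2 3 25
end

* Overall total: the same value (105) in every observation
egen alltotal = total(calc)

* Group totals: 60 for pid 1, 45 for pid 2
egen pidtotal = total(calc), by(pid)

list, sepby(pid)
```

Either way the result is constant within the group over which it 
is computed; nothing accumulates from one observation to the next. 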

In essence, -egen- only takes -egen- functions, 
either as documented under -[R] egen-, or as 
user-defined -egen- functions _always_ 
flagged as such. 

Also, -egen- functions are _never, ever_ 
allowed anywhere else. They require -egen-. 

-egen- is really rather limited. There are 
perhaps of the order of 100 -egen- functions written, 
and that's a fixed menu, except that 
if you don't like them, you can indeed 
write your own. 

-sum()- and other functions

-sum()- anywhere else that it is legal fires up 
the -sum()- function which produces 
cumulative sums. This is part of the 
executable and has been so for a very 
long time, perhaps even since Stata 1.0. 
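
A small sketch of that behaviour, using an invented three-value 
variable called calc (the name is only borrowed from this thread): 

```stata
* Invented data: three values in the current sort order
clear
input calc
10
20
30
end

* sum() accumulates down the observations: 10, 30, 60
generate running = sum(calc)

list
```

The result depends on the current sort order of the data, which 
is why it is usually combined with -sort- or -by:-. 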

-generate- (and -replace-) can in effect
take very complicated expressions 
as arguments, making use of constants, 
variables, operators and functions 
such as -sum()-. The scope of -generate- 
is in no way indicated by the few token 
examples in the help. By combining constants, 
variables, operators and functions, 
you have _much_ more flexibility than with 
-egen-. 
Why then bother with -egen-? Just 
for convenience: some often 
repeated sets of operations have been 
rolled into -egen- functions. 
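
To illustrate the convenience, here is a sketch (not the code in 
-_gsum.ado-, just one way to get the same numbers) that builds a 
group total by hand with -generate- and -sum()- and checks it 
against the -egen- result. The data and the variable names 
bytotal and egentotal are invented; -total()- assumes Stata 9 or 
later. 

```stata
* Invented toy data
clear
input pid ob calc
1 1 10
1 2 20
2 1 5
2 2 15
end

* By hand: running sum within pid, then copy the last value
* (the complete total) to every observation of that pid
bysort pid (ob): generate bytotal = sum(calc)
by pid: replace bytotal = bytotal[_N]

* The same thing rolled into one -egen- call
egen egentotal = total(calc), by(pid)

assert bytotal == egentotal
```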

Name conflict!

If you find this confusing, or difficult 
to defend, you are in 
excellent company. Svend Juul gave 
a very droll paper at the Berlin users' 
meeting in which he underlined this 
and a few other messes over names. 

StataCorp are known to be taking the 
issue seriously. At the same time, 
the last thing they want to do is 
to break any existing programs, 
do files or habits. 


One source of explanations is 

Cox, N. J. 2002. Speaking Stata: How to move step by: step. 
Stata Journal 2(1): 86-102

which gathers the main ideas in one place. The obvious
alternative is to look up -by- in the Manual index and read 
the several sections thus indicated. The article 
just mentioned was written because the coverage
of -by:- in the manuals is a bit fragmented. 

It's been said by a long-time Stata user
that wrapping your head around the possibilities
of -by:- is the biggest single step you can take 
to real Stata fluency. 
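
A sketch of the cumulative average asked about in this thread, 
on invented data. Under -by:-, -sum()- and _n both restart within 
each pid, so the running sum divided by the running count is the 
average so far. The sketch divides by _n rather than ob so that 
it does not assume ob runs 1, 2, 3, ... without gaps; with data 
like these the two agree. The contrast with -egen ..., by(pid ob)- 
is included as well; -total()- assumes Stata 9 or later, and the 
names cave2 and pairtotal are made up for the illustration. 

```stata
* Invented toy data: two persons, three observations each
clear
input pid ob calc
1 1 10
1 2 20
1 3 30
2 1 5
2 2 15
2 3 25
end

* Within each pid, sorted by ob: running sum / number of obs so far
bysort pid (ob): generate cave = sum(calc) / _n

* Same thing dividing by ob, valid only because ob is 1, 2, 3 here
by pid: generate cave2 = sum(calc) / ob

* Each (pid, ob) pair is a group of one observation in these data,
* so this "group total" is just calc itself, not a running sum
egen pairtotal = total(calc), by(pid ob)

list, sepby(pid)
```

Here cave is 10, 15, 20 for pid 1 and 5, 10, 15 for pid 2. 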

[email protected] 

Daniel Egan
> bysort pid (ob): gen cave = sum(calc)/ob
> This is so obvious as to be painful. So why didn't I think of it? 
> 1) Where/When did -sum()- become an acceptable argument to
> -generate-!?!? I have only ever seen it in the context of -egen-.
> Looking at the help for -generate-, there are no arguments that are
> explicitly stated to be useable. It is only at the very bottom of the
> examples that one sees a function -uniform- and then -sum- used with
> gen. Are the others?  I know that using many egen arguments with -gen-
> will return errors (e.g. count).
> 2) Why does -bys pid (ob):- do this correctly? I understand that
> it is equivalent to -sort pid ob-, but why does it result in the
> correct cumulative sum?
> Another way of putting this is why doesn't -egen cave=sum(calc)/ob,
> by(pid ob)- work if this does?
