st: RE: Understanding the difference between gen and egen

From: "Nick Cox"
Subject: st: RE: Understanding the difference between gen and egen
Date: Tue, 13 Jun 2006 18:41:00 +0100

```
Dick is right to emphasise the scope for confusion here,
and StataCorp will take their share of the blame.

He is spot on about the difference between -generate-
and -egen-, and correct that the territory was surveyed in

SJ-2-4  pr0007  . . . . . Speaking Stata:  On getting functions to do the work
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
Q4/02   SJ 2(4):411--427                                 (no commands)
tips for effectively using the Stata built-in functions
and egen functions

To restate some of the facts mentioned by Dick:

In the recent thread, apart from various typos
and illegal suggestions, the answers differed
because of the following situation:

0. The syntax of various -egen- functions
was changed in Stata 9. The old names
continue to work, but are not documented.

1. Some users are using Stata 9, and some a previous
version.

2. Very likely, some users of Stata 9 are remembering
previous syntax that continues to work.

Things were stirred up between Stata 8 and Stata 9 when Svend Juul gave a very
witty talk at the Berlin users' meeting a while
back. He pointed out how official Stata syntax
has produced various inconsistencies, whereby
in particular some things with the same name do different things.

There had been occasional minor grumbles, but matters
came to a head when Svend threw his (stink-)bomb into
the crater.

In this case, -sum()- is a Stata function that
has been around for a very long time, and
always means cumulative sum. So this can be
used as part of the argument to -generate-.

On the other hand, -sum()- is an -egen-
function that has been around for a while,
and means, well, sum, that is, it is
constant for whatever group it refers to.

Now consider the clash between principles:

(*) Consistency is good. Inconsistency is

(+) Continuity is good. Capricious-seeming changes
of syntax mostly confuse.

How would you resolve this? In this particular
example, the Stata function name -sum()- was regarded
as fixed, and the -egen- function -sum()- was
renamed -total()-. If we were throwing Stata away
and starting again, I would recommend -cusum()-
and -sum()- respectively, but that is not what
we are doing.

However, Dick's points about user-written routines
-- although interesting in themselves -- do not
apply in this context. The inconsistency
is entirely between different _official_ syntaxes.
Users cannot write a -sum()- function, and if they
wrote an -egen- function -sum()- it would take
precedence over the official one only if they
hacked their Stata set-up to ensure that that happened,
in which case they understand enough about Stata for
this not to be a problem for them. (Doing that would

Finally, this has nothing to do with -egenmore-,
which is user-written and has neither a -sum()- nor
a -total()- function, for the reason just given.

Nick
n.j.cox@durham.ac.uk

Dick Campbell

> I am now a confirmed Stata user and love it for all the reasons
> that Phil Schumm outlined the other day and more. But the recent
> note from a user regarding sums within id reminds me of how
> difficult it is sometimes for new users to get a handle on relatively
> simple things.  In the current case, if I look up "sum" in the master
> index, I am directed to -egen-. If I go to -egen-, somewhat
> surprisingly, I don't find -sum- listed in the list of egen functions
> beginning on page 101 of the Data Management manual or in the summary
> of that list on page 105. Nor is it available in the drop down list
> of egen functions. I can do -sum- as a function using -gen- but then
> I get a running sum, not a total.
>
> It struck me that perhaps -help egenmore- would show me the sum
> function, but no luck. A little further digging led me to the
> documentation for the -sum- function on page 157 which tells me that
> I can get a running sum and which directs me to an "alternative sum
> function" in -egen-. What I did eventually find in the documentation
> for -egen- was the -total- function which does indeed produce a sum.
> But so does the -sum- function, so one is apparently an alias for
> the other.
>
> This all may be an oversight or a failure on my part to see the
> obvious, but in general, I find the distinction between -gen- and
> -egen- to be confusing. It would seem logical that all of this stuff
> could be handled by -gen-. But -gen- is not accessible to users,
> being a built in command, while -egen- is and various users have
> added various things to it. Thus, I guess the reason for -egen- is
> that it is user accessible, not that it has some special status
> relative to -gen-. To a new user, however, and even to more
> experienced ones, this is all a bit confusing. Perhaps one of Nick's
> Stata Journal articles has clarified all of this, but new users
> won't have access to it.
>
> Why do I care about all of this? Because in my attempts to convince my
> colleagues that Stata has many virtues compared to other software that
> starts with "S" I am told that Stata is "hard to get into." That isn't
> true of course, but the fact that so many extremely useful aspects of
> Stata are user written and hence not in official documentation, does
> make things more complicated. I have no fix for this, and perhaps none
> is possible, and of course the ease with which user extensions can be
> written is a major strength. Still, I wonder if the next version's
> manuals might not contain a discussion of user-written routines. Or
> perhaps a more complete discussion of this issue could be posted on
> the "Stata Community" section of the Stata web site.

```
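The distinction discussed in this message, between -sum()- used with -generate- (running sum) and -total()- used with -egen- (group total), can be seen in a small sketch. The data and the variable names (id, x, runsum, grptotal) are invented for illustration and do not come from the original thread; the -total()- name assumes Stata 9 or later.

```stata
* Toy data, made up for illustration: two ids, three observations each
clear
input id x
1 1
1 2
1 3
2 10
2 20
2 30
end

* -sum()- with -generate-: running (cumulative) sum, restarted within each id
bysort id (x): generate runsum = sum(x)

* -total()- with -egen-: the group total, constant within each id
* (in Stata 8 and earlier this -egen- function was named -sum()-)
bysort id: egen grptotal = total(x)

* runsum is 1, 3, 6 and 10, 30, 60; grptotal is 6, 6, 6 and 60, 60, 60
list, sepby(id)
```

Under these assumptions, runsum changes from row to row while grptotal repeats one value per id, which is the difference that the shared name -sum()- used to obscure.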