Statalist The Stata Listserver



st: Understanding the difference between gen and egen


From   Dick Campbell <dcamp@uic.edu>
To   "statalist-hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: Understanding the difference between gen and egen
Date   Tue, 13 Jun 2006 11:41:01 -0500

I am now a confirmed Stata user and love it for all the reasons
that Phil Schumm outlined the other day and more. But the recent
note from a user regarding sums within id reminds me of how
difficult it is sometimes for new users to get a handle on relatively
simple things. In the current case, if I look up "sum" in the master
index, I am directed to -egen-. If I go to -egen-, somewhat surprisingly,
I don't find -sum- listed in the list of egen functions beginning on page 101
of the Data Management manual or summary of that list on page 105.
Nor is it available in the drop-down list of egen functions.
I can do -sum- as a function using -gen- but then I get a running sum, not
a total.

It struck me that perhaps -help egenmore- would show me the sum function,
but no luck. A little further digging led me to the documentation for the -sum- function
on page 157 which tells me that I can get a running sum and which directs
me to an "alternative sum function" in -egen-. What I did eventually find in
the documentation for -egen- was the -total- function which does indeed
produce a sum. But so does the -sum- function, so one is apparently an alias for the other.
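For concreteness, here is a minimal sketch of the difference between the running sum from -gen- and the total from -egen-. The dataset and the variable names (id, obs, x, runsum, tot, tot2) are made up for illustration and are not from the original question.

```stata
* Hypothetical toy data: two ids with three observations each
clear
input id x
1 10
1 20
1 30
2  5
2 15
2 25
end

* Record the original order so the within-id sort is reproducible
gen obs = _n

* -gen- with sum(): a running (cumulative) sum within each id
bysort id (obs): gen runsum = sum(x)

* -egen- with total(): the group total, repeated on every row of the id
by id: egen tot = total(x)

* -gen- can reproduce the total by taking the last running sum in each group
by id: gen tot2 = runsum[_N]

list, sepby(id)
```

In this sketch runsum climbs row by row within each id, while tot and tot2 hold the same group total on every row of that id.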

This all may be an oversight or a failure on my part to see the obvious, but
in general, I find the distinction between -gen- and -egen- to be confusing.
It would seem logical that all of this stuff could be handled by -gen-. But
-gen- is not accessible to users, being a built-in command, while -egen- is, and
various users have added various things to it. Thus, I guess the reason for
-egen- is that it is user accessible, not that it has some special status relative
to -gen-. To a new user, however, and even to more experienced ones,
this is all a bit confusing. Perhaps one of Nick's Stata Journal articles has
clarified all of this, but new users won't have access to it.

Why do I care about all of this? Because in my attempts to convince my
colleagues that Stata has many virtues compared to other software that
starts with "S" I am told that Stata is "hard to get into." That isn't true of course,
but the fact that so many extremely useful aspects of Stata are user written, and
hence not in official documentation, does make things more complicated. I
have no fix for this, and perhaps none is possible, and of course the ease with
which user extensions can be written is a major strength. Still, I wonder if the
next version's manuals might not contain a discussion of how to find out about
user-written routines. Or perhaps a more complete discussion of
this issue could be posted on the "Stata Community" section of the Stata web site.
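As a rough sketch of what such a discussion might show, the commands below are one way to look for and install user-written routines from within Stata. They assume an internet connection, and -egenmore- is used only because it is the package mentioned above.

```stata
* Search official help, FAQs, the Stata Journal, and net resources for a keyword
findit egen

* Describe, then install, a user-written package from the SSC archive
ssc describe egenmore
ssc install egenmore

* After installation, its help file is available like any official one
help egenmore
```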


Richard T. Campbell
Division of Biostatistics and Epidemiology
School of Public Health
University of Illinois at Chicago






