
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: egen & sum()
Date: Wed, 26 Nov 2008 10:08:26 -0000

-egen, sum()- is just the same as -egen, total()-. -egen, sum()- was cloned as -egen, total()- in Stata 9 for precisely the reason you identify. As Svend Juul in particular pointed out in various very entertaining talks in 2004, it was not a good idea to use -sum()- for cumulative or running sum in one context and the same name for unqualified sums in another. Although many experienced users had got used to this, it was (quite understandably) sometimes puzzling to newer users.

But -egen, sum()- remains there so as not to break old scripts or habits. It is just undocumented. Various other -egen- functions were renamed at the same time, but nothing should have broken anybody's code.

-viewsource _gsum.ado- and -viewsource _gtotal.ado- let you see the code. Another way to understand what's going on is to -set trace on- and see what calls what. -egen- functions are totally transparent.

It's vital to understand that functions and -egen- functions are completely separate beasts. -egen- functions are only understood by -egen- and the only functions -egen- understands are -egen- functions. That's two absolute rules. I've sometimes wondered whether -egen- functions should have been called something different, but it's rather late for that.

Nick
n.j.cox@durham.ac.uk

Neil Shephard

I've been poring over someone else's Stata code trying to understand it and have discovered what seems to be inconsistent or undocumented behaviour. The do-file has the following line...

    egen b = sum(a)

This stood out to me, as I thought the current version of -egen- uses the -total()- function to obtain the combined (as opposed to running) sum of a variable, so I checked the -man egen- and -man egenmore- pages and sure enough there is no mention of -sum()- as an -egen- function. -sum()- is however a [P] function and returns the running sum of the specified variable.
Thus I would have expected -egen b = sum(a)- to return the running sum of a, but this is not the case; it behaves as though -sum()- is a synonym for -total()-, as the following example demonstrates...

    . clear
    . set obs 10
    obs was 0, now 10
    . gen a = _n
    . gen b = sum(a)
    . egen b2 = sum(a)
    . egen b3 = total(a)
    . list

         +-------------------+
         |  a    b   b2   b3 |
         |-------------------|
      1. |  1    1   55   55 |
      2. |  2    3   55   55 |
      3. |  3    6   55   55 |
      4. |  4   10   55   55 |
      5. |  5   15   55   55 |
         |-------------------|
      6. |  6   21   55   55 |
      7. |  7   28   55   55 |
      8. |  8   36   55   55 |
      9. |  9   45   55   55 |
     10. | 10   55   55   55 |
         +-------------------+

Based on the help-pages and documentation I would have expected b2 == b, as -egen b2 = sum(a)- should be treating -sum()- as described in the -man sum()- page. Indeed, even the third example in -man egen- shows that -sum()- should be used with -generate- and -total()- should be used with -egen-.

Is there any historical legacy that anyone is aware of, with -sum()- being a valid -egen- function, that may be lingering around causing this behaviour? Should the behaviour or the documentation be modified? Or have I completely misunderstood things?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
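The distinction drawn in the reply, between the running-sum function -sum()- used with -generate- and the -egen- function -total()-, can be checked with a short sketch. This is not part of the original exchange. The variable names (id, run_all, tot_all, run_grp, tot_grp, bad1, bad2) are made up for illustration, and the two lines marked as expected to fail are there only to show the "two absolute rules" in action.

```stata
* Running sum versus total, overall and within groups (illustrative names)
clear
set obs 6
generate id = ceil(_n/3)                  // two groups of three observations
generate a  = _n

generate run_all = sum(a)                 // running sum: 1 3 6 10 15 21
egen     tot_all = total(a)               // constant 21 in every observation

bysort id (a): generate run_grp = sum(a)  // running sum restarts within each id
bysort id:     egen     tot_grp = total(a)  // 6 for id 1, 15 for id 2

* Functions and -egen- functions are separate: both lines should fail
capture noisily generate bad1 = total(a)  // total() is not a -generate- function
capture noisily egen     bad2 = log(a)    // log() is not an -egen- function

list, sepby(id)
```

Writing -egen tot_all = sum(a)- in place of the -total()- line should give the same constant, as the listing in the original post shows, but -total()- is the documented spelling and is less likely to puzzle a later reader.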

**References**: **st: egen & sum()** *From:* Neil Shephard <nshephard@nhs.net>


© Copyright 1996–2014 StataCorp LP