
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: -egenmore- updated on SSC
Date: Thu, 11 Jul 2002 10:59:53 +0100

Thanks to Kit Baum, the -egenmore- package on SSC has been updated. This consists of (you've guessed it) more -egen- functions. Most require no more than Stata 6, but some require Stata 7, as is flagged in the package description and the collective help -egenmore-. (Other user-written -egen- functions can be located with -findit-.)

To get a listing of function names, type

. ssc desc egenmore

To get more details, type

. ssc type egenmore.hlp

To install, use

. ssc inst egenmore

or

. ssc inst egenmore, replace

as appropriate. If your Stata is not up-to-date enough to include either -findit- or -ssc-, please see the first URL under my signature for advice.

The update consists of a single new function -egroup()-. Its nonce name -egroup()- is intended merely to flag a small _e_xtension to the official Stata egen function -group()-. The extension is that the -label- option may specify a list of variables to use in the value labels of the new variable.

The use of this is best shown by an example. Suppose as a small variation on examples with the auto data, we strip off the first word of -make-

. egen manuf = head(make)

and ask for a simple table showing frequencies:

. tab manuf

      manuf |      Freq.     Percent        Cum.
------------+-----------------------------------
        AMC |          3        4.05        4.05
       Audi |          2        2.70        6.76
        BMW |          1        1.35        8.11
      Buick |          7        9.46       17.57
       Cad. |          3        4.05       21.62
      Chev. |          6        8.11       29.73
     Datsun |          4        5.41       35.14
      Dodge |          4        5.41       40.54
       Fiat |          1        1.35       41.89
       Ford |          2        2.70       44.59
      Honda |          2        2.70       47.30
      Linc. |          3        4.05       51.35
      Mazda |          1        1.35       52.70
      Merc. |          6        8.11       60.81
       Olds |          7        9.46       70.27
    Peugeot |          1        1.35       71.62
      Plym. |          5        6.76       78.38
      Pont. |          6        8.11       86.49
    Renault |          1        1.35       87.84
     Subaru |          1        1.35       89.19
     Toyota |          3        4.05       93.24
         VW |          4        5.41       98.65
      Volvo |          1        1.35      100.00
------------+-----------------------------------
      Total |         74      100.00

This shows a familiar feature: with string variables (and also with numeric variables with value labels -encode-d alphabetically), we get alphabetic (strictly, alphanumeric) order, which is great for look-up, but often lousy for identifying patterns or interesting features. A more useful table would be ordered on frequency, and highest first, or so I suggest.

As it happens, there is a kludged solution to this particular problem with -tabulate-, a program called -tabsort-, but it is of more interest to identify a general approach to a solution, because the same irritation can arise with other tabular and graphical output.

We can get most of the way there in two lines of official Stata. Calculate the frequencies ourselves,

. bysort manuf : gen freq = -_N

(remembering to negate values to get the desired sort order), and use -egen, group() label- to get an equivalent categorical variable.

. egen Manuf = group(freq manuf) , label
. tab Manuf

 group(freq |
     manuf) |      Freq.     Percent        Cum.
------------+-----------------------------------
   -7 Buick |          7        9.46        9.46
    -7 Olds |          7        9.46       18.92
   -6 Chev. |          6        8.11       27.03
   -6 Merc. |          6        8.11       35.14
   -6 Pont. |          6        8.11       43.24
   -5 Plym. |          5        6.76       50.00
  -4 Datsun |          4        5.41       55.41
   -4 Dodge |          4        5.41       60.81
      -4 VW |          4        5.41       66.22
     -3 AMC |          3        4.05       70.27
    -3 Cad. |          3        4.05       74.32
   -3 Linc. |          3        4.05       78.38
  -3 Toyota |          3        4.05       82.43
    -2 Audi |          2        2.70       85.14
    -2 Ford |          2        2.70       87.84
   -2 Honda |          2        2.70       90.54
     -1 BMW |          1        1.35       91.89
    -1 Fiat |          1        1.35       93.24
   -1 Mazda |          1        1.35       94.59
 -1 Peugeot |          1        1.35       95.95
 -1 Renault |          1        1.35       97.30
  -1 Subaru |          1        1.35       98.65
   -1 Volvo |          1        1.35      100.00
------------+-----------------------------------
      Total |         74      100.00

The nuisance remaining is that we have the negated frequencies cluttering up the value labels. (Ask for a value label, and -egen, group()- uses all the variables mentioned.) Hence the need for a new option, which is the only thing added in -egroup()-:

. egen Manuf2 = egroup(freq manuf) , label(manuf)
. tab Manuf2

group(manuf |
          ) |      Freq.     Percent        Cum.
------------+-----------------------------------
      Buick |          7        9.46        9.46
       Olds |          7        9.46       18.92
      Chev. |          6        8.11       27.03
      Merc. |          6        8.11       35.14
< it's OK >
    Peugeot |          1        1.35       95.95
    Renault |          1        1.35       97.30
     Subaru |          1        1.35       98.65
      Volvo |          1        1.35      100.00
------------+-----------------------------------
      Total |         74      100.00

This approach can be extended to other requests, standard or bizarre. Suppose we want a table ordered on maximum mpg:

. bysort manuf : egen maxmpg = min(-mpg)

(you can see that by hand-waving)

. egen Manuf3 = egroup(maxmpg manuf) , label(manuf)
. tabstat mpg , by(Manuf3) s(max)

Summary for variables: mpg
     by categories of: Manuf3 (group(manuf))

 Manuf3 |       max
--------+----------
     VW |        41
 Datsun |        35
 Subaru |        35
  Plym. |        34
<it's OK too >
   Fiat |        21
  Volvo |        17
  Linc. |        14
Peugeot |        14
--------+----------
  Total |        41
-------------------

(Why can't we go

. egen Manuf3 = egroup(maxmpg), label(manuf)

? Because we need to break ties on maxmpg.)

A lot of detail explaining one little option, but it may be useful.

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


