
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: RE: One last question about egen
Date: Mon, 15 Jul 2002 07:56:10 +0100

Rodrigo Briceño asked

> > Following up on my previous question: I have a hospital discharges
> > database, and two of the variables in it are -diaest1- and -clave1-.
> >
> > I have already processed the data to find the 10 most frequent
> > diagnoses with the help of -egen, group()-. What do I need to do if
> > I want the same thing, but this time separately by -diaest1-? Let's
> > say that I need the top 10 diagnoses for the discharges that have a
> > duration of 6 or more days, and the top 10 diagnoses for the
> > discharges that have a duration of fewer than 2 days. I have already
> > made a variable that establishes those durations (called
> > -rank_estancia2-):
> >
> > rank_estancia2 = 1 (diaest < 2 days)
> > rank_estancia2 = 2 (diaest 2-5 days)
> > rank_estancia2 = 3 (diaest 6 or more days)
> >
> > I tried to do something with -egen, group()- but my attempts
> > didn't seem to be useful. I already tried typing:
> >
> > tabsort clave1 if rank_estancia2==1 & group<11
> >
> > (where -group- is the variable from my first question to the list
> > today, which Nick Cox helped me to build).
> >
> > Sorry for my ignorance.

and I replied

> I don't know how to do this cleanly with official
> Stata's -egen, group()- as mentioned by Rodrigo.
>
> Once more I will show a way to do something like this
> with my own -egroup()- function for -egen-, accessible
> as part of the -egenmore- package on SSC.
>
> Without access to Rodrigo's data this is easier to
> explain with an analogue for the auto data, which
> naturally anybody interested can try for themselves.
>
> Suppose we have manufacturer name and a classification
> of high or low mpg:
>
> . egen manuf = head(make)
> . gen himpg = mpg > 21
>
> Step 1. Calculate the frequencies you want displayed.
> Remember to negate them if you want them shown
> highest first.
>
> . bysort himpg manuf : gen freq = - _N
>
> Step 2. For each category of -himpg-,
> get the groups in the order defined by -freq- and -manuf-,
> and display the first 10 groups in each instance:
>
> . forval i = 0/1 {
> .     qui egen group`i' = egroup(freq manuf) if himpg == `i' , l(manuf)
> .     tab group`i' if group`i' <= 10
> . }
>
> group(manuf |
>           ) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>       Buick |          6       16.67       16.67
>        Olds |          6       16.67       33.33
>       Merc. |          5       13.89       47.22
>       Pont. |          5       13.89       61.11
>        Cad. |          3        8.33       69.44
>       Dodge |          3        8.33       77.78
>       Linc. |          3        8.33       86.11
>       Chev. |          2        5.56       91.67
>      Toyota |          2        5.56       97.22
>         AMC |          1        2.78      100.00
> ------------+-----------------------------------
>       Total |         36      100.00
>
> group(manuf |
>           ) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>       Chev. |          4       17.39       17.39
>       Plym. |          4       17.39       34.78
>          VW |          4       17.39       52.17
>      Datsun |          3       13.04       65.22
>         AMC |          2        8.70       73.91
>       Honda |          2        8.70       82.61
>        Audi |          1        4.35       86.96
>         BMW |          1        4.35       91.30
>       Buick |          1        4.35       95.65
>       Dodge |          1        4.35      100.00
> ------------+-----------------------------------
>       Total |         23      100.00
>
> That could be improved a bit by putting in display
> lines.
>
> Now one question might fairly be, and this was
> what I thought of first, why not something more like
>
> . by himpg : egen group = egroup(freq manuf), l(manuf)
> . by himpg : tab group if group <= 10
>
> One answer is that -egroup()- does not support -by:-.
> An even better answer is that changing the program
> to support -by:- would run into an immediate problem:
> it can't be combined with allocation of value
> labels in the way that we want to allow output like
> that above.
>
> I'm sure that there are other ways to approach the
> problem.

Here's another, assuming

. egen manuf = head(make)
. gen himpg = mpg > 21

Beyond those setup lines, it uses no user-written extras.
Nothing in this assumes that the classifying variable has
just 2 classes.

1. Create negated frequencies, to get the proper sort order:

. bysort himpg manuf : gen frequency = -_N

2. Calculate the order explicitly:

. bysort himpg frequency manuf : gen order = _n == 1
. by himpg : replace order = sum(order)

3. Flip the frequencies back again:

. qui replace frequency = -frequency

4. Get your table:

. by himpg : tabdisp order if order <= 10, cell(manuf frequency)

______________________________________________________________________________
-> himpg = 0

----------------------------------
    order |      manuf   frequency
----------+-----------------------
        1 |      Buick           6
        2 |       Olds           6
        3 |      Merc.           5
        4 |      Pont.           5
        5 |       Cad.           3
        6 |      Dodge           3
        7 |      Linc.           3
        8 |      Chev.           2
        9 |     Toyota           2
       10 |        AMC           1
----------------------------------

______________________________________________________________________________
-> himpg = 1

----------------------------------
    order |      manuf   frequency
----------+-----------------------
        1 |      Chev.           4
        2 |      Plym.           4
        3 |         VW           4
        4 |     Datsun           3
        5 |        AMC           2
        6 |      Honda           2
        7 |       Audi           1
        8 |        BMW           1
        9 |      Buick           1
       10 |      Dodge           1
----------------------------------

Nick
n.j.cox@durham.ac.uk
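Rodrigo's -rank_estancia2- has three classes, and he wants only the first and the last. The following is a hedged sketch, not part of the original reply, of the same four official-Stata steps on the auto data, with a hypothetical three-class variable -mpgclass- standing in for -rank_estancia2- (its cut-points of 18 and 25 mpg are arbitrary). It uses Stata's -word()- string function instead of -head()- to get the manufacturer, and it loops over classes 1 and 3 instead of using -by:- so that the middle class is skipped.

```stata
* Sketch only: -mpgclass- is a hypothetical stand-in for -rank_estancia2-
sysuse auto, clear

* manufacturer = first word of -make- (official word() function)
gen manuf = word(make, 1)

* three classes: 1 = under 18 mpg, 2 = 18-24 mpg, 3 = 25 mpg or more
gen mpgclass = cond(mpg < 18, 1, cond(mpg <= 24, 2, 3))

* Step 1: negated frequencies, so the most common manufacturers sort first
bysort mpgclass manuf : gen frequency = -_N

* Step 2: rank manufacturers within each class (1 = most frequent)
bysort mpgclass frequency manuf : gen order = _n == 1
by mpgclass : replace order = sum(order)

* Step 3: flip the frequencies back to positive counts
quietly replace frequency = -frequency

* Step 4: top 10 for the first and last classes only, with a display line
foreach k in 1 3 {
    display _n as text "mpgclass = `k'"
    tabdisp order if mpgclass == `k' & order <= 10, cellvar(manuf frequency)
}
```

For Rodrigo's data, substituting -rank_estancia2- for -mpgclass- and -clave1- for -manuf- (and dropping the auto-specific setup lines) should, under the same assumptions, list the top 10 diagnoses for the shortest and the longest stays.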

References: st: RE: One last question about egen, from "Nick Cox" <n.j.cox@durham.ac.uk>
