st: RE: RE: One last question about egen

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: RE: One last question about egen
Date	Mon, 15 Jul 2002 07:56:10 +0100
Rodrigo Briceño asked
>
> Following on from my previous questions: I have a hospital discharges
> database, and two of its variables are
>
> -diaest1- and -clave1-.
>
> I have already processed the data to find the 10 most frequent
> diagnoses with the help of -egen, group()-. What do I need to do if
> I want the same thing, but this time separately by classes of
> -diaest1-? Let's say that I need the 10 most frequent diagnoses for
> discharges with a duration of 6 or more days, and the 10 most
> frequent diagnoses for discharges with a duration of fewer than
> 2 days. I have already made a variable that defines those
> durations (called -rank_estancia2-).
>
> rank_estancia2=1 (diaest <2 days)
> rank_estancia2=2 (diaest 2-5 days)
> rank_estancia2=3 (diaest 6 or more days)
>
> I tried to do something with -egen, group()- but my attempts
> were not successful. I tried typing:
>
> tabsort clave1 if rank_estancia2==1 & group<11
>
> (where -group- is the variable calculated following the first answer
> of the day on this list, which Nick Cox helped me to build).
>
> Sorry for my ignorance.

and I replied

> I don't know how to do this cleanly with official
> Stata's -egen, group()- as mentioned by Rodrigo.
>
> Once more I will show a way to do something like this
> with my own -egroup()- function for -egen-, accessible
> as part of the -egenmore- package on SSC.
>
> Without access to Rodrigo's data this is easier to
> explain with an analogue for the auto data, which
> anybody interested can naturally try for themselves.
>
> Suppose we have manufacturer name and a classification
> of high or low mpg:
>
> . egen manuf = head(make)
> . gen himpg = mpg > 21
>
> Step 1. Calculate the frequencies you want displayed.
> Remember to negate them if you want them shown
> highest first.
>
> . bysort himpg manuf : gen freq = - _N
>
> Step 2. For each category of -himpg-,
> get the groups in the order defined by -freq- and -manuf-,
> and display the first 10 groups in each instance:
>
> . forval i = 0/1 {
> .	qui egen group`i' = egroup(freq manuf) if himpg == `i' , l(manuf)
> .	tab group`i' if group`i' <= 10
> . }
>
> group(manuf |
>           ) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>       Buick |          6       16.67       16.67
>        Olds |          6       16.67       33.33
>       Merc. |          5       13.89       47.22
>       Pont. |          5       13.89       61.11
>        Cad. |          3        8.33       69.44
>       Dodge |          3        8.33       77.78
>       Linc. |          3        8.33       86.11
>       Chev. |          2        5.56       91.67
>      Toyota |          2        5.56       97.22
>         AMC |          1        2.78      100.00
> ------------+-----------------------------------
>       Total |         36      100.00
>
> group(manuf |
>           ) |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>       Chev. |          4       17.39       17.39
>       Plym. |          4       17.39       34.78
>          VW |          4       17.39       52.17
>      Datsun |          3       13.04       65.22
>         AMC |          2        8.70       73.91
>       Honda |          2        8.70       82.61
>        Audi |          1        4.35       86.96
>         BMW |          1        4.35       91.30
>       Buick |          1        4.35       95.65
>       Dodge |          1        4.35      100.00
> ------------+-----------------------------------
>       Total |         23      100.00
>
> That could be improved a bit by putting in display
> lines.
>
> Now one question might fairly be, and this was
> what I thought of first, why not something more like
>
> . by himpg : egen group = egroup(freq manuf), l(manuf)
> . by himpg : tab group if group <= 10
>
> One answer is that -egroup()- does not support -by:-.
> An even better answer is that changing the program
> to support -by:- would run into an immediate problem:
> -by:- cannot be combined with the allocation of value
> labels that we need in order to get output like
> that above.
>
> I'm sure that there are other ways to approach the
> problem.

Here's another, assuming

. egen manuf = head(make)

. gen himpg = mpg > 21

It uses no user-written extras. Nothing in this
assumes that the classifying variable has just
2 classes.

1. Create negated frequencies, to get proper sort
order.

. bysort himpg manuf : gen frequency = -_N

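2. Calculate order explicitly: tag the first observation of
each distinct manufacturer, then take a running sum of the
tags within each class of -himpg-:

. bysort himpg frequency manuf : gen order = _n == 1
. by himpg : replace order = sum(order)

3. Flip frequencies back again:

. qui replace frequency = - frequency

4. Get your table:

. by himpg : tabdisp order if order <= 10, cell(manuf frequency)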

_______________________________________________________________________________
-> himpg = 0

----------------------------------
    order |      manuf   frequency
----------+-----------------------
        1 |      Buick           6
        2 |       Olds           6
        3 |      Merc.           5
        4 |      Pont.           5
        5 |       Cad.           3
        6 |      Dodge           3
        7 |      Linc.           3
        8 |      Chev.           2
        9 |     Toyota           2
       10 |        AMC           1
----------------------------------

_______________________________________________________________________________
-> himpg = 1

----------------------------------
    order |      manuf   frequency
----------+-----------------------
        1 |      Chev.           4
        2 |      Plym.           4
        3 |         VW           4
        4 |     Datsun           3
        5 |        AMC           2
        6 |      Honda           2
        7 |       Audi           1
        8 |        BMW           1
        9 |      Buick           1
       10 |      Dodge           1
----------------------------------
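
For Rodrigo's data, the same four steps might look something
like the sketch below. I have not seen the data, so this
assumes that -clave1- holds the diagnosis code and that
-rank_estancia2- holds the duration class; -frequency- and
-order- are just new variable names chosen here, as above.
It tabulates every duration class, including the middle one,
so the classes of interest can simply be read off the output.

. bysort rank_estancia2 clave1 : gen frequency = -_N
. bysort rank_estancia2 frequency clave1 : gen order = _n == 1
. by rank_estancia2 : replace order = sum(order)
. qui replace frequency = - frequency
. by rank_estancia2 : tabdisp order if order <= 10, cell(clave1 frequency)

Ties in frequency are broken by the sort order of -clave1-,
just as ties above are broken alphabetically by -manuf-.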

Nick
[email protected]
