Stata The Stata listserver

st: RE: One last question about egen


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: One last question about egen
Date   Sun, 14 Jul 2002 17:01:43 +0100

Rodrigo Briceño

[message edited; winmail.dat zapped, etc.]

Following up on my previous questions: I have a hospital discharges
database, and two of the variables are:

-diaest1- and -clave1-.

I have already processed the data to find the 10 most frequent
diagnoses with the help of -egen, group()-. What do I need to do if I
want the same thing, but this time split according to -diaest1-? Say
that I need the top 10 diagnoses for discharges lasting 6 or more
days, and the top 10 diagnoses for discharges lasting fewer than 2
days. I have already created a variable that encodes those durations
(called -rank_estancia2-).

rank_estancia2=1 (diaest <2 days)
rank_estancia2=2 (diaest 2-5 days)
rank_estancia2=3 (diaest 6 or more days)

I tried to do something with -egen, group()- but my attempts did not
get me anywhere. I have already tried typing:

tabsort clave1 if rank_estancia2==1 & group<11

(where -group- is the variable that Nick Cox helped me to build in
his first answer of the day to this list).

Sorry for my ignorance.

>>> I don't know how to do this cleanly with official
Stata's -egen, group()- as mentioned by Rodrigo.

Once more I will show a way to do something like this
with my own -egroup()- function for -egen-, accessible
as part of the -egenmore- package on SSC.

Without access to Rodrigo's data this is easier to
explain with an analogue for the auto data, which
anybody interested can try for themselves.

Suppose we have manufacturer name and a classification
of high or low mpg:

. egen manuf = head(make)
. gen himpg = mpg > 21

Step 1. Calculate the frequencies you want displayed.
Remember to negate them if you want them shown
highest first.

. bysort himpg manuf : gen freq = - _N

Step 2. For each category of -himpg-,
get the groups in the order defined by -freq- and -manuf-,
and display the first 10 groups in each instance:

. forval i = 0/1 {
.	qui egen group`i' = egroup(freq manuf) if himpg == `i' , l(manuf)
.	tab group`i' if group`i' <= 10
. }

group(manuf |
          ) |      Freq.     Percent        Cum.
------------+-----------------------------------
      Buick |          6       16.67       16.67
       Olds |          6       16.67       33.33
      Merc. |          5       13.89       47.22
      Pont. |          5       13.89       61.11
       Cad. |          3        8.33       69.44
      Dodge |          3        8.33       77.78
      Linc. |          3        8.33       86.11
      Chev. |          2        5.56       91.67
     Toyota |          2        5.56       97.22
        AMC |          1        2.78      100.00
------------+-----------------------------------
      Total |         36      100.00

group(manuf |
          ) |      Freq.     Percent        Cum.
------------+-----------------------------------
      Chev. |          4       17.39       17.39
      Plym. |          4       17.39       34.78
         VW |          4       17.39       52.17
     Datsun |          3       13.04       65.22
        AMC |          2        8.70       73.91
      Honda |          2        8.70       82.61
       Audi |          1        4.35       86.96
        BMW |          1        4.35       91.30
      Buick |          1        4.35       95.65
      Dodge |          1        4.35      100.00
------------+-----------------------------------
      Total |         23      100.00

That could be improved a bit by putting in display
lines.
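
A minimal sketch of what such display lines might look like,
assuming the auto data and the same -egenmore- functions as in the
commands above. The wording of the heading is only an illustration:

```stata
* Sketch only: same steps as above, plus a heading before each table.
* Needs -egenmore- from SSC for -head()- and -egroup()-.
sysuse auto, clear
egen manuf = head(make)
gen himpg = mpg > 21
bysort himpg manuf : gen freq = - _N

forval i = 0/1 {
    qui egen group`i' = egroup(freq manuf) if himpg == `i' , l(manuf)
    * blank line, then a heading saying which category follows
    display _n as text "Manufacturers with himpg == `i'"
    tab group`i' if group`i' <= 10
}
```

The -display- line changes nothing in the data; it only makes clear
which table belongs to which category of -himpg-.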

Now one question might fairly be, and this was
what I thought of first, why not something more like

. by himpg : egen group = egroup(freq manuf), l(manuf)
. by himpg : tab group if group <= 10

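One answer is that -egroup()- does not support -by:-.
An even better answer is that changing the program
to support -by:- would run into an immediate problem:
it could not be combined with allocating value
labels in the way needed to produce output like
that above. Under -by:- the group numbers 1, 2, ...
would restart within each category of -himpg-, so the
same number would stand for different manufacturers in
different categories, yet a variable can carry only
one set of value labels.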

I'm sure that there are other ways to approach the
problem.
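
One such alternative, offered only as a sketch and not as a
replacement for the method above, uses official commands alone. It
assumes that -word(make, 1)- picks out the same manufacturer name as
-head(make)- does for the auto data, and that a reasonably recent
Stata is in use (for -list, sepby()-). The names -negfreq- and -rank-
are made up for illustration:

```stata
* Sketch only: top 10 manufacturers within each category of himpg,
* listed rather than tabulated, using official commands.
sysuse auto, clear
gen manuf = word(make, 1)
gen himpg = mpg > 21

preserve
* one observation per himpg-manuf combination, with its frequency
contract himpg manuf, freq(freq)
* negate so that sorting puts the largest frequencies first
gen negfreq = -freq
* within each himpg, order by frequency, breaking ties by name
bysort himpg (negfreq manuf) : gen rank = _n
list himpg manuf freq if rank <= 10, sepby(himpg) noobs
restore
```

The result is a listing, not a -tabulate- table, so percents and
cumulative percents are not shown.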

P.S. -tabsort- is a red herring here. Once you have
generated the variable to be tabulated in such
a way that it will automatically be tabulated in
the sort order you want, then -tabsort- is no
longer needed.
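
Carried back to the hospital data, the same pattern might look like
the sketch below. It is untested without the data and rests on
assumptions: that -clave1- can play the role of -manuf-, that the
shortest and longest stays carry the codes 1 and 3 in
-rank_estancia2- (adjust to whatever codes the data actually use),
and that the names -negfreq- and -top1-, -top3- are free. Those
names are made up for illustration:

```stata
* Sketch only: analogue of the auto example for the discharge data.
* Needs -egenmore- from SSC for -egroup()-.
bysort rank_estancia2 clave1 : gen negfreq = - _N

foreach i of numlist 1 3 {
    qui egen top`i' = egroup(negfreq clave1) if rank_estancia2 == `i' , l(clave1)
    display _n as text "rank_estancia2 == `i'"
    * plain -tab- suffices: the groups are already in frequency order
    tab top`i' if top`i' <= 10
}
```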

Nick
n.j.cox@durham.ac.uk

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


