The Stata listserver

st: RE: Question about tabsort


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Question about tabsort
Date   Fri, 12 Jul 2002 17:11:40 +0100

Rodrigo Briceno 

> Hi, I need some information: I'm using tabsort in order to obtain 
> the most frequent diagnoses in a hospital.  I want Stata to show 
> me only the first 10 diagnoses, and I thought that I could 
> do that by typing
> 
> tabsort clave1 in 1/10, but Stata showed me something else:
> 
> tabsort clave1 in 1/10
> 
>      clave1 |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>        O809 |         10      100.00      100.00
> ------------+-----------------------------------
>       Total |         10      100.00
> 
> Can somebody explain to me the correct procedure to obtain 
> what I want?
> Is it possible to assign to these ten codes the corresponding 
> CIE-10 descriptions by means of some formula in Stata?
 
-tabsort- is a user-written command downloadable 
as part of the -tab_chi- package from SSC. 

Incidentally, -tabsort- really is an awful kludge. 
What it does is oblige -tabulate- to produce results 
once, quietly; it then works on those results and 
finally gets -tabulate- to emit them once more in sorted form. 
The approach I discussed yesterday on Statalist is, I believe, 
often much better. 

First, to explain what Rodrigo got. -tabsort- is here 
working in a standard Stata way, namely 

in 1/10 

selects the first 10 observations and -tabsort- 
then shows a table for those. Seemingly, in Rodrigo's case 
they are all the same. 
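
A minimal sketch of the same effect, using the auto data shipped 
with Stata rather than Rodrigo's data. The choice of -foreign- as 
the tabulated variable is mine, purely for illustration: 

```stata
sysuse auto, clear

* -in 1/10- restricts the command to observations 1 to 10
* in the current sort order, before any table is built
tabulate foreign in 1/10

* for comparison, the table for all observations
tabulate foreign
```

The first table can only ever total 10 observations, whatever 
the categories turn out to be, which is what Rodrigo saw. 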

I think he wants -tabsort- to produce a table only 
for the 10 most common entries, which is a different 
problem, and one not soluble with -tabsort- alone. 

That is, Rodrigo wants to select -in 1/10- _within_ 
his table, but that is not how -in- works. 

I don't understand what CIE-10 means. 

Translating to an auto data problem: let's 
define (using -egen, head()- from the -egenmore- 
package on SSC, which extracts the first word) 

. egen manuf = head(make) 

and say we want a table of the ten most common manufacturers. 

First we compute frequencies directly: 

. bysort manuf : gen freq = _N 

Each frequency will appear repeatedly for each manuf 
represented more than once, whereas we only want to 
see each frequency once. One way is to tag just 
one observation in each group: 

. egen tag = tag(manuf)

-egen- haters would prefer 

. by manuf : gen tag = _n == 1  
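
As a rough check, assuming the auto data and using the official 
function -word()- as a stand-in for -egen, head()-, the two ways 
of tagging can be compared. The names tag1 and tag2 are 
hypothetical, chosen only so that both variables can coexist: 

```stata
sysuse auto, clear

* first word of make, as a stand-in for -egen, head()-
gen manuf = word(make, 1)

* frequency of each manufacturer, repeated on every observation
bysort manuf : gen freq = _N

* tag one observation per manufacturer, two ways
egen tag1 = tag(manuf)
* -bysort- re-sorts, so this does not rely on the sort
* order left behind by earlier commands
bysort manuf : gen tag2 = _n == 1

* both should tag the same number of observations,
* namely one for each distinct manufacturer
quietly count if tag1
local n1 = r(N)
quietly count if tag2
assert r(N) == `n1'
```

The two need not tag the same observation within a group, but 
that makes no difference here, as freq is constant within manuf. 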

Now we sort first on tagged observations and then 
on (negated) frequencies, so that both are in descending order: 

. gsort - tag - freq 

That way, what we want is at the start of the data
set. Now generate a rank order variable

. gen order = _n 

and produce our own table directly: 

. tabdisp order in 1/10 if tag, c(manuf freq) 

----------------------------------
    order |      manuf        freq
----------+-----------------------
        1 |       Olds           7
        2 |      Buick           7
        3 |      Chev.           6
        4 |      Pont.           6
        5 |      Merc.           6
        6 |      Plym.           5
        7 |     Datsun           4
        8 |      Dodge           4
        9 |         VW           4
       10 |        AMC           3
----------------------------------

Let's hope Rodrigo's table is more interesting. 
The code for his data would be similar, if I understand correctly: 

bysort clave1 : gen freq = _N 
egen tag = tag(clave1) 
gsort - tag - freq 
gen order = _n 
tabdisp order in 1/10 if tag, c(clave1 freq) 
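
A different route to the same kind of table, not the method above 
but a sketch with the official command -contract-, which reduces 
the data to one observation per distinct value plus a frequency. 
It is shown here for the auto data, with -word()- standing in for 
-egen, head()-; for Rodrigo's data clave1 would replace manuf. 

```stata
sysuse auto, clear
gen manuf = word(make, 1)

* -contract- replaces the data in memory, so keep a copy
preserve

* one observation per manufacturer, with its frequency in freq
contract manuf, freq(freq)

* highest frequencies first; ties broken alphabetically
gsort -freq manuf

* here -in 1/10- does select the ten most common manufacturers,
* because each observation is now one row of the table
list manuf freq in 1/10, noobs

restore
```

The price is that the original data are out of memory while the 
table is produced, hence the -preserve- and -restore-. 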

Nick 
n.j.cox@durham.ac.uk 

