Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: text manipulation of tabulate output


From   Jacob Wegelin <[email protected]>
To   [email protected]
Subject   st: text manipulation of tabulate output
Date   Fri, 18 Apr 2008 16:45:29 -0400 (EDT)


Suppose you have a list of categorical (qualitative) variables that are in your data; each variable has some arbitrary number of categories; and you want to produce a report, in text,

- with one row for each variable and

- a list of the percents in each category for each variable.

The code below produces the following display for a set of variables:

Paired_Biopsy_: number of categories=2; total nonmissing= 250; 81.2%, 18.8% ALTcode: number of categories=3; total nonmissing= 250; 41.2%, 56%, 2.8% Alcohol: number of categories=2; total nonmissing= 250; 75.2%, 24.8% CDC_class: number of categories=8; total nonmissing= 161; 26.1%, 23.6%, 11.2%, 8.7%, 4.3%, .6%, 1.9%, 23.6%

BEGIN CODE

local QualitVars ///
        Paired_Biopsy_ ///
        ALTcode ///
        Alcohol ///
        CDC_class ///

display "`QualitVars'"

tabulate CDC_class

generate DUMMYjunk=0

foreach THISVAR of varlist `QualitVars' ///
{
        display " "
        display "`THISVAR'" ": " _continue
        drop DUMMY*
        quietly: tabulate `THISVAR', generate (DUMMY)
        scalar nCategories=r(r)
        scalar denominator=r(N)
        display "number of categories=" nCategories "; total nonmissing= " denominator "; " _continue

        local index=0
        while `index' < nCategories {
                local index=`index' + 1
                quietly: summarize DUMMY`index'
                scalar thispercent= round( 100* r(sum)/denominator, 0.1)
                display thispercent "%" _continue
                if `index' < nCategories {
                        display ", " _continue
                }
        }

}

END CODE

Question Number One: Am I reinventing the wheel? Is there an easier way to do this?

Question Number Two: Is there a way to get the labels for the categories for each variable?

For instance, the labels for CDC_class are:

. tabulate CDC_class

  CDC_class |      Freq.     Percent        Cum.
------------+-----------------------------------
         A1 |         42       26.09       26.09
         A2 |         38       23.60       49.69
         A3 |         18       11.18       60.87
         B2 |         14        8.70       69.57
         B3 |          7        4.35       73.91
         C1 |          1        0.62       74.53
         C2 |          3        1.86       76.40
         C3 |         38       23.60      100.00
------------+-----------------------------------
      Total |        161      100.00

so that the output should really look like this:

CDC_class: number of categories=8; total nonmissing= 161; A1: 26.1%, A2: 23.6%, A3: 11.2%, B2: 8.7%, B3: 4.3%, C1: .6%, C2: 1.9%, C3: 23.6%

The format of the output of the tabulate command above, suggests that fancy text manipulation (using perl, for instance) of that output would be a way to eliminate the fancy loop above *and* to get the category labels. But is there a more direct way?

Thank you for any pointers

Jake

Jacob A. Wegelin
[email protected] Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
730 East Broad Street Room 3006
P. O. Box 980032
Richmond VA 23298-0032
U.S.A. http://www.people.vcu.edu/~jwegelin




© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index