st: Re: text manipulation of tabulate output

 From Jacob Wegelin To statalist@hsphsun2.harvard.edu Subject st: Re: text manipulation of tabulate output Date Sat, 19 Apr 2008 03:30:27 -0400 (EDT)

```On Fri, 18 Apr 2008, Jacob Wegelin wrote:
```

Suppose you have a list of categorical (qualitative) variables that are in your data; each variable has some arbitrary number of categories; and you want to produce a report, in text,

- with one row for each variable and

- a list of the percents in each category for each variable.

The code below produces the following display for a set of variables:

Paired_Biopsy_: number of categories=2; total nonmissing= 250; 81.2%, 18.8% ALTcode: number of categories=3; total nonmissing= 250; 41.2%, 56%, 2.8% Alcohol: number of categories=2; total nonmissing= 250; 75.2%, 24.8% CDC_class: number of categories=8; total nonmissing= 161; 26.1%, 23.6%, 11.2%, 8.7%, 4.3%, .6%, 1.9%, 23.6%

BEGIN CODE

local QualitVars ///
Paired_Biopsy_ ///
ALTcode ///
Alcohol ///
CDC_class ///

display "`QualitVars'"

tabulate CDC_class

generate DUMMYjunk=0

foreach THISVAR of varlist `QualitVars' ///
{
display " "
display "`THISVAR'" ": " _continue
drop DUMMY*
quietly: tabulate `THISVAR', generate (DUMMY)
scalar nCategories=r(r)
scalar denominator=r(N)
display "number of categories=" nCategories "; total nonmissing= " denominator "; " _continue

local index=0
while `index' < nCategories {
local index=`index' + 1
quietly: summarize DUMMY`index'
scalar thispercent= round( 100* r(sum)/denominator, 0.1)
display thispercent "%" _continue
if `index' < nCategories {
display ", " _continue
}
}

}

END CODE

Question Number One: Am I reinventing the wheel? Is there an easier way to do this?

Question Number Two: Is there a way to get the labels for the categories for each variable?

For instance, the labels for CDC_class are:

. tabulate CDC_class

CDC_class |      Freq.     Percent        Cum.
------------+-----------------------------------
A1 |         42       26.09       26.09
A2 |         38       23.60       49.69
A3 |         18       11.18       60.87
B2 |         14        8.70       69.57
B3 |          7        4.35       73.91
C1 |          1        0.62       74.53
C2 |          3        1.86       76.40
C3 |         38       23.60      100.00
------------+-----------------------------------
Total |        161      100.00

so that the output should really look like this:

CDC_class: number of categories=8; total nonmissing= 161; A1: 26.1%, A2: 23.6%, A3: 11.2%, B2: 8.7%, B3: 4.3%, C1: .6%, C2: 1.9%, C3: 23.6%

The format of the output of the tabulate command above, suggests that fancy text manipulation (using perl, for instance) of that output would be a way to eliminate the fancy loop above *and* to get the category labels. But is there a more direct way?

Thank you for any pointers
```Here is progress on this question, i.e., a way to get the category labels into the output. This does not work if a categorical variable has a large number of categories; apparently there is a fairly small limit on how large a local macro can be.

/*** BEGIN CODE ***/

program drop _all
args THISVAR
display "`THISVAR'"
quietly: levels `THISVAR'
local UNSORTEDLEVELS=r(levels)
local mylevs : list sort UNSORTEDLEVELS
gen DUMMYjunk=0
drop DUMMY*
quietly: tabulate `THISVAR', generate (DUMMY)
scalar nCategories=r(r)
local strPctBoth=""
local index=0
while `index' < nCategories {
local index=`index' + 1
local thisCatName : word `index' of `mylevs'
quietly: summarize DUMMY`index'
scalar thispercent= round( 100* r(sum)/r(N), 0.1)
local strPctBoth= "`strPctBoth'" + "`thisCatName': " + string(thispercent, "%9.1g") + "%"
if `index' < nCategories {
local strPctBoth= "`strPctBoth'" + ", "
}
}
display "`strPctBoth'"
end

webuse choice, clear
* The following fails because there are so many dealers:

/*** END CODE ***/

```
```Jake

Jacob A. Wegelin
jwegelin@vcu.edu Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
730 East Broad Street Room 3006
P. O. Box 980032
Richmond VA 23298-0032
U.S.A. http://www.people.vcu.edu/~jwegelin
```