[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: text manipulation of tabulate output

From	Jacob Wegelin <[email protected]>
To	[email protected]
Subject	st: text manipulation of tabulate output
Date	Fri, 18 Apr 2008 16:45:29 -0400 (EDT)

Suppose you have a list of categorical (qualitative) variables that are in your data; each variable has some arbitrary number of categories; and you want to produce a report, in text,

- with one row for each variable and

- a list of the percents in each category for each variable.

The code below produces the following display for a set of variables:

Paired_Biopsy_: number of categories=2; total nonmissing= 250; 81.2%, 18.8% ALTcode: number of categories=3; total nonmissing= 250; 41.2%, 56%, 2.8% Alcohol: number of categories=2; total nonmissing= 250; 75.2%, 24.8% CDC_class: number of categories=8; total nonmissing= 161; 26.1%, 23.6%, 11.2%, 8.7%, 4.3%, .6%, 1.9%, 23.6%

BEGIN CODE

local QualitVars ///
   Paired_Biopsy_ ///
   ALTcode ///
   Alcohol ///
   CDC_class ///

display "`QualitVars'"

tabulate CDC_class

generate DUMMYjunk=0

foreach THISVAR of varlist `QualitVars' ///
{
   display " "
   display "`THISVAR'" ": " _continue
   drop DUMMY*
   quietly: tabulate `THISVAR', generate (DUMMY)
   scalar nCategories=r(r)
   scalar denominator=r(N)
   display "number of categories=" nCategories "; total nonmissing= " denominator "; " _continue

   local index=0
   while `index' < nCategories {
   local index=`index' + 1
   quietly: summarize DUMMY`index'
   scalar thispercent= round( 100* r(sum)/denominator, 0.1)
   display thispercent "%" _continue
   if `index' < nCategories {
   display ", " _continue
   }
   }

}

END CODE

Question Number One: Am I reinventing the wheel? Is there an easier way to do this?

Question Number Two: Is there a way to get the labels for the categories for each variable?

For instance, the labels for CDC_class are:

. tabulate CDC_class

  CDC_class | Freq. Percent Cum.
------------+-----------------------------------
   A1 | 42 26.09 26.09
   A2 | 38 23.60 49.69
   A3 | 18 11.18 60.87
   B2 | 14 8.70 69.57
   B3 | 7 4.35 73.91
   C1 | 1 0.62 74.53
   C2 | 3 1.86 76.40
   C3 | 38 23.60 100.00
------------+-----------------------------------
   Total | 161 100.00

so that the output should really look like this:

CDC_class: number of categories=8; total nonmissing= 161; A1: 26.1%, A2: 23.6%, A3: 11.2%, B2: 8.7%, B3: 4.3%, C1: .6%, C2: 1.9%, C3: 23.6%

The format of the output of the tabulate command above, suggests that fancy text manipulation (using perl, for instance) of that output would be a way to eliminate the fancy loop above *and* to get the category labels. But is there a more direct way?

Thank you for any pointers

Jake

Jacob A. Wegelin
[email protected] Assistant Professor
Department of Biostatistics
Virginia Commonwealth University
730 East Broad Street Room 3006
P. O. Box 980032
Richmond VA 23298-0032
U.S.A. http://www.people.vcu.edu/~jwegelin

Follow-Ups:
- Re: st: text manipulation of tabulate output
  - From: "Gabi Huiber" <[email protected]>
- st: Re: text manipulation of tabulate output
  - From: Jacob Wegelin <[email protected]>

Prev by Date: st: RE: Hebrew labels
Next by Date: st: seeking text that explains how to program
Previous by thread: st: RE: Hebrew labels
Next by thread: st: Re: text manipulation of tabulate output
Index(es):
- Date
- Thread