Statalist


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

st: RE: Re: text manipulation of tabulate output


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Re: text manipulation of tabulate output
Date   Sun, 20 Apr 2008 17:44:31 +0100

In addition to other suggestions: 

-contract- lets you specify percents. 

After -summarize- r(mean) is a more direct way to get the mean than r(sum)/r(N). If you specify -summarize, meanonly- that is a good habit and the -quietly- is then unnecessary. 

There is no -version- statement in your program. -levels- is not a current command. At some point it may become available only under version control and then your program will be broken. 

-forval- makes this easier than -while-. 

Nick
n.j.cox@durham.ac.uk 

Jacob Wegelin

On Fri, 18 Apr 2008, Jacob Wegelin wrote:
>
> Suppose you have a list of categorical (qualitative) variables that are in 
> your data; each variable has some arbitrary number of categories; and you 
> want to produce a report, in text,
>
> - with one row for each variable and
>
> - a list of the percents in each category for each variable.
>
> The code below produces the following display for a set of variables:
>
> Paired_Biopsy_: number of categories=2; total nonmissing= 250; 81.2%, 18.8% 
> ALTcode: number of categories=3; total nonmissing= 250; 41.2%, 56%, 2.8% 
> Alcohol: number of categories=2; total nonmissing= 250; 75.2%, 24.8% 
> CDC_class: number of categories=8; total nonmissing= 161; 26.1%, 23.6%, 
> 11.2%, 8.7%, 4.3%, .6%, 1.9%, 23.6%
>
> BEGIN CODE
>
> local QualitVars ///
>         Paired_Biopsy_ ///
>         ALTcode ///
>         Alcohol ///
>         CDC_class ///
>
> display "`QualitVars'"
>
> tabulate CDC_class
>
> generate DUMMYjunk=0
>
> foreach THISVAR of varlist `QualitVars' ///
> {
>         display " "
>         display "`THISVAR'" ": " _continue
>         drop DUMMY*
>         quietly: tabulate `THISVAR', generate (DUMMY)
>         scalar nCategories=r(r)
>         scalar denominator=r(N)
>         display "number of categories=" nCategories "; total nonmissing= " 
> denominator "; " _continue
>
>         local index=0
>         while `index' < nCategories {
>                 local index=`index' + 1
>                 quietly: summarize DUMMY`index'
>                 scalar thispercent= round( 100* r(sum)/denominator, 0.1)
>                 display thispercent "%" _continue
>                 if `index' < nCategories {
>                         display ", " _continue
>                 }
>         }
>
> }
>
> END CODE
>
> Question Number One: Am I reinventing the wheel? Is there an easier way to do 
> this?
>
> Question Number Two: Is there a way to get the labels for the categories for 
> each variable?
>
> For instance, the labels for CDC_class are:
>
> . tabulate CDC_class
>
>   CDC_class |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>          A1 |         42       26.09       26.09
>          A2 |         38       23.60       49.69
>          A3 |         18       11.18       60.87
>          B2 |         14        8.70       69.57
>          B3 |          7        4.35       73.91
>          C1 |          1        0.62       74.53
>          C2 |          3        1.86       76.40
>          C3 |         38       23.60      100.00
> ------------+-----------------------------------
>       Total |        161      100.00
>
> so that the output should really look like this:
>
> CDC_class:  number of categories=8; total nonmissing= 161; A1: 26.1%, A2: 
> 23.6%, A3: 11.2%, B2: 8.7%, B3: 4.3%, C1: .6%, C2: 1.9%, C3: 23.6%
>
> The format of the output of the tabulate command above, suggests that fancy 
> text manipulation (using perl, for instance) of that output would be a way to 
> eliminate the fancy loop above *and* to get the category labels. But is there 
> a more direct way?
>
> Thank you for any pointers

Here is progress on this question, i.e., a way to get the category labels into the output. This does not work if a categorical variable has a large number of categories; apparently there is a fairly small limit on how large a local macro can be.

/*** BEGIN CODE ***/

program drop _all
program manipulateTabOutput
 	args THISVAR
 	display "`THISVAR'"
 	quietly: levels `THISVAR'
 	local UNSORTEDLEVELS=r(levels)
 	local mylevs : list sort UNSORTEDLEVELS
 	gen DUMMYjunk=0
 	drop DUMMY*
 	quietly: tabulate `THISVAR', generate (DUMMY)
 	scalar nCategories=r(r)
 	local strPctBoth=""
 	local index=0
 	while `index' < nCategories {
 		local index=`index' + 1
 		local thisCatName : word `index' of `mylevs'
 		quietly: summarize DUMMY`index'
 		scalar thispercent= round( 100* r(sum)/r(N), 0.1)
 		local strPctBoth= "`strPctBoth'" + "`thisCatName': " + string(thispercent, "%9.1g") + "%"
 		if `index' < nCategories {
 			local strPctBoth= "`strPctBoth'" + ", "
 			}
 		}
 	display "`strPctBoth'"
end

webuse choice, clear
manipulateTabOutput sex
manipulateTabOutput size
* The following fails because there are so many dealers:
manipulateTabOutput dealer

/*** END CODE ***/

>
> Jake
>
> Jacob A. Wegelin
> jwegelin@vcu.edu Assistant Professor
> Department of Biostatistics
> Virginia Commonwealth University
> 730 East Broad Street Room 3006
> P. O. Box 980032
> Richmond VA 23298-0032
> U.S.A. http://www.people.vcu.edu/~jwegelin
>

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index