Statalist



st: RE: Re: text manipulation of tabulate output


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Re: text manipulation of tabulate output
Date   Sun, 20 Apr 2008 17:44:31 +0100

In addition to other suggestions: 

-contract- lets you specify percents. 
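A minimal sketch of that idea, not the poster's code: the auto data and the variable rep78 stand in for the real data, and the new variable names n and pct are arbitrary. Because -contract- replaces the data in memory, the sketch wraps it in -preserve- and -restore-.

```stata
* Sketch only: auto.dta and rep78 are stand-ins for the poster's data
sysuse auto, clear
preserve
* One observation per category, with frequency (n) and percent (pct);
* nomiss drops observations with missing values before counting
contract rep78, freq(n) percent(pct) nomiss
format pct %4.1f
list rep78 n pct, noobs
restore
```

The contracted dataset still carries the categorical variable itself, so any value labels attached to it appear in the listing alongside the percents.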

After -summarize-, r(mean) is a more direct way to get the mean than r(sum)/r(N). Specifying -summarize, meanonly- is a good habit, and the -quietly- is then unnecessary.
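For illustration, here is that pattern applied to one indicator variable of the kind -tabulate, generate()- creates. This is a sketch under assumptions: auto.dta and rep78 are stand-ins, and the DUMMY prefix simply follows the posted code.

```stata
* Sketch only: auto.dta and rep78 are stand-ins for the poster's data
sysuse auto, clear
quietly tabulate rep78, generate(DUMMY)
* meanonly suppresses the output and skips the statistics not needed here
summarize DUMMY1, meanonly
* The mean of a 0/1 indicator is the proportion in that category
display %4.1f 100*r(mean) "%"
drop DUMMY*
```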

There is no -version- statement in your program. -levels- is not a current command. At some point it may become available only under version control and then your program will be broken. 
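A minimal sketch of both points, assuming Stata 9 or later, where -levelsof- is the current replacement for -levels-. The program name showlevels is hypothetical, and auto.dta with rep78 is a stand-in for the real data.

```stata
* Sketch only: showlevels is a hypothetical name, not part of the posted code
capture program drop showlevels
program define showlevels
    * Declare the version the program was written for, so that later
    * releases keep interpreting it the same way
    version 10
    args varname
    * levelsof returns the distinct values already sorted and stores them
    * directly in a local macro, with no "local name = r(levels)" step
    quietly levelsof `varname', local(levs)
    display `"`levs'"'
end

sysuse auto, clear
showlevels rep78
```

Storing the list through the local() option also avoids evaluating it as a string expression with "=". That evaluation may be one reason the posted program fails on variables with many categories, but this is an assumption; the thread does not confirm the cause.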

-forval- makes this easier than -while-. 
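As a sketch of how the counting loop might look with -forvalues-, the block below reworks the posted loop under the suggestions in this reply. It is not the poster's method: the program name pctreport is hypothetical, auto.dta and rep78 are stand-ins, and category labels are left out to keep the loop in focus.

```stata
* Sketch only: pctreport is a hypothetical name
capture program drop pctreport
program define pctreport
    version 10
    args thisvar
    * Clear any indicators left from an earlier run; capture makes this
    * harmless when none exist, so no placeholder variable is needed
    capture drop DUMMY*
    quietly tabulate `thisvar', generate(DUMMY)
    local ncat = r(r)
    display "`thisvar': number of categories=`ncat'; " _continue
    * forvalues handles the counter, replacing "local index=`index' + 1"
    forvalues i = 1/`ncat' {
        summarize DUMMY`i', meanonly
        display %4.1f 100*r(mean) "%" _continue
        if `i' < `ncat' display ", " _continue
    }
    * End the output line
    display ""
    drop DUMMY*
end

sysuse auto, clear
pctreport rep78
```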

Nick
[email protected] 

Jacob Wegelin

On Fri, 18 Apr 2008, Jacob Wegelin wrote:
>
> Suppose you have a list of categorical (qualitative) variables that are in 
> your data; each variable has some arbitrary number of categories; and you 
> want to produce a report, in text,
>
> - with one row for each variable and
>
> - a list of the percents in each category for each variable.
>
> The code below produces the following display for a set of variables:
>
> Paired_Biopsy_: number of categories=2; total nonmissing= 250; 81.2%, 18.8% 
> ALTcode: number of categories=3; total nonmissing= 250; 41.2%, 56%, 2.8% 
> Alcohol: number of categories=2; total nonmissing= 250; 75.2%, 24.8% 
> CDC_class: number of categories=8; total nonmissing= 161; 26.1%, 23.6%, 
> 11.2%, 8.7%, 4.3%, .6%, 1.9%, 23.6%
>
> BEGIN CODE
>
> local QualitVars ///
>         Paired_Biopsy_ ///
>         ALTcode ///
>         Alcohol ///
>         CDC_class ///
>
> display "`QualitVars'"
>
> tabulate CDC_class
>
> generate DUMMYjunk=0
>
> foreach THISVAR of varlist `QualitVars' ///
> {
>         display " "
>         display "`THISVAR'" ": " _continue
>         drop DUMMY*
>         quietly: tabulate `THISVAR', generate (DUMMY)
>         scalar nCategories=r(r)
>         scalar denominator=r(N)
>         display "number of categories=" nCategories "; total nonmissing= " ///
>                 denominator "; " _continue
>
>         local index=0
>         while `index' < nCategories {
>                 local index=`index' + 1
>                 quietly: summarize DUMMY`index'
>                 scalar thispercent= round( 100* r(sum)/denominator, 0.1)
>                 display thispercent "%" _continue
>                 if `index' < nCategories {
>                         display ", " _continue
>                 }
>         }
>
> }
>
> END CODE
>
> Question Number One: Am I reinventing the wheel? Is there an easier way to do 
> this?
>
> Question Number Two: Is there a way to get the labels for the categories for 
> each variable?
>
> For instance, the labels for CDC_class are:
>
> . tabulate CDC_class
>
>   CDC_class |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>          A1 |         42       26.09       26.09
>          A2 |         38       23.60       49.69
>          A3 |         18       11.18       60.87
>          B2 |         14        8.70       69.57
>          B3 |          7        4.35       73.91
>          C1 |          1        0.62       74.53
>          C2 |          3        1.86       76.40
>          C3 |         38       23.60      100.00
> ------------+-----------------------------------
>       Total |        161      100.00
>
> so that the output should really look like this:
>
> CDC_class:  number of categories=8; total nonmissing= 161; A1: 26.1%, A2: 
> 23.6%, A3: 11.2%, B2: 8.7%, B3: 4.3%, C1: .6%, C2: 1.9%, C3: 23.6%
>
> The format of the tabulate output above suggests that fancy text 
> manipulation (using perl, for instance) of that output would be a way to 
> eliminate the fancy loop above *and* to get the category labels. But is there 
> a more direct way?
>
> Thank you for any pointers

Here is progress on this question, i.e., a way to get the category labels into the output. This does not work if a categorical variable has a large number of categories; apparently there is a fairly small limit on how large a local macro can be.

/*** BEGIN CODE ***/

program drop _all
program manipulateTabOutput
 	args THISVAR
 	display "`THISVAR'"
 	quietly: levels `THISVAR'
 	local UNSORTEDLEVELS=r(levels)
 	local mylevs : list sort UNSORTEDLEVELS
 	gen DUMMYjunk=0
 	drop DUMMY*
 	quietly: tabulate `THISVAR', generate (DUMMY)
 	scalar nCategories=r(r)
 	local strPctBoth=""
 	local index=0
 	while `index' < nCategories {
 		local index=`index' + 1
 		local thisCatName : word `index' of `mylevs'
 		quietly: summarize DUMMY`index'
 		scalar thispercent= round( 100* r(sum)/r(N), 0.1)
 		local strPctBoth= "`strPctBoth'" + "`thisCatName': " + string(thispercent, "%9.1g") + "%"
 		if `index' < nCategories {
 			local strPctBoth= "`strPctBoth'" + ", "
 			}
 		}
 	display "`strPctBoth'"
end

webuse choice, clear
manipulateTabOutput sex
manipulateTabOutput size
* The following fails because there are so many dealers:
manipulateTabOutput dealer

/*** END CODE ***/

>
> Jake
>
> Jacob A. Wegelin
> [email protected] Assistant Professor
> Department of Biostatistics
> Virginia Commonwealth University
> 730 East Broad Street Room 3006
> P. O. Box 980032
> Richmond VA 23298-0032
> U.S.A. http://www.people.vcu.edu/~jwegelin
>
