
From: markeboye@msn.com
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: text manipulation of tabulate output
Date: Sat, 17 May 2008 16:56:12 +0000

Sent from my BlackBerry® wireless device

-----Original Message-----
From: "Gabi Huiber" <ghuiber@gmail.com>
Date: Sat, 19 Apr 2008 13:53:10
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: text manipulation of tabulate output

[...] file and writing another, or some regular expression matching -- I used to do in PHP. But Stata can do all that. It has fopen read and write, and it has quite extensive regex capabilities.

But the most direct way, I think, would be to suitably (and perhaps temporarily, using "preserve/restore") "collapse (sum)" or "collapse (mean)" your data. This would produce a smaller dataset with only the summary of interest, which you can "outsheet" to whatever format your Stata-deprived audience finds acceptable.

Gabi

On 4/18/08, Jacob Wegelin <jwegelin@vcu.edu> wrote:
>
> Suppose you have a list of categorical (qualitative) variables that are in
> your data; each variable has some arbitrary number of categories; and you
> want to produce a report, in text,
>
> - with one row for each variable and
>
> - a list of the percents in each category for each variable.
>
> The code below produces the following display for a set of variables:
>
> Paired_Biopsy_: number of categories=2; total nonmissing= 250; 81.2%, 18.8%
> ALTcode: number of categories=3; total nonmissing= 250; 41.2%, 56%, 2.8%
> Alcohol: number of categories=2; total nonmissing= 250; 75.2%, 24.8%
> CDC_class: number of categories=8; total nonmissing= 161; 26.1%, 23.6%, 11.2%, 8.7%, 4.3%, .6%, 1.9%, 23.6%
>
> BEGIN CODE
>
> local QualitVars ///
>         Paired_Biopsy_ ///
>         ALTcode ///
>         Alcohol ///
>         CDC_class ///
>
> display "`QualitVars'"
>
> tabulate CDC_class
>
> generate DUMMYjunk=0
>
> foreach THISVAR of varlist `QualitVars' ///
> {
>         display " "
>         display "`THISVAR'" ": " _continue
>         drop DUMMY*
>         quietly: tabulate `THISVAR', generate (DUMMY)
>         scalar nCategories=r(r)
>         scalar denominator=r(N)
>         display "number of categories=" nCategories "; total nonmissing= " denominator "; " _continue
>
>         local index=0
>         while `index' < nCategories {
>                 local index=`index' + 1
>                 quietly: summarize DUMMY`index'
>                 scalar thispercent= round( 100* r(sum)/denominator, 0.1)
>                 display thispercent "%" _continue
>                 if `index' < nCategories {
>                         display ", " _continue
>                 }
>         }
>
> }
>
> END CODE
>
> Question Number One: Am I reinventing the wheel? Is there an easier way to
> do this?
>
> Question Number Two: Is there a way to get the labels for the categories
> for each variable?
>
> For instance, the labels for CDC_class are:
>
> . tabulate CDC_class
>
>   CDC_class |      Freq.     Percent        Cum.
> ------------+-----------------------------------
>          A1 |         42       26.09       26.09
>          A2 |         38       23.60       49.69
>          A3 |         18       11.18       60.87
>          B2 |         14        8.70       69.57
>          B3 |          7        4.35       73.91
>          C1 |          1        0.62       74.53
>          C2 |          3        1.86       76.40
>          C3 |         38       23.60      100.00
> ------------+-----------------------------------
>       Total |        161      100.00
>
> so that the output should really look like this:
>
> CDC_class: number of categories=8; total nonmissing= 161; A1: 26.1%, A2: 23.6%, A3: 11.2%, B2: 8.7%, B3: 4.3%, C1: .6%, C2: 1.9%, C3: 23.6%
>
> The format of the output of the tabulate command above, suggests that fancy
> text manipulation (using perl, for instance) of that output would be a way
> to eliminate the fancy loop above *and* to get the category labels. But is
> there a more direct way?
>
> Thank you for any pointers
>
> Jake
>
> Jacob A. Wegelin
> jwegelin@vcu.edu Assistant Professor
> Department of Biostatistics
> Virginia Commonwealth University
> 730 East Broad Street Room 3006
> P. O. Box 980032
> Richmond VA 23298-0032
> U.S.A. http://www.people.vcu.edu/~jwegelin

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
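
A minimal sketch of the collapse-based route suggested in the reply, offered as an illustration rather than a tested answer to the original post. It assumes the four variable names from the quoted code. The helper variables (n, total, pct) and the output file names are made up here, and the sketch assumes the dataset has no existing variable called n, total, or pct. Each variable is reduced to one row per category inside preserve/restore, so the data in memory are unchanged afterwards.

```stata
* Sketch only: one small text file of category percents per variable.
* n, total, pct and the file names are illustrative, not from the thread.
foreach v of varlist Paired_Biopsy_ ALTcode Alcohol CDC_class {
    preserve
    * tabulate leaves out missing values by default, so drop them here too
    drop if missing(`v')
    * helper count variable; assumes no variable named n exists already
    generate long n = 1
    * one row per category, with n = number of observations in it
    collapse (sum) n, by(`v')
    egen long total = total(n)
    generate double pct = round(100 * n / total, 0.1)
    * comma-separated text file, for example CDC_class_percents.csv
    outsheet using "`v'_percents.csv", comma replace
    restore
}
```

Because the category variable itself survives the collapse, the category names (A1, A2, ...) travel with the percents, which speaks to Question Number Two. If a variable is numeric with value labels, outsheet should, as I understand it, write the labels unless nolabel is specified. A variable with no nonmissing values would make collapse stop with an error, so such variables would need to be skipped first.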

