The Stata listserver

st: RE: Analyzing multiple response variables with multiple categories


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Analyzing multiple response variables with multiple categories
Date   Thu, 20 Feb 2003 16:40:55 -0000

Dr. Hildegard Schaeper
>
> I am a Stata beginner (we just banished SPSS), nonetheless,
> I have to
> solve some problems in order to satisfy the needs of my institute.
>
> The problem: We often have to analyze a set of multiple response
> variables that all have several and exactly the same codes,
> e.g. courses
> students are enrolled in. Let's assume that our
> interviewees can give up
> to four answers. The variables named answer1, answer2,
> answer3, answer4
> are coded in the same way, e.g.:
>
> 1 = mathematics
>
> 2 = philosophy
>
> 3 = chemistry
>
> etc.

At first sight Stata offers little in this area. However,
there is an FAQ on multiple responses at
http://www.stata.com/support/faqs/data/multresp.html.
In modified form it will appear in Stata Journal 3(1)
2003 as a column by Ulrich Kohler and myself.

> How can I get a frequency distribution and a percentage distribution
> (based on cases, not on answers), which takes into account
> analytical
> weights and which informs me about how many respondents (absolute
> numbers and percentages) are enrolled in mathematics,
> philosophy etc.?
> In short, I was looking for a Stata command that resembles the SPSS
> "mult response" command (sorry, but this feature of SPSS really is
> useful).

-tabm- in module -tab_chi- on SSC is one answer, I believe.
For other programs in this territory, please see the FAQ.
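
A minimal sketch of trying -tabm-, assuming the four variables are
named answer1-answer4 as in the question and share one set of codes.
As far as I can tell, -tabm- tabulates the answers variable by
variable, so its counts are based on answers rather than on cases;
see -help tabm- after installation for the options it accepts.

```stata
* install the tab_chi package from SSC (it includes -tabm-)
ssc install tab_chi

* cross-tabulate the variables against their shared response codes
tabm answer1-answer4
```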

> Because I didn't succeed in finding a Stata ado which
> satisfies all my
> needs I began to do some programming. My idea was to
> generate a set of
> dummy variables, which represent each of the categories of
> the original
> variables, and then simply to compute the mean using the
> tabstat command
> (which allows for the by prefix, for the by option and
> weights, so that
> I even can produce multidimensional percentage
> distributions). Eureka,
> the program works, but has three disadvantages: First, the
> program is
> very slow, because, depending on the number of categories, a lot of
> dummy variables are to be generated (in my application 99). Second,
> instead of displaying the labels of the values of the
> original variables
> only the names of the newly created dummy variables are displayed. I
> succeeded in assigning the value labels of the original
> variables to the
> dummy variables, but I don't know how to tell my program
> that I want the
> variable labels to be displayed and not the variable names.
> Third, only
> means (i.e. percentages) are displayed, not frequencies.
>
> Here's my program. Can anybody give me some advice? Thanks a lot.
>
> Hilde
>
> /* beginning of the program */
> program define mrtab, byable(recall)
>
> version 8
> syntax varlist [if] [fweight aweight iweight] [, by(varname)]
>
> preserve
> marksample touse
>
> if "`exp'" ~= "" {
> tempvar wt
> gen `wt' `exp'
> local w "[`weight' = `wt']"
> }
> /* computing the maximum value of the variables */
> tempvar max1
> egen `max1'=rmax(`varlist')
> tempvar max2
> egen `max2'=max(`max1')
> local maxval=`max2'
>
> /* generating the set of dummy variables */
> forvalues i = 1/`maxval' {
> egen resp`i' = eqany(`varlist'), v(`i')
> }
>
> /* multiplication by 100 in order to get percentages */
> local i 1
> while `i' <= `maxval' {
> quietly replace resp`i' = resp`i' * 100
> local i = `i' + 1
> }
>
> /* assigning the value labels of the original variables */
> /* to the dummy variables */
> tokenize `varlist'
> local j = 1
> forvalues i = 1/`maxval' {
> local labval`j' : label `1' `i'
> local j = `j' + 1
> }
>
> local i 1
> local j 1
> while `i' == `j' & `i' <= `maxval' {
> label variable resp`i' "`labval`j''"
> local i = `i' + 1
> local j = `j' + 1
> }
>
> /* elimination of dummy variables which only have zeros */
> forvalues i = 1/`maxval' {
> egen m`i'= max(resp`i')
> }
> forvalues i = 1/`maxval' {
> if m`i' == 0 {
> drop resp`i'
> }
> }
>
>
> if `"`by'"' == "" {
> tabstat resp1-resp`maxval' if `touse' `w', stat(mean count) ///
> format(%3.1f) column(statistics) longstub
> }
> else if `"`by'"' ~= "" {
> tabstat resp1-resp`maxval' if `touse' `w', stat(mean count) ///
> format(%3.1f) col(stat) by(`by') long
> }
>
>
> end

I can suggest ways of speeding this up, but I suggest
that you come back with a specification of what
existing programs cannot do in this area but which
you wish to do.

I'll add one tip (from Ulrich Kohler) not in the FAQ
but mentioned in the paper. In some circumstances it is
a good idea to -reshape- to long, to -tsset- your
data as panel data (yes! it's a lie) and then use -xttab-.
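
A minimal sketch of that trick, assuming a respondent identifier
named id (hypothetical; any variable that uniquely identifies
respondents will do) and answer1-answer4 as in the question. The
"between" columns of -xttab- then count the respondents who gave each
answer at least once, which is the case-based distribution. So far as
I know, -xttab- does not accept weights, so this covers only the
unweighted case.

```stata
* one observation per respondent and answer slot
* (slot is a hypothetical name for the new index variable)
reshape long answer, i(id) j(slot)

* unused answer slots carry no information
drop if missing(answer)

* pretend respondents are panels and answer slots are time points
tsset id slot

* overall = counts of answers; between = counts of respondents
xttab answer
```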

Nick
[email protected]



