
st: Analyzing multiple response variables with multiple categories

From	"Schaeper, Dr. Hildegard" <[email protected]>
To	<[email protected]>
Subject	st: Analyzing multiple response variables with multiple categories
Date	Thu, 20 Feb 2003 13:33:07 +0100

Dear all,

I am a Stata beginner (we have just banished SPSS); nonetheless, I have to
solve some problems in order to meet the needs of my institute.

The problem: We often have to analyze a set of multiple response
variables that all share the same set of codes, e.g. courses
students are enrolled in. Let's assume that our interviewees can give up
to four answers. The variables named answer1, answer2, answer3, answer4
are coded in the same way, e.g.:

1 = mathematics

2 = philosophy

3 = chemistry

etc.

How can I get a frequency distribution and a percentage distribution
(based on cases, not on answers), which takes into account analytical
weights and which informs me about how many respondents (absolute
numbers and percentages) are enrolled in mathematics, philosophy etc.?
In short, I was looking for a Stata command that resembles the SPSS
"mult response" command (sorry, but this feature of SPSS really is
useful).
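
To make the target concrete, here is a minimal sketch on made-up data. It
assumes a recent Stata (egen's anymatch() function, which as far as I know
replaced eqany() in Stata 9), and the variable wt and all data values are
hypothetical. Each indicator is 1 if a respondent named the category in any
of the answer variables, so its weighted mean is a share of cases, not of
answers.

```stata
* Toy data: three respondents, up to two answers each (hypothetical values)
clear
input answer1 answer2 wt
1 2 1
1 . 2
3 1 1
end

* One 0/1 indicator per category: does any answer equal that code?
forvalues k = 1/3 {
    egen byte has`k' = anymatch(answer1 answer2), values(`k')
}

* Weighted share of respondents (cases) naming each category
tabstat has1 has2 has3 [aweight = wt], statistics(mean) columns(statistics)
```

Multiplying the means by 100 gives percentages of cases. Because one
respondent can name several categories, these percentages can add up to
more than 100.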

Because I didn't succeed in finding a Stata ado-file that satisfies all my
needs, I began to do some programming. My idea was to generate a set of
dummy variables, one for each category of the original
variables, and then simply to compute the means using the tabstat command
(which allows the by prefix, the by option and weights, so that
I can even produce multidimensional percentage distributions). Eureka,
the program works, but it has three disadvantages. First, the program is
very slow because, depending on the number of categories, a lot of
dummy variables have to be generated (99 in my application). Second,
instead of the value labels of the original variables,
only the names of the newly created dummy variables are displayed. I
succeeded in assigning the value labels of the original variables to the
dummy variables as variable labels, but I don't know how to tell my program to
display the variable labels rather than the variable names. Third, only
means (i.e. percentages) are displayed, not frequencies.
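
One possible way around the second and third points is to skip tabstat and
print one row per category by hand. The following is only a sketch under
stated assumptions, not a tested replacement for the program below: it
assumes a recent Stata (rowmax() and anymatch() in egen), the data, the
weight variable wt and the label name course are made up, and the frequency
shown is the unweighted number of respondents while the percentage is
weighted.

```stata
* Toy data with hypothetical values; wt is an assumed weight variable
clear
input answer1 answer2 answer3 answer4 wt
1 2 . . 1
1 . . . 2
3 1 2 . 1
end
label define course 1 "mathematics" 2 "philosophy" 3 "chemistry"
label values answer1 answer2 answer3 answer4 course

* Highest code used in any of the answer variables
tempvar rmax hit
egen `rmax' = rowmax(answer1-answer4)
summarize `rmax', meanonly
local maxval = r(max)

forvalues k = 1/`maxval' {
    * Reuse a single temporary indicator instead of keeping one per category
    capture drop `hit'
    egen byte `hit' = anymatch(answer1-answer4), values(`k')
    quietly count if `hit' == 1
    local n = r(N)
    summarize `hit' [aweight = wt], meanonly
    local pct = 100 * r(mean)
    if `n' > 0 {
        * Look up the value label text that answer1 attaches to code `k'
        local text : label (answer1) `k'
        display as text %-20s "`text'" as result %8.0f `n' %8.1f `pct'
    }
}
```

Reusing one temporary variable avoids holding a dummy for every category
in memory at once, which may also help with speed, although I have not
timed it.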

Here's my program. Can anybody give me some advice? Thanks a lot.

Hilde

/* beginning of the program */
program define mrtab, byable(recall)

version 8
syntax varlist [if] [fweight aweight iweight] [, by(varname)]

preserve
marksample touse

if "`exp'" ~= "" {
tempvar wt
gen `wt' `exp'
local w "[`weight' = `wt']"
}
/* computing the maximum value of the variables */
tempvar max1
egen `max1'=rmax(`varlist')
tempvar max2
egen `max2'=max(`max1')
local maxval=`max2'

/* generating the set of dummy variables */
forvalues i = 1/`maxval' {
egen resp`i' = eqany(`varlist'), v(`i')
}

/* multiplication by 100 in order to get percentages */
local i 1
while `i' <= `maxval' {
quietly replace resp`i' = resp`i' * 100
local i = `i' + 1
}

/* assigning the value labels of the original variables */
/* to the dummy variables */
tokenize `varlist'
local j = 1
forvalues i = 1/`maxval' {
local labval`j' : label `1' `i'
local j = `j' + 1
}

local i 1
local j 1
while `i' == `j' & `i' <= `maxval' {
label variable resp`i' "`labval`j''"
local i = `i' + 1
local j = `j' + 1
}

/* elimination of dummy variables which only have zeros */
forvalues i = 1/`maxval' {
egen m`i'= max(resp`i')
}
forvalues i = 1/`maxval' {
if m`i' == 0 {
drop resp`i'
}
}


if `"`by'"' == "" {
tabstat resp1-resp`maxval' if `touse' `w', stat(mean count) ///
format(%3.1f) column(statistics) longstub
}
else if `"`by'"' ~= "" {
tabstat resp1-resp`maxval' if `touse' `w', stat(mean count) ///
format(%3.1f) col(stat) by(`by') long
}


end

Follow-Ups:
- st: RE: Analyzing multiple response variables with multiple categories
  - From: "Nick Cox" <[email protected]>
