Statalist The Stata Listserver



Re: st: Proportion tests for non-binary variables


From   jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Proportion tests for non-binary variables
Date   Tue, 11 Apr 2006 12:00:42 -0500

Herve STOLOWY <stolowy@hec.fr> has two categorical variables and wants to
compare the proportions of each category between them:

> I would like to test the equality of proportions of two variables which are
> not binary. Each variable can have four values (0, 1, 2 and 3). (To be more
> precise, the original variable is the same but are applied to two different
> populations, the total sample and a restricted sample. I created two
> different variables. With -tabulate-, I get easily the frequencies of both
> variables).
>
> I can't use -prtest- and -ztest- because these two commands require, to my
> knowledge, binary variables.
>
> My comparison should work on unpaired data.
>
> I searched in Stata on "proportions" but did not find any command for that
> purpose. I missed maybe something. Would you have an idea?

I'll assume you have two variables, say -x1- and -x2-.  You could reshape your
data from wide to long and then use -tabulate- to get an association test
between the categories of your original variables.  Here is a simulated data
example.

First I'll generate some data:

	. drop _all
	. set seed 1234
	. set obs 25
	. gen x1 = int(3*uniform()) + 1
	. gen x2 = int(3*uniform()) + 1

I'll use the -mean- command to get a quick summary of these variables that I
can check against after I reshape the data; the means and standard errors
should be enough to tell me whether I did the -reshape- correctly.

	. mean x1 x2

Now I'll reshape the data:

	. gen i = _n
	. reshape long x, i(i) j(id)

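Here the -i- variable identifies the original observations, and the -j()-
option tells -reshape- to store the numeric suffix of the original variable
name (1 for -x1-, 2 for -x2-) in the new -id- variable.  The values
themselves end up in a single variable named -x-.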
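
As a rough illustration of what the wide-to-long step does to the data, here
is a minimal sketch in Python rather than Stata.  The three observations and
the helper name reshape_long are made up for the example; it only mimics the
layout that -reshape long x, i(i) j(id)- produces and is not how Stata
implements it.

```python
# Hypothetical wide data: one dict per original observation (values invented).
wide = [
    {"x1": 1, "x2": 3},
    {"x1": 2, "x2": 2},
    {"x1": 3, "x2": 1},
]


def reshape_long(rows, stub="x"):
    """Return one long row per (observation, stub variable) pair."""
    long_rows = []
    for i, row in enumerate(rows, start=1):  # i plays the role of -i- (= _n)
        for name in sorted(row):
            suffix = name[len(stub):]
            if name.startswith(stub) and suffix.isdigit():
                # id holds the numeric suffix, x holds the value
                long_rows.append({"i": i, "id": int(suffix), "x": row[name]})
    return long_rows


long = reshape_long(wide)
for r in long:
    print(r)
```

Each original observation now contributes two rows, one with id equal to 1
and one with id equal to 2, so 3 wide rows become 6 long rows.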

I can use -mean- with the -over()- option to verify that the -id- variable
correctly identifies the original variables in the reshaped data.  The means
and standard errors should exactly match those in the -mean- results above.

	. mean x, over(id)

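Now that the data have been reshaped, I can use -tabulate- to get a chi-square
test of association between -x- and -id-.  Under the null hypothesis of no
association, the category proportions are the same for -x1- and -x2-: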

	. tabulate x id, chi2

See -[R] tabulate twoway- for other measures/tests of association.
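
For readers who want to see the arithmetic behind the Pearson chi-square
statistic for a two-way table, here is a minimal sketch in Python rather
than Stata.  The counts and the function name pearson_chi2 are invented for
illustration.  It computes only the statistic and its degrees of freedom,
not the p-value, and assumes every row and column total is positive.

```python
from collections import Counter


def pearson_chi2(pairs):
    """pairs: iterable of (x, id) tuples from the long-format data.

    Returns (chi2, df) for the Pearson test of association,
    without any continuity correction.
    """
    pairs = list(pairs)
    n = len(pairs)
    cell = Counter(pairs)                # observed count per (x, id) cell
    row = Counter(x for x, _ in pairs)   # totals per category of x
    col = Counter(g for _, g in pairs)   # totals per value of id
    chi2 = 0.0
    for r in row:
        for c in col:
            expected = row[r] * col[c] / n
            chi2 += (cell[(r, c)] - expected) ** 2 / expected
    df = (len(row) - 1) * (len(col) - 1)
    return chi2, df


# Invented counts: id 1 has 10/10/10 in categories 1/2/3, id 2 has 5/10/15.
pairs = (
    [(1, 1)] * 10 + [(2, 1)] * 10 + [(3, 1)] * 10
    + [(1, 2)] * 5 + [(2, 2)] * 10 + [(3, 2)] * 15
)
chi2, df = pearson_chi2(pairs)
print(round(chi2, 4), df)
```

For these made-up counts the expected frequencies are 7.5, 10 and 12.5 in
each column, giving a statistic of 8/3 (about 2.667) on 2 degrees of freedom.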

Cheers,

--Jeff
jpitblado@stata.com


