Stata The Stata listserver
[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: k-sample tests for differences in proportions

From   May Boggess <>
Subject   Re: st: k-sample tests for differences in proportions
Date   05 Nov 2003 11:55:04 -0600

On Wednesday, Karen Robson wrote:

> What I am trying to calculate is if the mean of a dummy 
> variable is different across the categories of a separate 
> categorical variable. 
> If my DV was continuous, I could do an anova and with 
> post-hoc estimations figure out where the significant 
> differences between categories were.

Karen wants to do two things:
1) test if the proportions are the same or different, and then
2) if (1) shows significant evidence of a difference in proportions
between the groups, she will want to test each pair of groups to see
where the difference lies.

As Jingky responded, a common way to approach (1), is to do a (large
sample) Chi-test. Here I am using the dataset generated by Joseph
Coveney's in his response:

 set obs 300
 set seed 20031105
 generate byte religion = mod(_n, 5) + 1
 generate byte degreed = uniform() > 0.6
 replace deg= uniform() > 0.3 if rel==5
 table  rel, c(mean deg)
 tabulate religion degreed, chi2

Stata can do an exact test which is appropriate if the sample is small:

 tabulate religion degreed, exact

This exact test is the generalization of Fisher's exact test to an r X c
table. It takes quite some time to calculate for even a modestly sized

Another common way to approach (1), is with logistic regression. With
this the null hypothesis is that the odds ratio = 1, which is the same a
saying that the odds of success is the same across all groups. When the
probability of success (or failure) is small, the odds ratio is amost
equal to the relative risk, and so in that case, testing whether or not
the odds ratio is 1 is the same as testing whether or not the
proportions are equal. 

I think one reason logistic regression (or any other general linear
model for a binary response) is often used, is that you can easily
incorporate additional explanitory varibles into your model, that is,
control for other factors beyond the one in which you are particularly
interested. Let's get to the code:

 logistic degreed religion, nolog

I deliberately cooked up a dataset to have group 5 to be different from
the others, so our odds ratio should come out being significantly
different from 1.

This answers question (1). But what about question (2)? How can we tell
that it is group 5 that is the problem? We can still use logistic
regression, using dummy variables (as Joseph did in his response):

 xi: logistic degreed i.religion, nolog

Stata tells you that one group has been ommitted (group 1), so the odds
ratios are the comparison of each group to group 1. So now we can tell
that group 5 is the problem case.

But, does that tell us the whole story? Go back and look at the table
of proportions:

 table  rel, c(mean deg)

It looks to me like the biggest difference is between group 2 and group
5. How can I test for that? Well, I need to control which dummy is
ommitted so I can control the pairs of groups I am comparing. I do this
as follows:

 qui tabulate rel, gen(cat)
 logistic deg  cat2 cat3 cat4 cat5
 logistic deg cat1   cat3 cat4 cat5
 logistic deg cat1 cat2   cat4 cat5
 logistic deg cat1 cat2 cat3   cat5
 logistic deg cat1 cat2 cat3 cat4

This surely tells me everything I need to know? Looking at the last
output I see that group 2 is the most different from group 5 (odds ratio
furthest away from 1).

BUT (an even bigger but than the previous but), I have to watch out that
I don't make a mistake now. 

When I look at all those comparisons between groups that I just did,
what is the over all error rate? When I make one comparison the error
rate is less than alpha which we usually set at  0.05. When you want to
make many comparisons, you have to make some sort of adjustment to what
you will call "significant" if you want to control how many mistakes you
are prepared to make in all those decisions. 

A common (approximate) adjustment to the error rate is called
Bonferroni, which means just divide alpha by the number of comparisons
and compare each p-value to that. In our case thse means divided by ten
(because we have 5 groups 5*4/2=10 pairs of groups to compare). So, when
I am looking over all those p-value produced by all those logistic
regressions above, I need to be comparing them to 0.005, not 0.05, to
look for significance. The moral here is to be very careful when making
multiple comparisons and think about the overall error rate when
considering p-values.

What about other methods besides regression? Glad you asked.

What about if I tried multiple -prtest-, with a Bonferroni correction?
Will I arrive at the same decisions as I would have using -logistic-?

display as text _n " ------------  multiple prtests, with Bonferroni
correction" _n
local group = "religion"
local numgrp = 5
local bin = "degreed"
local alpha=0.05

local numcomps= `numgrp'*(`numgrp'-1)/2
display "p-value compared to alpha/(#groups*(#groups-1)/2) =

forvalues i=1/`numgrp'{
local i1=`i'+1
  forvalues j=`i1'/`numgrp'{
  qui  prtest `bin' if `group'==`i' |`group'==`j' , by(`group')
  local ts=r(z)
  local p=2*(1 - norm(abs(r(z))))
  if `p' < 10^(-6) local p=0
  if `p' < `alpha'/`numcomps' {
  local reject = "reject" 
  local reject= "fail to reject"

  display as result "prtest " `i' ", " `j' /*
  */ "  test stat = " %8.4f  `ts' /*
  */ _col(37) "   p value  = " %8.4f `p' /*
  */ _col(63) "     `reject'"
display _newline

We get the same decisions. We do have to remember with this approach,
that -prtest- is a large sample test. We can do this test of each pair
approach using any of the tests used for comparing two groups. In other
words, you could do the above using -tabulate, chi2- (also for large
samples) or -tabulate, exact- (for small samples).

But what about the Bonferroni correction? How good is that? Well, I only
know of one other way to handle this situation, which avoids Bonferroni.
It's called the Marascuillo Procedure. I have not found a good reference
for it in a book, but it is described at this website:

Let's see how it stacks up against what we have already done:

display as text _n  "   ------------  Marascuillo Procedure " _n 

forvalues i=1/`numgrp'{
quietly summ `bin' if `group' ==`i'
local p`i'=r(mean)
local N`i'=r(N)

forvalues i=1/`numgrp'{
local i1=`i'+1
  forvalues j=`i1'/`numgrp'{
  local se=sqrt(`p`i''*(1-`p`i'')/`N`i'' + `p`j''*(1-`p`j'')/`N`j'')
  local cv=sqrt(invchi2tail(`numgrp',`alpha'))*`se'
  local ts=abs(`p`i''-`p`j'') 
  if `ts' > `cv' {
  local reject = "reject" 
  local reject= "fail to reject"

  display as result "test " `i' ", " `j' /*
  */ "   crit val = "  %8.4f  `cv' /*
  */  _col(37) "  test stat = " %8.4f  `ts' /*
  */  _col(63) "     `reject'"
display _newline

Same decisions as before. 


*   For searches and help try:

© Copyright 1996–2022 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index