Statalist



st: RE: dealing with 0 in cells in 2x2 tables


From   "Verkuilen, Jay" <JVerkuilen@gc.cuny.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: dealing with 0 in cells in 2x2 tables
Date   Sat, 28 Jun 2008 16:59:22 -0400

Caryl Beynon wrote:

>>I am building a logistic regression model but first I wish to carry out
bivariate analyses measuring the effect of each exposure on the outcome.
In my 2 by 2 table I have 0 in one cell so the odds ratio is either
infinity or 0. However, the chi square test shows that the exposure
(homelessness) is significantly associated with the outcome, so I cannot
simply ignore this variable in my bivariate and multivariate analyses.
How do people deal with this situation?<

There are a few ways to deal with the problem. Unfortunately most of
them aren't very nice.  

(1) One common trick is to use a "flattening" constant. This involves
adding some extra cases to the dataset in a way that tends to bias
everything a little towards non-significance. For instance, if you added
four observations, one to each cell of the 2x2 table, you would have
non-zero counts in all cells and therefore a finite odds ratio.
The problem, of course, is how much to add. This procedure can be
justified on Bayesian grounds, so it's not completely arbitrary, but you
also have covariates, which complicates matters. With covariates, putting
an explicit prior on the coefficients would do the same job. I believe
Andrew Gelman wrote an R program that does this for logistic regression
fairly straightforwardly, but I don't know the name of it. (It would be a
nice thing to port to Stata.)
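
To make the arithmetic concrete, here is a minimal sketch in Python (not Stata) of what adding a constant to every cell does to the odds ratio and its Wald interval. The function name, the cell layout, and the counts are illustrative assumptions, not the original poster's data.

```python
import math


def flattened_odds_ratio(a, b, c, d, k=1.0, z=1.96):
    """Odds ratio and Wald interval after adding k to every cell of a 2x2 table.

    Assumed layout (hypothetical): a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    # Add the flattening constant to each cell.
    a, b, c, d = (x + k for x in (a, b, c, d))
    odds_ratio = (a * d) / (b * c)
    # Usual large-sample standard error of the log odds ratio.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, se_log_or, (lower, upper)


# Illustrative counts only (not the poster's data); one cell is zero.
or_, se, (lo, hi) = flattened_odds_ratio(20, 0, 25, 120)
print(f"OR = {or_:.3f}, SE(log OR) = {se:.3f}, 95% CI = ({lo:.3f}, {hi:.3f})")
```

With k = 0 and an empty cell the function raises ZeroDivisionError, which is exactly the problem the constant is meant to sidestep. As k grows, the estimate is eventually pulled toward 1, which is why the choice of k matters.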

(2) -exlogistic- can cope with this problem. I just put your data in and
tried it (curiosity killed the cat):

exlogistic homeless case

[iterations snipped]

Exact logistic regression                        Number of obs =        164
                                                 Model score   =   2.064807
                                                 Pr >= score   =     0.2039
---------------------------------------------------------------------------
    homeless | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
-------------+-------------------------------------------------------------
        case |   4.043903          22      0.2548      .5882011    175.3772
---------------------------------------------------------------------------

You're going to have to read up on this to know what the heck it means.
However, the exact confidence interval spans 1, so the association is
non-significant. (Assuming I believed in such things.)
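
For readers wondering where an exact p-value of this kind comes from, here is a rough sketch in Python (not Stata). As I understand it, with a single binary covariate the null distribution of the sufficient statistic, conditional on the margins, is hypergeometric, the same distribution behind Fisher's exact test, and a "2*Pr" style p-value doubles the smaller tail. The function names and the small table are hypothetical, and this is not a reimplementation of -exlogistic-, whose reporting may differ in details.

```python
from math import comb


def exact_tails(a, b, c, d):
    """Lower and upper tail probabilities for the top-left cell of a 2x2 table.

    All margins are held fixed, so the top-left count is hypergeometric
    under the null hypothesis of no association.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    x_min = max(0, col1 - (n - row1))
    x_max = min(row1, col1)
    total = comb(n, col1)
    pmf = {
        x: comb(row1, x) * comb(n - row1, col1 - x) / total
        for x in range(x_min, x_max + 1)
    }
    lower = sum(p for x, p in pmf.items() if x <= a)
    upper = sum(p for x, p in pmf.items() if x >= a)
    return lower, upper


def doubled_one_sided_p(a, b, c, d):
    """Two-sided p-value formed by doubling the smaller tail, capped at 1."""
    lower, upper = exact_tails(a, b, c, d)
    return min(1.0, 2 * min(lower, upper))


# Hypothetical small table, for illustration only.
print(doubled_one_sided_p(3, 1, 1, 3))
```

Because nothing here relies on a large-sample approximation, a zero cell causes no trouble for the p-value itself.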

Just for comparison, I added one to each cell in your table and ran:


logit homeless case [fweight = newfreq], or

[iterations snipped]

Logistic regression                               Number of obs   =        167
                                                  LR chi2(1)      =       3.03
                                                  Prob > chi2     =     0.0816
Log likelihood = -67.228331                       Pseudo R2       =     0.0221

------------------------------------------------------------------------------
    homeless | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        case |   4.408333   4.613569     1.42   0.156     .5668186    34.28505
------------------------------------------------------------------------------


Seems to be about the same story either way. 
