
From: "Verkuilen, Jay" <JVerkuilen@gc.cuny.edu>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: dealing with 0 in cells in 2x2 tables
Date: Sat, 28 Jun 2008 16:59:22 -0400

Caryl Beynon wrote:

>>I am building a logistic regression model but first I wish to carry out bivariate analyses measuring the effect of each exposure on the outcome. In my 2 by 2 table I have 0 in one cell so the odds ratio is either infinity or 0. However, the chi-square test shows that the exposure (homelessness) is significantly associated with the outcome, so I cannot simply ignore this variable in my bivariate and multivariate analyses. How do people deal with this situation?<

There are a few ways to deal with the problem. Unfortunately most of them aren't very nice.

(1) One common trick is to use a "flattening" constant. This involves adding some extra cases to the dataset in a way that tends to bias everything a little towards non-significance. For instance, if you added four observations, one to each cell of the 2x2 table, you would have non-zero counts in all cells and therefore a finite odds ratio. The problem, of course, is how much to add. This procedure can be justified on Bayesian grounds, so it's not completely arbitrary, but you also have covariates, which complicates matters. It would be OK with a prior. I believe Andrew Gelman wrote an R program that does this for logistic regression fairly straightforwardly, but I don't know the name of it. (It would be a nice thing to port to Stata.)

(2) -exlogistic- can cope with this problem. I just put your data in and tried it (curiosity killed the cat):

    exlogistic homeless case
    [iterations snipped]

    Exact logistic regression                        Number of obs =        164
                                                     Model score   =   2.064807
                                                     Pr >= score   =     0.2039
    ---------------------------------------------------------------------------
        homeless | Odds Ratio       Suff.  2*Pr(Suff.)     [95% Conf. Interval]
    -------------+-------------------------------------------------------------
            case |   4.043903          22       0.2548     .5882011    175.3772
    ---------------------------------------------------------------------------

You're going to have to read up on this to know what the heck it means. However, it seems that the exact confidence interval spans 1, so the result is non-significant. (Assuming I believed in such things.)

Just for comparison, I added one to each cell in your table and ran:

    logit homeless case [fweight = newfreq], or
    [iterations snipped]

    Logistic regression                               Number of obs   =        167
                                                      LR chi2(1)      =       3.03
                                                      Prob > chi2     =     0.0816
    Log likelihood = -67.228331                       Pseudo R2       =     0.0221

    ------------------------------------------------------------------------------
        homeless | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
            case |   4.408333   4.613569     1.42   0.156     .5668186    34.28505
    ------------------------------------------------------------------------------

Seems to be about the same story either way.

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
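For anyone who wants to try the two approaches side by side, here is a minimal sketch as a do-file fragment. It builds a 2x2 table with one empty cell, adds one observation to every cell as the flattening constant, fits the frequency-weighted logit, and then runs -exlogistic- on the unadjusted counts. The counts and the variable name freq are hypothetical placeholders, not the original poster's data, so the results will not match the output quoted in this message.

```stata
* Hypothetical 2x2 counts for illustration only (NOT the original data).
* One cell (homeless == 1 & case == 0) is empty.
clear
input homeless case freq
0 0 40
0 1 50
1 0 0
1 1 12
end

* Flattening constant: add one observation to each of the four cells.
generate newfreq = freq + 1

* Frequency-weighted logit on the adjusted counts; -or- reports odds ratios.
logit homeless case [fweight = newfreq], or

* Exact logistic regression on the unadjusted counts.
* The empty cell is excluded explicitly so only positive weights are used.
exlogistic homeless case if freq > 0 [fweight = freq]
```

Adding one per cell is only one possible choice of constant. Note that fweights must be integers, so a fractional constant such as 0.5 would need a different weighting scheme.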

**References**: **st: dealing with 0 in cells in 2x2 tables** *From:* "Beynon, Caryl" <C.M.Beynon@ljmu.ac.uk>

