
From: "Carlo Lazzaro" <carlo.lazzaro@tin.it>

To: <statalist@hsphsun2.harvard.edu>

Subject: st: R: Which regression model to use for zero-inflated, non-normal outcome?

Date: Sat, 3 Oct 2009 08:33:04 +0200

<(1) The outcome is non-normal (swilk p-value 0.0000), so I can't use a linear regression model.>

But the normality assumption in linear regression refers to the residuals rather than to the dependent variable. If your dependent variable is per-patient health care costs, for instance, there is only a negligible chance that it follows a normal distribution.

<(4) Negative binomial model has a better fit, but does the high number of zeros raise any concern?>

Observed zeros can cause problems when their frequency is higher than the frequency expected under the probability distribution you selected.

<(5) I also tried zero inflated negative binomial regression, but all the examples I've seen are where one of the independent variables has a high number of zeros. Is it appropriate to use the zinb command when the dependent variable has a high number of zeros?>

For more on this topic, please see:

J. Scott Long, Jeremy Freese. Regression Models for Categorical Dependent Variables Using Stata. Second edition. College Station: Stata Press, 2006.

I do not know whether a more recent edition is currently available (please see www.stata.com, bookstore section).

HTH and Kind Regards,
Carlo

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Ashwin Ananthakrishnan
Sent: Saturday, 3 October 2009 3:31
To: statalist@hsphsun2.harvard.edu
Subject: st: Which regression model to use for zero-inflated, non-normal outcome?

Hi,

I'm trying to run a regression model to identify independent predictors of a specific continuous outcome (dependent variable).

(1) The outcome is non-normal (swilk p-value 0.0000), so I can't use a linear regression model.

(2) There are a number of patients in whom the outcome value is zero (approximately 30% of the cohort). So I can't directly use a log-linear model, because patients in whom the outcome is zero have a non-calculable log(outcome) and are automatically dropped from the analysis. One option would be to assign a nominal value to those with zero, i.e. add 0.5 to all patients so that the outcome is never zero.

(3) Even if the outcome is treated as a count variable (incidence), the variance is much greater than the mean, and the Poisson goodness-of-fit test has a p-value of 0.000.

(4) Negative binomial model has a better fit, but does the high number of zeros raise any concern?

(5) I also tried zero inflated negative binomial regression, but all the examples I've seen are where one of the independent variables has a high number of zeros. Is it appropriate to use the zinb command when the dependent variable has a high number of zeros?

Thanks,
Ashwin

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
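The point made in the reply to (4), that zeros become a problem only when they are more frequent than the chosen distribution expects, can be checked with simple arithmetic. Below is a minimal Python sketch, not anything taken from the thread. It assumes the NB2 parameterization (variance = mu + alpha*mu^2), and the values of `mu`, `alpha` and `pi` are hypothetical, chosen only for illustration. Only the 30% share of zeros comes from the original question.

```python
import math


def poisson_zero_prob(mu):
    """P(Y = 0) under a Poisson distribution with mean mu."""
    return math.exp(-mu)


def negbin_zero_prob(mu, alpha):
    """P(Y = 0) under NB2 with mean mu and dispersion alpha > 0
    (variance = mu + alpha * mu**2)."""
    return (1.0 + alpha * mu) ** (-1.0 / alpha)


def zinb_zero_prob(mu, alpha, pi):
    """P(Y = 0) under a zero-inflated NB2: a structural zero occurs with
    probability pi, otherwise the count is drawn from the NB2 part."""
    return pi + (1.0 - pi) * negbin_zero_prob(mu, alpha)


# Hypothetical parameter values (assumptions, not estimates from any data).
mu = 3.0      # mean of the count part
alpha = 1.5   # overdispersion parameter
pi = 0.10     # share of structural zeros in the zero-inflated model

observed_zero_share = 0.30  # "approximately 30% of the cohort"

print("observed share of zeros:", observed_zero_share)
print("expected under Poisson :", round(poisson_zero_prob(mu), 4))
print("expected under NB2     :", round(negbin_zero_prob(mu, alpha), 4))
print("expected under ZINB    :", round(zinb_zero_prob(mu, alpha, pi), 4))
```

With strong overdispersion, a plain negative binomial can by itself expect a large share of zeros, so 30% observed zeros does not automatically call for a zero-inflated model. The comparison that matters is between the observed share and the share implied by the fitted model.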

**References**: **st: Which regression model to use for zero-inflated, non-normal outcome?** *From:* Ashwin Ananthakrishnan <ashwinna@yahoo.com>
