# st: R: Which regression model to use for zero-inflated, non-normal outcome?

From: "Carlo Lazzaro"
Subject: st: R: Which regression model to use for zero-inflated, non-normal outcome?
Date: Sat, 3 Oct 2009 08:33:04 +0200

```<(1) The outcome is non-normal (swilk p-value 0.0000), so I can't use a
linear regression model.>

But the normality assumption in linear regression refers to the residuals
rather than to the dependent variable. If your dependent variable is
per-patient health care costs, for instance, there's a very negligible chance that they

<(4) Negative binomial model has a better fit, but does the high number of
zeros raise any concern?>

Observed zeros can cause problems when their frequency is higher than that
expected under the probability distribution you selected.

<(5) I also tried zero inflated negative binomial regression, but all the
examples I've seen are where one of the independent variables has a high
number of zeros. Is it appropriate to use the zinb command when the
dependent variable has a high number of zeros?>

For more on this topic, please see:
J. Scott Long, Jeremy Freese. Regression Models for Categorical Dependent
Variables Using Stata. Second edition. College Station: Stata Press, 2006.

See www.stata.com, bookstore section.

HTH and Kind Regards,
Carlo
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Ashwin
Ananthakrishnan
Sent: Saturday, 3 October 2009 3:31
To: statalist@hsphsun2.harvard.edu
Subject: st: Which regression model to use for zero-inflated, non-normal
outcome?

Hi,

I'm trying to run a regression model to identify independent predictors of a
specific continuous outcome (dependent variable).

(1) The outcome is non-normal (swilk p-value 0.0000), so I can't use a
linear regression model.

(2) There are a number of patients where the outcome value is zero
(approximately 30% of the cohort). So I can't directly use a log-linear model,
because patients in whom the outcome is zero have a non-calculable
log(outcome) and are automatically dropped from the analysis. One option
would be to assign a nominal value to those with zero, i.e. add 0.5 to all
patients so that the outcome is never zero.

(3) Even if the outcome is a count variable (incidence), the variance is
much greater than the mean, and the Poisson goodness-of-fit test has a
p-value of 0.000.

(4) Negative binomial model has a better fit, but does the high number of
zeros raise any concern?

(5) I also tried zero inflated negative binomial regression, but all the
examples I've seen are where one of the independent variables has a high
number of zeros. Is it appropriate to use the zinb command when the
dependent variable has a high number of zeros?

Thanks,

Ashwin

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
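The reply above says that zeros become a problem when they occur more often than the chosen distribution expects. A rough way to see this, sketched here in plain Python rather than Stata, is to compare the observed share of zeros with the share implied by an intercept-only Poisson and by a negative binomial (NB2) matched to the sample mean and variance. The data and the function name `zero_check` are hypothetical and only for illustration; this is a quick moment-based check, not a substitute for fitting `nbreg` or `zinb` with covariates.

```python
import math
from statistics import mean, variance


def zero_check(counts):
    """Compare the observed share of zeros with the share implied by
    intercept-only Poisson and NB2 distributions matched by moments."""
    mu = mean(counts)
    var = variance(counts)  # sample variance (n - 1 denominator)
    observed = sum(1 for c in counts if c == 0) / len(counts)

    # Poisson: P(Y = 0) = exp(-mu)
    poisson_zero = math.exp(-mu)

    # NB2: Var(Y) = mu + alpha * mu^2, so alpha = (var - mu) / mu^2
    alpha = (var - mu) / mu ** 2
    if alpha > 0:
        # NB2: P(Y = 0) = (1 + alpha * mu) ** (-1 / alpha)
        nb_zero = (1 + alpha * mu) ** (-1 / alpha)
    else:
        # No overdispersion: NB2 collapses to the Poisson
        nb_zero = poisson_zero

    return {"observed": observed, "poisson": poisson_zero, "nb": nb_zero}


# Hypothetical cohort of 100 patients, 30% of whom have a zero outcome
counts = [0] * 30 + [1] * 10 + [2] * 15 + [3] * 15 + [5] * 10 + [8] * 10 + [12] * 5 + [20] * 5
result = zero_check(counts)
for name, share in result.items():
    print(f"{name:>8}: {share:.3f}")
```

If the observed share of zeros is far above the Poisson share but close to the negative binomial share, overdispersion alone may account for the zeros. If it is still well above the negative binomial share, that is the situation in which a zero-inflated model is usually considered.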