Statalist



st: R: Which regression model to use for zero-inflated, non-normal outcome?


From   "Carlo Lazzaro" <carlo.lazzaro@tin.it>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: R: Which regression model to use for zero-inflated, non-normal outcome?
Date   Sat, 3 Oct 2009 08:33:04 +0200

<(1) The outcome is non-normal (swilk p-value 0.0000), so I can't use a
linear regression model.>

But the normality assumption in linear regression refers to the residuals
rather than to the dependent variable. If your dependent variable is
per-patient health care costs, for instance, there is only a negligible
chance that it follows a normal distribution.
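
To make the point about residuals concrete, here is a minimal sketch on
simulated data. The variable names (x, y, res), the seed and the
coefficients are all made up for illustration. The outcome is skewed only
because the predictor is skewed, while the errors are normal by
construction, so the test that matters is the one on the residuals:

```stata
* Simulated data only: x, y and res are hypothetical names
clear
set seed 12345
set obs 500

* Skewed predictor (exponential), normal errors by construction
generate double x = -ln(runiform())
generate double y = 2 + 3*x + rnormal()

* Shapiro-Wilk on the raw outcome: y inherits the skewness of x
swilk y

* Fit the linear model, then run the same test on the residuals
regress y x
predict double res, residuals
swilk res
```

A small swilk p-value for y alongside an unremarkable one for res would
not, by itself, be an argument against -regress-.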

<(4) Negative binomial model has a better fit, but does the high number of
zeros raise any concern?>

Observed zeros can cause problems when their frequency is higher than that
expected under the probability distribution you selected.
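
One rough way to look at this is to compare the observed share of zeros
with the share implied by the fitted model. The sketch below uses simulated
data (all names and parameter values are hypothetical) and the
mean-dispersion negative binomial, for which
Pr(y = 0) = (1 + alpha*mu)^(-1/alpha):

```stata
* Simulated data only: all names and parameter values are hypothetical
clear
set seed 2009
set obs 1000
generate double x = rnormal()
generate double mu = exp(1 + 0.3*x)

* Negative binomial counts with mean mu and alpha = 0.5,
* then about 30% of observations forced to zero
generate y = rnbinomial(2, 2/(2 + mu))
replace y = 0 if runiform() < 0.30

* Observed share of zeros
quietly count if y == 0
display "observed share of zeros:    " r(N)/_N

* Share of zeros implied by the fitted negative binomial model
quietly nbreg y x
predict double mu_nb, n
generate double p0_nb = (1 + e(alpha)*mu_nb)^(-1/e(alpha))
quietly summarize p0_nb
display "expected share under nbreg: " r(mean)
```

If the observed share is clearly larger than the expected one, the excess
zeros deserve a closer look. A fitted alpha can absorb part of the excess,
so treat this as a diagnostic rather than a formal test.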

<(5) I also tried zero inflated negative binomial regression, but all the
examples I've seen are where one of the independent variables has a high
number of zeros. Is it appropriate to use the zinb command when the
dependent variable has a high number of zeros?>

For more on this topic, please see:
J. Scott Long, Jeremy Freese. Regression Models for Categorical Dependent
Variables Using Stata. Second edition. College Station: Stata Press, 2006.

I do not know whether a more recent edition is currently available (please
see www.stata.com, bookstore section).
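
As far as I can tell, -zinb- is intended for the case where the dependent
variable has more zeros than a negative binomial model would predict, and
its inflate() option specifies the equation for those excess zeros. A
minimal sketch on simulated data follows. All variable names, parameter
values and the stored-estimate names nb_fit and zinb_fit are hypothetical:

```stata
* Simulated data only: all names and parameter values are hypothetical
clear
set seed 2009
set obs 1000
generate double x = rnormal()
generate double mu = exp(1 + 0.3*x)
generate y = rnbinomial(2, 2/(2 + mu))
replace y = 0 if runiform() < 0.30

* Ordinary negative binomial model
nbreg y x
estimates store nb_fit

* Zero-inflated negative binomial: inflate() specifies the equation
* for excess zeros in the dependent variable (here, constant only)
zinb y x, inflate(_cons)
estimates store zinb_fit

* Compare the two fits by AIC and BIC
estimates stats nb_fit zinb_fit
```

With real data, inflate() would normally list the covariates thought to
drive the excess zeros rather than a constant alone.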

HTH and Kind Regards,
Carlo
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Ashwin
Ananthakrishnan
Sent: Saturday, 3 October 2009 3:31
To: statalist@hsphsun2.harvard.edu
Subject: st: Which regression model to use for zero-inflated, non-normal
outcome?

Hi, 

I'm trying to run a regression model to identify independent predictors of a
specific continuous outcome (dependent variable).

(1) The outcome is non-normal (swilk p-value 0.0000), so I can't use a
linear regression model. 

(2) There are a number of patients where the outcome value is zero
(approximately 30% of the cohort). So I can't directly use a log-linear
model, because patients in whom the outcome is zero have a non-calculable
log(outcome) and are automatically dropped from the analysis. One option
would be to assign a nominal value to those with zero, i.e. add 0.5 to all
patients so that the outcome is never zero.

(3) Even if the outcome is treated as a count variable (incidence), the
variance is much greater than the mean, and the Poisson goodness-of-fit
test has a p-value of 0.000.

(4) Negative binomial model has a better fit, but does the high number of
zeros raise any concern?

(5) I also tried zero inflated negative binomial regression, but all the
examples I've seen are where one of the independent variables has a high
number of zeros. Is it appropriate to use the zinb command when the
dependent variable has a high number of zeros?

Thanks, 

Ashwin





      
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




