
# Re: st: Zero Inflated Negative Binomial model

From: David Hoaglin
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Zero Inflated Negative Binomial model
Date: Sat, 21 Jan 2012 08:06:53 -0500

Eugene,

You are correct that using a ZINB model would be problematic.  The NB
distribution applies to count data (i.e., any nonnegative integer can
occur as a value of the outcome variable).  When you have only
categories, that requirement is not satisfied, no matter what value
you choose to represent each category.
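
To make the point above concrete, here is a minimal Python sketch (not part of the original exchange) that lists the counts the recoded variable can never take. The coding 0, 1, 2, 4, 6, 11, 21, 50 comes from the original post; the function name and the upper limit of 12 are illustrative assumptions.

```python
# Category minimums used as codes in the original post.
category_minimums = [0, 1, 2, 4, 6, 11, 21, 50]

def unreachable_counts(codes, upper):
    """Return the integers in 0..upper that the recoded variable can never equal."""
    observed = set(codes)
    return [k for k in range(upper + 1) if k not in observed]

# Hypothetical cutoff of 12, chosen only to keep the output short.
gaps = unreachable_counts(category_minimums, 12)
print(gaps)  # [3, 5, 7, 8, 9, 10, 12]
```

A negative binomial distribution puts positive probability on every one of those values, so the recoded variable cannot follow it.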

I don't know whether the ordinal logit model has a zero-inflated
version (I have not searched).  Here "zero-inflated" would mean that
the first category is inflated, since numerical values associated with
the ordered categories are only labels.  If someone has worked out
such a model, you would still need to determine whether, in your data,
the assumption of proportional odds is reasonable.  You could try an
ordinal logistic regression model with your data as they stand, and
see what happens.
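
For reference, the proportional odds assumption mentioned above can be written as follows. This is the standard textbook form of the ordinal logit model, not something stated in the original message; the symbols $\alpha_j$ (cutpoints) and $\beta$ (common slope vector) are notation introduced here.

```latex
\operatorname{logit} P(Y \le j \mid x) = \alpha_j - x^{\top}\beta,
\qquad j = 1, \dots, J-1
```

With the eight response categories in the original post, $J = 8$, so there are seven cutpoints but only a single $\beta$. The assumption is that the same $\beta$ applies at every cutpoint.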

As an exploratory step, you could fit a binary logit model to "0
times" versus "1 or more times"; that would address the question of
crossing the threshold into self-injurious behavior.  You could then
work with only the nonzero categories and dichotomize the outcome
variable at each of the category boundaries (or some of them) and fit
a binary logit model to each dichotomized outcome.  Comparison of the
coefficients on the predictor variables among those models would give
you an indication of whether the proportional odds model is
reasonable.
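
The two exploratory steps above can be sketched in Python as a data-preparation routine. This is only an illustration under stated assumptions: the category labels are taken from the original post, while the function names, the dictionary labels, and the sample responses are hypothetical. Each binary outcome it produces would then be passed to a binary logit fit (for example, Stata's `logit`).

```python
# Ordered response categories from the original post; list index = ordinal position.
CATEGORIES = ["0 times", "1 time", "2-3 times", "4-5 times",
              "6-10 times", "11-20 times", "21-49 times", "50 or more times"]

def any_behavior(cat_index):
    """Step 1 outcome: 1 for '1 time' or more, 0 for '0 times'."""
    return 1 if cat_index >= 1 else 0

def cumulative_splits(cat_indices):
    """Step 2: keep nonzero respondents only, then build one binary outcome
    per boundary between adjacent nonzero categories (1 = above the boundary)."""
    nonzero = [c for c in cat_indices if c >= 1]
    splits = {}
    for cut in range(1, len(CATEGORIES) - 1):  # boundaries after categories 1..6
        label = f"> {CATEGORIES[cut]}"
        splits[label] = [1 if c > cut else 0 for c in nonzero]
    return splits

# Hypothetical responses, given as category indices (illustration only).
sample = [0, 0, 0, 1, 2, 2, 5, 7, 0, 3]
step1 = [any_behavior(c) for c in sample]
step2 = cumulative_splits(sample)
```

If the coefficients from the six step-2 logit fits are roughly equal across boundaries, that is informal evidence in favor of proportional odds among the nonzero categories.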

You didn't describe the sorts of predictor variables that you have.
Other analytic approaches may be possible.

David Hoaglin

On Fri, Jan 20, 2012 at 8:02 AM, Eugene Walls <Eugene.Walls@du.edu> wrote:
> I am working with a dataset that contains counts of the number of times that youth in the sample engage in self-harming behaviors (such as cutting). My co-authors and I are interested in using the zero-inflated negative binomial models because (a) we have a sample that has about 74% zeroes and (b) because we are conceptualizing two processes occurring - one that predicts the likelihood of crossing the threshold into self-injurious behavior and one that predicts the number of times of engaging in the behavior. The Vuong test seems to indicate that the ZINB model is a better fit for the data than the NBReg model.
>
> Our question is whether it is appropriate to use the ZINB model, because the response set of the variable capturing the number of times of engaging in SIB is not a straight count, but rather "0 times", "1 time", "2-3 times", "4-5 times", "6-10 times", "11-20 times", "21-49 times", "50 or more times". We have recoded the variable into 0, 1, 2, 4, 6, 11, 21, 50, using the minimum in each category, but if we do that, is using the ZINB model problematic?
>
> Thanks
> Eugene
