Re: st: suggested references about the variables to include in zero-inflated portion of zinb?

From   Steven Samuels
Subject   Re: st: suggested references about the variables to include in zero-inflated portion of zinb?
Date   Sun, 26 Oct 2008 11:05:43 -0400

Tim--the Subject of your last post was completely uninformative (st: Re: statalist-digest V4 #3224). If you receive the Digest, do not use the "Reply" button to respond.

I have a few thoughts:

1. The reviewer's original opinion is not correct. If your target parameter is the mean score, then OLS may give a consistent estimate, even if the data are skewed and non-normal. The proviso is that you have a good prediction model for the mean. However, the default OLS standard errors may be incorrect. The fix is easy: -reg- with the -robust- option will give standard errors that do not depend on the assumed error distribution.

2. Did you compare observed and expected values by eye and with a chi-square test? If the -zinb- fit is not good, there is little justification for using it.

3. If, by chance, -zinb- happens to give a good fit, standard errors based on the ZINB model will be wrong. You should use the -robust- option or a bootstrap, as Carlo suggested.

4. Published analyses of CESD with the zero-inflated negative binomial are not, in themselves, justification for using -zinb- in your problem. Did the fitted distributions in the published analyses match the data? I've done analyses with full and reduced versions of the CESD. In one data set and in national data, the distribution was quite symmetric. In another data set the distribution was bimodal (I think this was an interviewer problem). In none of these cases was there a lump at the minimum (or maximum) value. In fact, the extreme responses were the rarest ones.

5. If you do see lumps at the extremes, consider that they may reflect dishonest responding. Why? With count data, a separate model for responding at all is plausible. With questionnaire scales, a minimum or maximum score is the result of a respondent checking the same value for every item. (I use the word "lumps", but in the statistical literature, isolated higher-density regions are usually called "bumps".)

6. If you want to fit the distribution of scores, as opposed to predicting means, the beta distribution may provide a good approximation. Divide the scores by the maximum possible, so that the results are proportions. Then download -betafit- from SSC. You will need to add a small constant to the zeros and subtract it from the ones before you do your regressions.
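
For point 1, the difference between the default and the robust standard errors can be sketched outside Stata. The following is a minimal Python illustration, not Stata's implementation: it fits a one-predictor OLS line and computes both the classical standard error of the slope and a sandwich (HC1-type) standard error with an n/(n-2) small-sample factor. The function name and the data are made up for the example.

```python
import math

def slope_with_standard_errors(x, y):
    """OLS of y on x with an intercept.

    Returns (slope, classical SE of slope, robust HC1-type SE of slope).
    """
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    dx = [xi - x_bar for xi in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (yi - y_bar) for d, yi in zip(dx, y)) / sxx
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    # Classical variance: assumes the errors have constant variance.
    classical_var = sum(e * e for e in resid) / (n - 2) / sxx
    # Sandwich variance: each observation contributes its own squared
    # residual, so no constant-variance assumption is needed.
    robust_var = (n / (n - 2)) * sum((d * e) ** 2 for d, e in zip(dx, resid)) / sxx ** 2
    return slope, math.sqrt(classical_var), math.sqrt(robust_var)

# Made-up illustrative data, not taken from the thread.
x = [0, 1, 2, 3, 4]
y = [1, 3, 2, 5, 9]
slope, se_classical, se_robust = slope_with_standard_errors(x, y)
print(slope, se_classical, se_robust)
```

The slope estimate is the same under both approaches; only the standard error changes.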

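For point 2, the comparison of observed and expected values amounts to a Pearson chi-square statistic. The sketch below is in Python for illustration only. It assumes you already have the observed frequency of each score (or group of scores) and the probability the fitted model assigns to the same cells; the function name and the numbers are hypothetical. The reference distribution would usually have degrees of freedom equal to the number of cells minus one minus the number of estimated parameters, and sparse cells should be pooled first.

```python
def pearson_chi_square(observed_counts, fitted_probs):
    """Return (expected counts, Pearson chi-square statistic).

    observed_counts: observed frequency in each cell
    fitted_probs:    model-based probability of each cell (should sum to 1)
    """
    n = sum(observed_counts)
    expected = [n * p for p in fitted_probs]
    statistic = sum((o - e) ** 2 / e for o, e in zip(observed_counts, expected))
    return expected, statistic

# Hypothetical three-cell example, not data from the thread.
expected, statistic = pearson_chi_square([50, 30, 20], [0.4, 0.4, 0.2])
print(expected, statistic)
```

Printing the observed and expected counts side by side also gives the "by eye" check.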

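For point 6, the rescaling step before fitting a beta distribution can be sketched as follows. This is Python for illustration only. The size of the small constant is not specified in the post, so the default `eps=0.001` is an assumption, as is the function name; the maximum of 24 comes from the score range in the question below.

```python
def rescale_scores(scores, max_score, eps=0.001):
    """Map raw scores to proportions strictly inside (0, 1).

    Scores are divided by the maximum possible score; exact zeros are
    moved up to eps and exact ones down to 1 - eps, because the beta
    density is not defined at the endpoints.
    """
    proportions = []
    for s in scores:
        p = s / max_score
        if p == 0:
            p = eps          # nudge zeros up by the small constant
        elif p == 1:
            p = 1 - eps      # nudge ones down by the same constant
        proportions.append(p)
    return proportions

# Scores on a 0-24 scale, chosen only to show the three cases.
print(rescale_scores([0, 12, 24], max_score=24))
```

The results can be sensitive to the constant chosen, so it is worth trying more than one value.
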
I am using zinb to estimate level of psychological distress (scores range from 0-24) using various demographic variables and measures of use of the Internet. I've used -countfit- to compare various count models and the results support zinb as the best fitting model.

I am uncertain, however, about how to justify the variables that I include in the zero-inflated part of the model. I've read journal articles that have used zinb, read the book by Freese and Long, and searched the Internet and Statalist but I have not been able to find any detailed recommendations or procedures. Can anyone suggest any other sources (books or journals) that provide an explanation or a good example of this process?

Ideally I would like to find a good source that I can cite in the paper -- but I appreciate any suggestions about this you might have.

Thanks for your help,

Timothy M. Hale, MA
Graduate Assistant
University of Alabama at Birmingham
Department of Sociology

*   For searches and help try:
