Notice: On April 23, 2014, Statalist moved from an email list to a forum.

re: RE: st: Zero Inflated Poisson Regression

From   "Ariel Linden, DrPH" <>
To   <>
Subject   re: RE: st: Zero Inflated Poisson Regression
Date   Wed, 8 Aug 2012 12:42:52 -0400

Hi Scott,

I have been mulling over your posting, trying to think of a similar scenario
in my discipline (health services research). In fact, this is similar to
certain medical conditions that are age related. For example, asthma impacts
young children and then goes "dormant" for a few years and then reappears in
the early 20s to 30s (after that it goes dormant again and reappears as
chronic obstructive pulmonary disease later on in the late 40s to early

So imagine that we'd want to look at hospitalizations for asthma as the
outcome. This is likely a Poisson-like distribution, and we'd need to
account for the age issue described above.

Exact matching on age (or age category) would seem to be a reasonable
approach here, since we'd expect children undergoing an intervention to have
fewer hospitalizations than children not undergoing the intervention (or
perhaps lower probability of hospitalization). Those individuals in the
middle age range where asthma is "dormant" will not likely show any
difference over controls in hospitalizations, but that may be a function of
sample size/power. You could also consider stratification here, which may be
a better approach with a bimodal or multi-modal distribution.
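
To make the stratification idea concrete, here is a minimal sketch in Python (not Stata) that computes a treated-versus-control hospitalization rate ratio within each age stratum. The stratum labels, the record layout, and all numbers are hypothetical and chosen only for illustration; they are not from any data discussed in this thread.

```python
from collections import defaultdict

def stratified_rate_ratios(records):
    """Rate ratio (treated / control) of events per person-year, by stratum.

    records: iterable of (stratum, treated, events, person_years).
    Returns {stratum: ratio}, with None where a ratio cannot be computed.
    """
    # For each stratum keep [events, person_years] for each arm.
    totals = defaultdict(lambda: {True: [0, 0.0], False: [0, 0.0]})
    for stratum, treated, events, years in records:
        arm = totals[stratum][bool(treated)]
        arm[0] += events
        arm[1] += years

    ratios = {}
    for stratum, arms in totals.items():
        (e1, t1), (e0, t0) = arms[True], arms[False]
        if t1 > 0 and t0 > 0 and e0 > 0:
            ratios[stratum] = (e1 / t1) / (e0 / t0)
        else:
            ratios[stratum] = None  # empty arm or zero control events
    return ratios

# Hypothetical data: an effect in children, none in the "dormant" age range.
example = [
    ("child", True, 10, 100.0),
    ("child", False, 20, 100.0),
    ("age_30s", True, 1, 100.0),
    ("age_30s", False, 1, 100.0),
]
ratios = stratified_rate_ratios(example)
```

Keeping the strata separate, rather than pooling them, is what lets an effect in one age group show up even when another group shows none.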

You also correctly noted that you could use either splines or fractional
polynomials. In the case of age, we'd expect this "transformation" to
account for the distributional "hump" for children and then again in the
twenties. I imagine that splines may fit better than fp.
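
As a rough illustration of what a linear spline does to age, the Python sketch below (not Stata) builds a piecewise-linear basis: one term per segment, so that a regression coefficient on each term is the slope within that segment. The knot locations (12, 20, 35) are assumptions made up for this example, not values suggested anywhere in the thread.

```python
def linear_spline_basis(x, knots):
    """Piecewise-linear basis for x given sorted knots.

    Term j is the distance x has travelled inside segment j, so the
    terms always sum to x and each coefficient is a within-segment slope.
    """
    neg_inf, pos_inf = float("-inf"), float("inf")
    edges = [neg_inf] + list(knots) + [pos_inf]
    basis = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        start = 0.0 if lo == neg_inf else lo   # first segment starts at 0
        clipped = min(max(x, lo), hi)          # confine x to [lo, hi]
        basis.append(clipped - start)
    return basis

# Hypothetical knots at ages 12, 20, and 35.
age_knots = [12, 20, 35]
terms_age_25 = linear_spline_basis(25, age_knots)
```

With these knots, age 25 contributes the full 12 years of the first segment, the full 8 years of the second, 5 years of the third, and nothing to the last, which is how separate slopes for childhood and the twenties can be estimated.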

As for omitted variables, you'd have to answer that question based on your
knowledge of the data and content expertise. Is there a reason to believe
you're missing important variables? Would you assume the results are biased?
I would suggest that you run a sensitivity analysis after your main analysis
to gauge how likely it is that unknown confounding is biasing your results.
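
The post does not say which sensitivity analysis is meant. Purely as one possible example, and as an assumption on my part, the Python sketch below computes the E-value of VanderWeele and Ding: the minimum strength of association (on the ratio scale) that an unmeasured confounder would need with both treatment and outcome to fully explain away an observed rate ratio. The function name and the example ratio are hypothetical.

```python
import math

def e_value(rate_ratio):
    """E-value for an observed rate (or risk) ratio.

    Ratios below 1 are inverted first, so protective and harmful
    effects of the same size give the same E-value.
    """
    if rate_ratio <= 0:
        raise ValueError("rate_ratio must be positive")
    rr = rate_ratio if rate_ratio >= 1 else 1.0 / rate_ratio
    return rr + math.sqrt(rr * (rr - 1.0))

# Hypothetical observed ratio: treated have half the hospitalization rate.
observed = 0.5
needed = e_value(observed)
```

A larger E-value means a stronger unmeasured confounder would be required to overturn the result; a value close to 1 means even weak confounding could do so.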

I hope this helps.


Date: Tue, 7 Aug 2012 13:04:18 -0400
From: "Scott Holupka" <>
Subject: RE: st: Zero Inflated Poisson Regression

Thanks for the suggestions. We've tried in the past to find an appropriate
IV, but so far haven't found anything that works. At least with propensity
scores we can try to control for any observed differences.


-----Original Message-----
[] On Behalf Of Cameron McIntosh
Sent: Monday, August 06, 2012 3:10 PM
Subject: RE: st: Zero Inflated Poisson Regression


Good question. Generally, I don't know if there is very much out there on
how to fit ZIPs, or count/rate variable regression models in general, with
non-linear relations (e.g., quadratic as you seem to suggest).  I don't know
what Stata has to offer in this regard (as I'm not a "Stata guy"), but I
might suggest a neural network approach, perhaps using MATLAB:

Nader, F., Hong, G., Kazem, M., Ali, S.S., Keramat, N., & Reza, E.M. (2009).
Nonlinear Poisson regression using neural networks: a simulation study.
Neural Computing & Applications, 18(8), 939-943.

As for your endogeneity problem, MATLAB also does propensity score matching,
and you may also want to consider using instrumental variables, if you can
find some good ones in your data set.

Caliendo, M., & Kopeinig, S. (2008). Some practical guidance for the
implementation of propensity score matching. Journal of Economic Surveys,
22, 31-72.

Stuart, E. A. (2010). Matching methods for causal inference: A review and a
look forward. Statistical Science, 25, 1-21.

Bollen, K.A. (2012). Instrumental Variables in Sociology and the Social
Sciences. Annual Review of Sociology, 38, 37-72.

Perhaps some more experienced Stata programmers could provide you with a
Stata solution, however. Anyway, hope this helps.


----------------------------------------
> From:
> To:
> Subject: st: Zero Inflated Poisson Regression
> Date: Mon, 6 Aug 2012
> This is mainly a question about running a zero-inflated poisson regression
> using zip (Stata 0..)), but it's also a more general question of whether
> Statalisters think I'm using the procedures in an appropriate way.
> My analysis is examining several expenditure categories. Typical of
> expenditure data, the outcome variables are all skewed. Also typical is
> that several outcomes have a large percentage (0%% to 0%%) of cases
> reporting zero. I am therefore considering using zero-inflated poisson
> models - zip - to examine these outcomes.
> Prior research also suggests that the relationship between our primary
> independent variable - call it H - and expenditures will not be linear. In
> particular, we expect spending may be lower at both high and low values of
> H. I have previously used polynomial models to examine this relationship,
> but I'm not sure if polynomials can be used with negative poisson models.
> I am therefore also considering using a piecewise regression approach with
> ZIP.
> Finally, I'm concerned about omitted variable bias since I don't have a
> randomized sample. Again, in previous work I've used propensity score
> methods to account for differences in observed characteristics.
> I know how to implement each of these methods in Stata, but I'm wondering
> if it's appropriate to use all three methods at once. My current plan is to
> run propensity analyses to identify similar groups based on observed
> characteristics, then use those groups as covariates in a zero-inflated
> poisson model that also includes polynomial terms of H (e.g. H and
> H-squared), or piecewise dummy variables of H.
> Any thoughts on whether this approach seems appropriate, particularly
> whether ZIP can handle both the propensity covariates and polynomial
> terms, would be appreciated.
> Thanks,
> Scott
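
For readers unfamiliar with the model in the question above, the following Python sketch (not Stata's zip command) writes out a zero-inflated Poisson probability and log-likelihood with a quadratic in H on the log-mean scale. It is a simplified illustration under stated assumptions: the inflation probability `pi` is held constant rather than modeled with covariates, the propensity-group covariates are left out, and every parameter value and data point is hypothetical.

```python
import math

def zip_pmf(y, lam, pi):
    """P(Y = y) under a zero-inflated Poisson.

    pi  : probability of a structural (always-zero) case
    lam : Poisson mean for the remaining cases
    """
    poisson = math.exp(-lam) * lam ** y / math.factorial(y)
    if y == 0:
        # Zeros come from two sources: structural zeros and Poisson zeros.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson

def quadratic_mean(h, b0, b1, b2):
    """Poisson mean with a log link and polynomial terms H and H-squared."""
    return math.exp(b0 + b1 * h + b2 * h * h)

def zip_loglik(ys, hs, b0, b1, b2, pi):
    """Log-likelihood of counts ys given covariate values hs."""
    return sum(
        math.log(zip_pmf(y, quadratic_mean(h, b0, b1, b2), pi))
        for y, h in zip(ys, hs)
    )

# Hypothetical values: a negative coefficient on H-squared gives a mean
# that is lower at both low and high H, as the question anticipates.
ys = [0, 0, 3, 1]
hs = [-2.0, 0.0, 0.0, 2.0]
ll = zip_loglik(ys, hs, b0=1.0, b1=0.0, b2=-0.5, pi=0.3)
```

Nothing in this likelihood prevents the linear predictor from containing polynomial terms, spline terms, or group indicators together; they all enter through the log-mean in the same way.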
