
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Endogeneity in zero inflated Poisson model

From   Austin Nichols <>
To   "" <>
Subject   Re: st: Endogeneity in zero inflated Poisson model
Date   Thu, 13 Feb 2014 08:07:40 -0500

Goedele Van den Broeck <>:

You should use -ivpoisson- as you have no evidence of zero inflation.
Women have very different target lifetime fertility and period
fertility, and errors are in both directions, but with a zero lower
bound, many observations will pile up at zero, and many at one, even
when target fertility (by some age) is two, and a much smaller number
of cases with zero target fertility will be observed at one or two. No
doubt there are many women who have very low conditional mean
fertility, because they don't want children, are too young to have
much exposure to risk of pregnancy, are sexually abstinent, would
choose termination conditional on pregnancy, or what have you. Such
things should be captured by the predictors you include in your model.
But true zero inflation means true infertility, which is probably
under 5 percent, and adding that complication will improve your
estimates much less than getting the right predictors in the model.
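As a rough illustration of the advice above, here is a minimal do-file sketch. All variable names (children, employed, age, educ, z) are hypothetical stand-ins for Y, W, X and Z in the original question, not names from the poster's data. The first part compares the observed share of zeros with the share a plain Poisson fit implies; a Poisson variable with mean below ln 2 (about 0.69) is zero more than half the time, so many zeros alone need not signal zero inflation. That first fit ignores endogeneity and is meant only as a descriptive check. The second part calls -ivpoisson cfunction- with the employment variable treated as endogenous.

```stata
* Hypothetical variable names (assumptions, not from the original post):
*   children = count outcome Y, employed = endogenous regressor W,
*   age educ = exogenous regressors X, z = instrument Z

* 1. Descriptive check: observed zeros vs. zeros implied by a plain Poisson fit
poisson children employed age educ, vce(robust)
predict double mu, n                      // fitted conditional mean
generate double p0 = exp(-mu)             // Poisson P(Y = 0 | x)
generate byte is0 = (children == 0) if !missing(children)
summarize is0 p0                          // compare the two means

* 2. Control-function Poisson with employed treated as endogenous
ivpoisson cfunction children age educ (employed = z)
```

If the two means reported by -summarize- are close, the covariates already account for most of the zeros, which is the situation described above.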

On Thu, Feb 13, 2014 at 5:19 AM, Goedele Van den Broeck
<> wrote:
> Dear all,
> I'm struggling with endogeneity in a zero inflated Poisson model, and it's hard to find decent solutions for it.
> My model is: Y = B0 + B1W + B2X + e
> Where Y is the number of children per woman, W is the employment status of a woman, which is likely to be endogenous, and X are exogenous factors.
> The dependent variable, Y, is a count variable with an excess of zero observations (more than 50% of my 997 observations have zero children).
> Therefore, I chose to use a zero inflated Poisson model (zip in Stata).
> Handling endogeneity in a zip model doesn't seem to be straightforward, because of the nonlinear nature of the model and the two different processes it models simultaneously.
> According to Terza et al. (2008), the usual approach of two-stage predictor substitution (2SPS) does not yield consistent estimates in nonlinear models.
> Instead, they suggest two-stage residual inclusion (2SRI), also known as the control function approach (Wooldridge, 2010, pp. 126-129).
> The 2SRI estimation is specified as follows:
> first stage:   W = a0 + a1X + a2Z + u  (Z is an IV for W)
> second stage: Y = B0 + B1W + B2X + B3u_hat + e
> Does anybody know whether the 2SRI approach can be applied in a zip model?
> And if so, how exactly do you implement this in Stata?
> I know that I could use ivpoisson cfunction in Stata, but then I don't take the excess zero observations into account, which results in biased estimates.
> Thanks for your help!
> Best regards,
> Goedele Van den Broeck
> References:
> Terza, Basu and Rathouz, 2008. Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modelling. Journal of Health Economics, 27: 531-543.
> Wooldridge, 2010, Econometric Analysis of Cross Section and Panel Data (Second Edition).
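
The two 2SRI stages written out in the question could be sketched by hand in Stata roughly as follows. This is only an illustration under assumptions: the variable names (children, employed, age, educ, z) and the program name tsri are hypothetical, the first stage is a linear regression as in the question's first-stage equation, and the second stage is a plain Poisson rather than a zip. Because the residual is itself estimated, the second-stage standard errors are not valid as reported, so the sketch bootstraps both stages together.

```stata
* Hypothetical names (assumptions): children = Y, employed = W,
* age educ = X, z = instrument Z, tsri = illustrative program name

capture program drop tsri
program define tsri, rclass
    * First stage: W on X and Z, keep the residual u_hat
    regress employed age educ z
    tempvar uhat
    predict double `uhat', residuals
    * Second stage: Poisson for Y including the first-stage residual
    poisson children employed age educ `uhat'
    return scalar b_w = _b[employed]
end

* Bootstrap both stages together so the standard error of the
* employment coefficient reflects the estimated residual
bootstrap b_w = r(b_w), reps(500) seed(12345): tsri
```

The coefficient on the residual in the second stage is often read as an informal check on endogeneity; -ivpoisson cfunction- performs a comparable control-function estimation in a single command.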