Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Subject: st: simulating random numbers from zero inflated negative binomial estimates

Subject   Subject: st: simulating random numbers from zero inflated negative binomial estimates
Date   Tue, 15 Nov 2011 13:29:30 -0500 (EST)

Back in June I posted code on the List for creating synthetic zero-inflated Poisson and negative binomal models. They were based on the same logic as the synthetic models i had developed for other models such as synthetic hurdle models, synthetic finite mixture models, ordered and unordered logit and probit categorical response models (proportional odds; multinomial) and other discrete response regression models that are in my "Logistic Regression Models" [LRM] and "Negative Binomial Regression" (2nd edition) [NBR2] books. Many of these were discussed in my Stata Journal article on the subject in early 2010 (SJ, Vol 10, Nu 1).

James Hardin and I have added a new chapter on creating synthetic data including correlated data in the third edition of Stata Press book, "Generalized Linear Models and Extensions" [GLME3], which is due to be out next month. In addition, for many -- perhaps the majority -- of the models discussed in GLME3 we walk through the setting up of synthetic data and synthetic models for the model being discussed For example, in the section on zero-inflated Poisson and negative binomial models we discuss the logic creating synthetic zero-inflated Poisson data. Other considerably more complicated syntheic models are developed and explained for models as well such as generalized negative binomial (NB-P) which has a third parameter, and a three component finite mixture model. We also developed Stata code for estimating models such as Poisson Inverse Gaussian (PIG), zero-inflated PIG, generalized Poisson, zero-inflated generalized Poisson, NB-P, and a number of other models that can prove to be useful in research. These models are discussed in my NBR2 book as well, but Stata code for models like PIG, ZIPIG, ZIGP, and NB-P was not yet developed. In my NBR2 and LRM books, R, Limdep, and SAS are used for models which had no Stata support -- official or unofficial. . for the examples in the book.

My point in telling you this is to let you know that there is a lot of material already available on developing synthetic models like the ZINB you were interested in. Much is already published- as as ZINB. Aside from my stuff and what Hardin and I have developed .. Cameron and Trivedi's Stata Press book on "Microeconometrics Using Stata" has quite a bit of excellent code for developing synthetic data. The code below for creating user specified ZINB data and models has the following default values, You can easily amend the number of predictors you want for each component as well as the values for the coefficients. The default value of alpha, the negative binomial heterogeneity parameter is .5. The default values are given in the comment lines at the beginning of the code. I ran the do file once with the defaults and no seed value, so another run witl produce slightly different results. You can see that the coefficient values of the resultant synthetic model are quite close to what we specified.

I hope this helps a bit.   Joe Hilbe

* Synthetic zero-inflated negative binomial with logit as binary component
* Joseph Hilbe  15Aug2005; 10Oct2009; 5Jun2011
* LOGIT: x1= .9, x2= .1, _c= .2
* NB2  : x1=.75, n2=-1.25, _c=2, alpha=.5
set obs 50000
* set seed 1000
gen x1 = invnorm(runiform())
gen x2 = invnorm(runiform())
gen xb = 2 + 0.75*x1 - 1.25*x2
gen a = .5
gen ia = 1/a
gen exb = exp(xb)
gen xg = rgamma(ia, a)
gen xbg = exb * xg
gen nby = rpoisson(xbg)
gen pi =1/(1+exp(-(.9*x1 + .1*x2+.2)))
gen bernoulli = runiform()>pi
gen zy = bernoulli*nby
rename zy y
zinb y x1 x2, inf(x1 x2) nolog

Zero-inflated negative binomial regression Number of obs = 50000 Nonzero obs = 19433 Zero obs = 30567

Inflation model = logit LR chi2(2) = 25898.50 Log likelihood = -89533.47 Prob > chi2 = 0.0000

y | Coef. Std. Err. z P>|z| [95% Conf. Interval]
y            |
x1 | .7506944 .0066242 113.33 0.000 .7377112 .7636775 x2 | -1.250086 .0068687 -182.00 0.000 -1.263549 -1.236624 _cons | 1.99581 .006915 288.62 0.000 1.982257 2.009364
inflate      |
x1 | .9047498 .0141011 64.16 0.000 .8771121 .9323875 x2 | .1003384 .0125745 7.98 0.000 .0756928 .1249839 _cons | .199909 .0123115 16.24 0.000 .175779 .2240391
/lnalpha | -.6870374 .0153996 -44.61 0.000 -.7172202 -.6568547
alpha | .5030642 .007747 .4881072 .5184796

Date: Mon, 14 Nov 2011 16:04:17 +0000
From: Daniel Hill-McManus <>
Subject: st: simulating random numbers from zero inflated negative binomial estimates

I came across this code recently that Paul wrote in response to a query
(see below).

I've also found it useful, thank you. But working through it I cannot
understand why the linear predictor p2 is not exponentiated before being
used in rgamma(1/alph, alph*p2).

I'd be grateful if someone could point out to me why this is.


On 3/06/2011 2:56 a.m., E. Paul Wileyto wrote:

    I'm not sure whether anyone has answered this yet.

    First, read the help on zinb post-estimation commands. There are
many flavors of "predict" listed there. You will need three of them
before you start generating random numbers. The first one you will need is:

    predict p1, pr

    That will generate a new variable, p1, which will be the predicted
probability of an inflated zero. All the work is done for you.

    The second predicted quantity you will need is:

    predict lp, xb

    That will generate the linear combination of predictor variables
weighted by coefficients for the negative binomial part of the model.
Finally, you will need:

    predict alpha, xb eq(#3)

    which will generate a variable containing the overdispersion
parameter for the negative binomial. With those three bits, you can get on to simulating.

    Here's my script:

    zinb cignums drug  week, inf(drug  week)
    predict p1 , pr
    predict p2 , xb
    predict lnalpha , xb eq(#3)
    gen alph=exp(lnalpha)
    gen xg=rgamma(1/alph, alph*p2)
    gen pg=rpoisson(xg)
    gen zi=runiform()>p1
    gen newcigs=zi*pg

    zinb newcigs drug  week, inf(drug  week)


*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index