
# st: Poisson and Negbin models

 From Simon Falck To "statalist@hsphsun2.harvard.edu" Subject st: Poisson and Negbin models Date Sun, 28 Oct 2012 21:22:28 +0000

```
Hello,

I have a few questions about Poisson and negative binomial (negbin) models. Using a cross-section, I am estimating the number of new firms (Y) across 72 countries (N) as a function of a range of country attributes (X1, X2, ..., Xn). No time or dummy variables are included. All regressors take continuous values.

Given that Y takes non-negative integer values and has a mean below 10, a count data approach is appropriate, which is why I chose to apply standard Poisson and negbin models. The first indication is that Y is overdispersed, as the mean and the variance are not equal, nor close to being equal (mean 4.347222 < var 542.6806). A formal goodness-of-fit test of Y alone, using -estat gof- after -poisson $y-, indicates that Y is significantly different from a Poisson distribution (chi2 = 1726.882, Prob > chi2(71) = 0.0000). Similarly, the LR test of alpha=0 in the output from -nbreg $y- indicates that the negbin model is preferred over the Poisson (chibar2(01) = 1570.16, Prob>=chibar2 = 0.000).

When I run the model Y = X1 X2 ... Xn using the -nbreg- command, I run into problems. The output indicates a problem with alpha, and the LR test indicates that the Poisson is preferred over the negbin model:

. nbreg $y $xlist, nolog irr

Negative binomial regression                      Number of obs   =         72
                                                  LR chi2(7)      =     112.28
Dispersion     = mean                             Prob > chi2     =     0.0000
Log likelihood = -59.651782                       Pseudo R2       =     0.4848

------------------------------------------------------------------------------
          DV |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   1.036879   .0295334     1.27   0.204     .9805806    1.096409
          X2 |    .994819   .0136695    -0.38   0.705     .9683849    1.021975
          X3 |   .5148558   .1462298    -2.34   0.019     .2950711    .8983481
          X4 |   .9809783   .0158745    -1.19   0.235     .9503532     1.01259
          X5 |   1.325681   .0718518     5.20   0.000     1.192076     1.47426
          X6 |    .138362   .0542472    -5.04   0.000     .0641636    .2983629
          X7 |   1.059356   .0270835     2.26   0.024     1.007581    1.113791
-------------+----------------------------------------------------------------
    /lnalpha |  -18.90698   558.4214                     -1113.393    1075.579
-------------+----------------------------------------------------------------
       alpha |   6.15e-09   3.43e-06                             0           .
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 1.2e-05 Prob>=chibar2 = 0.499

The alpha estimate seems to signal a problem, which is confirmed when I try to compute the predicted rates and probabilities for count models using -prcounts-, which gives the following error:
. prcounts nb, plot
problem with alpha prevents estimation of predicted probabilities.
r(198);
end of do-file
r(198);

It could be mentioned that Long and Freese's -countfit- indicates that a negbin model is preferred over the Poisson and zero-inflated models. Furthermore, if I run a Poisson model using -poisson- and compare the output, I get very similar results (coefficients, log likelihood, AIC, BIC), yet the pseudo R-squared is quite different (PO = 0.934, NB = 0.485):

. poisson $y $xlist, irr nolog

Poisson regression                                Number of obs   =         72
                                                  LR chi2(7)      =    1682.44
                                                  Prob > chi2     =     0.0000
Log likelihood = -59.651788                       Pseudo R2       =     0.9338

------------------------------------------------------------------------------
          DV |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   1.036879   .0295334     1.27   0.204     .9805804    1.096409
          X2 |   .9948191   .0136695    -0.38   0.705      .968385    1.021975
          X3 |   .5148558   .1462298    -2.34   0.019      .295071    .8983482
          X4 |   .9809785   .0158745    -1.19   0.235     .9503533    1.012591
          X5 |   1.325681   .0718518     5.20   0.000     1.192076     1.47426
          X6 |   .1383621   .0542473    -5.04   0.000     .0641636    .2983633
          X7 |   1.059355   .0270835     2.26   0.024     1.007581    1.113791
------------------------------------------------------------------------------

When I compare the observed and predicted values of Y, using -prcounts-, the Poisson model seems to do quite a good job, e.g.:

. list $y nbrate in 1/10

       DV   nbrate
------------------
  1.  192      194
  2.   41       34
  3.   35       36
  4.    6        5
  5.    4        3
------------------
  6.    3        3
  7.    3        2
  8.    3        0
  9.    2        2
 10.    2        2

I would appreciate it if someone could explain what the problem(s) might be here, and give some indication of what is wrong with alpha in the negbin model. One could argue that since the equidispersion assumption of the Poisson model appears not to hold, the Poisson results are quite “flattering”, perhaps too flattering (?). I am aware that N is relatively small for maximum likelihood estimation, but I am not sure if, and if so how, this affects the results in this particular situation. It could be mentioned that there is some collinearity among the regressors, but this should not cause too many problems.