st: Poisson and Negbin models

From	Simon Falck <[email protected]>
To	"[email protected]" <[email protected]>
Subject	st: Poisson and Negbin models
Date	Sun, 28 Oct 2012 21:22:28 +0000

Hello,
 
I have a few questions related to Poisson and Negbin models. Using a cross-section, I am estimating the number of new firms (Y) across 72 countries (N) as a function of a range of country attributes (X1, X2…Xn). No time or dummy variables are included. All regressors take continuous values.
 
Given that Y takes non-negative integer values and has a mean < 10, a count data approach is appropriate, which is why I chose to apply standard Poisson and Negbin models. The first indication is that Y is overdispersed, as the mean and the variance are not equal, nor close to being equal (mean 4.347222 < var 542.6806). A formal goodness-of-fit test of Y alone, using -estat gof- after -poisson $y-, indicates that Y is significantly different from a Poisson distribution (chi2 = 1726.882, Prob > chi2(71) = 0.0000). Similarly, the LR test of alpha in the output from -nbreg $y- indicates that the negbin model is preferred over the Poisson (LR test of alpha=0: chibar2(01) = 1570.16, Prob>=chibar2 = 0.000).
 
When I run the model Y = X1 X2…Xn using the -nbreg- command, I run into problems. The output indicates a problem with alpha, and the LR test indicates that the Poisson is preferred over the negbin model:
 
. nbreg $y $xlist, nolog irr
 
Negative binomial regression                      Number of obs   =         72
                                                  LR chi2(7)      =     112.28
Dispersion     = mean                             Prob > chi2     =     0.0000
Log likelihood = -59.651782                       Pseudo R2       =     0.4848
 
------------------------------------------------------------------------------
          DV |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   1.036879   .0295334     1.27   0.204     .9805806    1.096409
          X2 |    .994819   .0136695    -0.38   0.705     .9683849    1.021975
          X3 |   .5148558   .1462298    -2.34   0.019     .2950711    .8983481
          X4 |   .9809783   .0158745    -1.19   0.235     .9503532     1.01259
          X5 |   1.325681   .0718518     5.20   0.000     1.192076     1.47426
          X6 |    .138362   .0542472    -5.04   0.000     .0641636    .2983629
          X7 |   1.059356   .0270835     2.26   0.024     1.007581    1.113791
-------------+----------------------------------------------------------------
    /lnalpha |  -18.90698   558.4214                     -1113.393    1075.579
-------------+----------------------------------------------------------------
       alpha |   6.15e-09   3.43e-06                             0           .
------------------------------------------------------------------------------
Likelihood-ratio test of alpha=0:  chibar2(01) = 1.2e-05 Prob>=chibar2 = 0.499
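
For reference, the alpha test above can be checked by hand. This is a sketch under two assumptions: that "Dispersion = mean" denotes the usual NB2 variance function, and that the test statistic is twice the difference between the negbin log likelihood and the log likelihood of a Poisson model with the same regressors (reported by -poisson- as -59.651788). The symbol $\mu$ is notation for the conditional mean introduced here, not a name from the output.

```latex
\begin{align*}
\operatorname{Var}(Y \mid x) &= \mu + \alpha\,\mu^{2},
  \qquad \alpha \to 0 \;\Rightarrow\; \operatorname{Var}(Y \mid x) \to \mu
  \quad \text{(Poisson)} \\
\mathrm{LR} &= 2\,\bigl(\ln L_{\mathrm{NB}} - \ln L_{\mathrm{Poisson}}\bigr)
  = 2\,\bigl(-59.651782 - (-59.651788)\bigr)
  = 2 \times 0.000006
  = 1.2 \times 10^{-5}
\end{align*}
```

With alpha estimated at 6.15e-09, the two log likelihoods are practically identical, which matches the reported chibar2(01) = 1.2e-05.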
 
The alpha seems to imply some problem, which is confirmed when I try to compute the predicted rate and probabilities for count models using -prcounts-, which ends with the following error:
. prcounts nb, plot
problem with alpha prevents estimation of predicted probabilities.
r(198);
end of do-file
r(198);
 
It could be mentioned that Long and Freese's -countfit- indicates that a negbin model is preferred over the Poisson and zero-inflated models. Furthermore, if I run a Poisson model using -poisson- and compare the outcome, I end up with very similar results (coef, LogL, AIC, BIC), yet the Pseudo R-squared is quite different (PO=0.934, NB=0.485):
 
. poisson $y $xlist, irr nolog
 
Poisson regression                                Number of obs   =         72
                                                  LR chi2(7)      =    1682.44
                                                  Prob > chi2     =     0.0000
Log likelihood = -59.651788                       Pseudo R2       =     0.9338

------------------------------------------------------------------------------
          DV |        IRR   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X1 |   1.036879   .0295334     1.27   0.204     .9805804    1.096409
          X2 |   .9948191   .0136695    -0.38   0.705      .968385    1.021975
          X3 |   .5148558   .1462298    -2.34   0.019      .295071    .8983482
          X4 |   .9809785   .0158745    -1.19   0.235     .9503533    1.012591
          X5 |   1.325681   .0718518     5.20   0.000     1.192076     1.47426
          X6 |   .1383621   .0542473    -5.04   0.000     .0641636    .2983633
          X7 |   1.059355   .0270835     2.26   0.024     1.007581    1.113791
------------------------------------------------------------------------------
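
The gap between the two Pseudo R2 values can be reproduced from the reported numbers. This is a sketch, assuming the Pseudo R2 is McFadden's $1 - \ln L / \ln L_0$ and that the LR chi2(7) in each header equals $2(\ln L - \ln L_0)$. Here $\ln L_0$ is notation introduced for this note: the log likelihood of the constant-only version of the same model.

```latex
\begin{align*}
\ln L_0^{\mathrm{PO}} &= -59.651788 - \tfrac{1682.44}{2} = -900.871788,
  & R^2_{\mathrm{PO}} &= 1 - \frac{-59.651788}{-900.871788} \approx 0.9338 \\
\ln L_0^{\mathrm{NB}} &= -59.651782 - \tfrac{112.28}{2} = -115.791782,
  & R^2_{\mathrm{NB}} &= 1 - \frac{-59.651782}{-115.791782} \approx 0.4848
\end{align*}
```

Under these assumptions the fitted log likelihoods are the same and only the baselines differ. The constant-only negbin model already fits far better than the constant-only Poisson model, so the two Pseudo R2 values are measured against different reference points.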
                                               
When I compare the observed and predicted values of Y using -prcounts-, the Poisson model seems to do quite a good job, e.g.:
 
. list $y nbrate in 1/10

        DV   nbrate
     ----------------
  1.   192      194
  2.    41       34
  3.    35       36
  4.     6        5
  5.     4        3
     ----------------
  6.     3        3
  7.     3        2
  8.     3        0
  9.     2        2
 10.     2        2
 
I would appreciate it if someone could explain what the problem(s) seem to be here, and give some indication of what is wrong with alpha in the negbin model. One could argue that, since the equidispersion assumption of the Poisson model appears not to hold, the Poisson model outcome is quite “flattering”, perhaps too flattering (?). I am aware that N is relatively small for a maximum likelihood model, but I am not sure if, and if so how, this affects the model outcome in this particular situation. It could also be mentioned that there is some collinearity between the regressors, but this should not cause too many problems.

Thanks in advance,
/Simon
 
