

From: Simon Falck <simon.falck@abe.kth.se>

To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>

Subject: st: Poisson and Negbin models

Date: Sun, 28 Oct 2012 21:22:28 +0000

Hello,

I have a few questions related to Poisson and negative binomial (negbin) models. Using a cross-section, I am estimating the number of new firms (Y) across 72 countries (N) as a function of a range of country attributes (X1, X2…Xn). No time or dummy variables are included, and all regressors take continuous values. Given that Y takes non-negative integer values and has a mean < 10, a count data approach is appropriate, which is why I chose to apply standard Poisson and negbin models.

The first indication is that Y is overdispersed, as the mean and the variance are not equal, nor close to being equal (mean 4.347222 < var 542.6806). A formal goodness-of-fit test of Y alone, using -estat gof- after -poisson $y-, indicates that Y is significantly different from a Poisson distribution (chi2 = 1726.882, Prob > chi2(71) = 0.0000). Similarly, the LR test of alpha=0 in the output from -nbreg $y- indicates that the negbin model is preferred over the Poisson (chibar2(01) = 1570.16, Prob>=chibar2 = 0.000).

When I run the model Y = X1 X2…Xn using the -nbreg- command, I run into some problems. The output indicates a problem with alpha, and the LR test now indicates that the Poisson is preferred over the negbin model:

    . nbreg $y $xlist, nolog irr

    Negative binomial regression                      Number of obs   =         72
                                                      LR chi2(7)      =     112.28
    Dispersion     = mean                             Prob > chi2     =     0.0000
    Log likelihood = -59.651782                       Pseudo R2       =     0.4848

    ------------------------------------------------------------------------------
              DV |        IRR  Std. Err.        z   P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              X1 |   1.036879   .0295334     1.27   0.204     .9805806    1.096409
              X2 |    .994819   .0136695    -0.38   0.705     .9683849    1.021975
              X3 |   .5148558   .1462298    -2.34   0.019     .2950711    .8983481
              X4 |   .9809783   .0158745    -1.19   0.235     .9503532     1.01259
              X5 |   1.325681   .0718518     5.20   0.000     1.192076     1.47426
              X6 |    .138362   .0542472    -5.04   0.000     .0641636    .2983629
              X7 |   1.059356   .0270835     2.26   0.024     1.007581    1.113791
    -------------+----------------------------------------------------------------
        /lnalpha |  -18.90698   558.4214                     -1113.393    1075.579
    -------------+----------------------------------------------------------------
           alpha |   6.15e-09   3.43e-06                             0           .
    ------------------------------------------------------------------------------
    Likelihood-ratio test of alpha=0:  chibar2(01) = 1.2e-05 Prob>=chibar2 = 0.499

The alpha estimate seems to signal a problem, which is confirmed when I try to compute the predicted rate and probabilities using -prcounts-, which fails with the following error:

    . prcounts nb, plot
    problem with alpha prevents estimation of predicted probabilities.
    r(198);

    end of do-file
    r(198);

It could be mentioned that Long & Freese's -countfit- indicates that a negbin model is preferred over the Poisson and zero-inflated models.

Furthermore, if I run a Poisson model using -poisson- and compare the outcome, I end up with very similar results (coefficients, log likelihood, AIC, BIC), yet the pseudo R-squared is quite different (PO = 0.934, NB = 0.485):

    . poisson $y $xlist, irr nolog

    Poisson regression                                Number of obs   =         72
                                                      LR chi2(7)      =    1682.44
                                                      Prob > chi2     =     0.0000
    Log likelihood = -59.651788                       Pseudo R2       =     0.9338

    ------------------------------------------------------------------------------
              DV |        IRR  Std. Err.        z   P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
              X1 |   1.036879   .0295334     1.27   0.204     .9805804    1.096409
              X2 |   .9948191   .0136695    -0.38   0.705      .968385    1.021975
              X3 |   .5148558   .1462298    -2.34   0.019      .295071    .8983482
              X4 |   .9809785   .0158745    -1.19   0.235     .9503533    1.012591
              X5 |   1.325681   .0718518     5.20   0.000     1.192076     1.47426
              X6 |   .1383621   .0542473    -5.04   0.000     .0641636    .2983633
              X7 |   1.059355   .0270835     2.26   0.024     1.007581    1.113791
    ------------------------------------------------------------------------------

When I compare the observed and predicted values of Y, using -prcounts-, the Poisson model seems to do quite a good job, e.g.

    . list $y nbrate in 1/10

            DV   nbrate
         ----------------
      1.   192      194
      2.    41       34
      3.    35       36
      4.     6        5
      5.     4        3
         ----------------
      6.     3        3
      7.     3        2
      8.     3        0
      9.     2        2
     10.     2        2

I would appreciate it if someone could explain what the problem(s) seem to be here, and give some indication of what is wrong with alpha in the negbin model. One could argue that, since the equidispersion assumption of the Poisson model appears not to hold, the Poisson results are quite "flattering", perhaps too flattering (?). I am aware that N is relatively small for a maximum likelihood model, but I am not sure whether, and if so how, this affects the model outcome in this particular situation. It could be mentioned that there is some collinearity between the regressors, but this should not cause too many problems.

Thanks in advance,
/Simon

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
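
A possible reading of the pseudo R-squared gap, offered as a sketch and using only figures reported in the two outputs above. It assumes that the reported Pseudo R2 is McFadden's measure, $1 - \ln L / \ln L_0$, and that each command takes $\ln L_0$ from its own constant-only model, so that $\ln L_0 = \ln L - \tfrac{1}{2}\,\mathrm{LR}$ with LR the reported "LR chi2(7)". The symbols $\ln L$, $\ln L_0$ and $R^2$ are notation introduced here, not part of the original post.

$$
\begin{aligned}
\text{Poisson:}\quad & \ln L_0 = -59.651788 - \tfrac{1}{2}(1682.44) \approx -900.87,
  & R^2 &= 1 - \frac{59.651788}{900.87} \approx 0.9338 \\
\text{Negbin:}\quad & \ln L_0 = -59.651782 - \tfrac{1}{2}(112.28) \approx -115.79,
  & R^2 &= 1 - \frac{59.651782}{115.79} \approx 0.4848 \\
\text{Baselines:}\quad & 2\,\bigl(-115.79 - (-900.87)\bigr) \approx 1570.16
\end{aligned}
$$

Under these assumptions both reported pseudo R-squared values are reproduced, and the difference between the two baselines matches the chibar2(01) = 1570.16 reported for -nbreg $y-. The fitted log likelihoods are essentially identical, so the gap would come entirely from the different constant-only baselines rather than from a difference in fit, which suggests the two pseudo R-squared values are not directly comparable.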

**Follow-Ups**:

- **Re: st: Poisson and Negbin models**
  - *From:* Maarten Buis <maartenlbuis@gmail.com>
