
st: overdispersion and underdispersion in nbreg / glm models

From	[email protected]
To	[email protected]
Subject	st: overdispersion and underdispersion in nbreg / glm models
Date	Thu, 18 Dec 2008 14:49:19 -0500

I take the Digest, and try to scan through the contents when possible. I'm pleased that I happened to catch your query.

Overdispersion in count models can arise for a wide variety of reasons. Identifying the source of overdispersion can help in finding a remedy for it. In some cases the remedy is such that, when applied, the model is no longer overdispersed. I call this apparent overdispersion. In other situations, the remedy does not eliminate the fact that the data is overdispersed, but it adjusts the model (usually the standard errors) so that the effect of the bias caused by the overdispersion is minimized. These types of models are ones that have real overdispersion.

In the book, I create a simulated Poisson model with 3 or 4 defined parameter estimates. That is, for example, I define xb = b0 + b1*x1 + b2*x2 + b3*x3 with specific values for the b's; e.g., xb = 1 + .5*x1 + .75*x2 - 1.2*x3. The x* values are each created separately as random normal deviates; e.g., --gen x1 = invnorm(uniform())-- [[now better written as invnorm(runiform())]]. I then use the values of xb in the command --rndpoisx-- or --genpoisson--. The result is a Poisson random variate, xp, structured by the values of xb. Running --glm xp x1 x2 x3, fam(poi)-- results in a Poisson model whose parameter estimates and intercept are very close, if not identical, to the values specified. The Pearson dispersion statistic is also very close to 1.0.

I then remodel the data, taking out one of the predictors, let's say x1: --glm xp x2 x3, fam(poi)--. The parameter estimates are generally not the specified ones and, more importantly, the dispersion statistic becomes greater than 1. Sometimes it is substantially greater than 1.
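The simulation just described can be sketched outside Stata. The Python below is a stand-in, not the book's code: the hand-written Poisson sampler replaces --rndpoisx-- / --genpoisson-- (Python's standard library has no Poisson generator), the sample size of 2000 and the seed are arbitrary assumptions, and both models are fitted by iteratively reweighted least squares, the classical GLM algorithm. It fits the full model and the model with x1 left out, and reports the Pearson dispersion (Pearson chi2 divided by residual degrees of freedom) for each.

```python
import math
import random


def rpoisson(lam, rng):
    """Poisson draw: count arrivals in unit time of a rate-lam Poisson process."""
    k, t = 0, rng.expovariate(lam)
    while t <= 1.0:
        k += 1
        t += rng.expovariate(lam)
    return k


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = s / m[r][r]
    return x


def fit_poisson_glm(X, y, iters=25):
    """Poisson GLM with log link fitted by IRLS; returns (beta, fitted mu)."""
    p = len(X[0])
    mu = [yi + 0.5 for yi in y]  # usual GLM starting values
    eta = [math.log(m) for m in mu]
    for _ in range(iters):
        # Log link: working response z = eta + (y - mu)/mu, working weight w = mu
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        xtwx = [[sum(m * row[i] * row[j] for row, m in zip(X, mu))
                 for j in range(p)] for i in range(p)]
        xtwz = [sum(m * row[i] * zi for row, m, zi in zip(X, mu, z))
                for i in range(p)]
        beta = solve(xtwx, xtwz)
        eta = [sum(b * v for b, v in zip(beta, row)) for row in X]
        mu = [math.exp(e) for e in eta]
    return beta, mu


def pearson_dispersion(y, mu, n_params):
    """Pearson chi2 divided by residual degrees of freedom (Poisson variance)."""
    chi2 = sum((yi - m) ** 2 / m for yi, m in zip(y, mu))
    return chi2 / (len(y) - n_params)


rng = random.Random(2008)  # arbitrary seed
n = 2000                   # arbitrary sample size
data, y = [], []
for _ in range(n):
    x1, x2, x3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    xb = 1 + 0.5 * x1 + 0.75 * x2 - 1.2 * x3
    data.append((x1, x2, x3))
    y.append(rpoisson(math.exp(xb), rng))

# Full model, analogous to: glm xp x1 x2 x3, fam(poi)
X_full = [[1.0, x1, x2, x3] for x1, x2, x3 in data]
beta_full, mu_full = fit_poisson_glm(X_full, y)
disp_full = pearson_dispersion(y, mu_full, 4)

# Reduced model with x1 omitted, analogous to: glm xp x2 x3, fam(poi)
X_red = [[1.0, x2, x3] for _, x2, x3 in data]
beta_red, mu_red = fit_poisson_glm(X_red, y)
disp_red = pearson_dispersion(y, mu_red, 3)

print("full:   ", [round(b, 3) for b in beta_full], round(disp_full, 2))
print("reduced:", [round(b, 3) for b in beta_red], round(disp_red, 2))
```

With these settings the full model should recover coefficients near 1, .5, .75, -1.2 with a dispersion near 1.0, while the reduced model's dispersion should sit well above 1, which is the apparent overdispersion described above.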

What does this tell us? Well, when we are modeling data, we generally don't know in advance what the parameter estimates are going to be. If we do find, though, that the dispersion statistic differs substantially from 1.0, then we know that the model is not well fitted. We may not know why, though. In this case it was because a necessary predictor was missing from the model. In real situations, we hope that a variable is available in the data to remedy the fit; i.e., when it is put into the model, the dispersion closely approximates 1.0. The requisite predictor, however, may not have been collected. Again, in real situations, the missing predictor is the one required to account for the extra correlation in the data, as reflected by the dispersion statistic. All of this discussion is within the context of a Poisson model.

You appear to have modeled the data as negative binomial (NB-2) rather than Poisson. The way you obtained the value of alpha for inclusion in the GLM NB model was correct. What many folks forget, though, is that the NB model can itself be extradispersed. It may, for example, have more variance in the data than is allowed given the value of the mean. Rather than comparing the variance with mu, as in Poisson, here we compare it with mu+a*mu*mu. The NB model may adjust for more of the otherwise Poisson overdispersion than the data actually contain, and so have a dispersion statistic < 1. Or the data may overshoot and have excessive variance, greater than mu+a*mu*mu, giving a dispersion statistic > 1.
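The comparison with mu versus mu+a*mu*mu can be made concrete. Below is a small sketch of the Pearson dispersion statistic (the quantity Stata's -glm- reports as "(1/df) Pearson") with the variance function as a parameter: alpha = 0 gives the Poisson variance mu, and alpha > 0 gives the NB-2 variance mu + alpha*mu^2. The function name and the toy numbers are illustrative assumptions, not model output.

```python
def pearson_dispersion(y, mu, n_params, alpha=0.0):
    """Pearson chi2 divided by residual degrees of freedom.

    alpha = 0.0 uses the Poisson variance mu; alpha > 0 uses the NB-2
    variance mu + alpha*mu**2 (the mu+a*mu*mu comparison in the text).
    """
    chi2 = sum((yi - m) ** 2 / (m + alpha * m * m) for yi, m in zip(y, mu))
    return chi2 / (len(y) - n_params)


# Toy numbers chosen only to show the arithmetic.
y = [0, 2, 4, 6]
mu = [1.0, 2.0, 3.0, 4.0]
print(pearson_dispersion(y, mu, n_params=1))             # Poisson variance
print(pearson_dispersion(y, mu, n_params=1, alpha=0.5))  # NB-2 variance
```

The same residuals give 7/9 under the Poisson variance but 17/45 under the NB-2 variance with alpha = .5: the larger variance function shrinks the statistic, and a value below 1 under the NB-2 variance is the NB underdispersion discussed here.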

When I discussed the missing predictor and how it affects dispersion, I was focusing on differentiating apparent from real overdispersion. I did not address the NB model; that discussion concerned eliminating overdispersion from within the Poisson model. Here you are doing something quite different. It appears to be a question of why adding a particular predictor can change the model from being underdispersed (Pearson dispersion < 1) to overdispersed (Pearson dispersion > 1). But here we are referring to NB overdispersion, not Poisson overdispersion.

The addition of the new predictor evidently added considerably more correlation to the data. I suspect that if you display the correlations between all of the variables in the model, the new predictor will be rather highly correlated with one or more of the other variables. The interaction of the variables may be such that the extra correlation does not show up in such a display, but that is rare.
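In Stata, -correlate- or -pwcorr- displays these correlations directly. As a language-neutral sketch, the pairwise check amounts to the following; the variable names and values are hypothetical, with "new" standing in for the newly added predictor. Note that a pairwise table only reveals pairwise collinearity, consistent with the caveat about interactions above.

```python
import math
from itertools import combinations


def pearson_r(a, b):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def pairwise_correlations(columns):
    """Map each pair of variable names to their Pearson correlation."""
    return {(u, v): pearson_r(columns[u], columns[v])
            for u, v in combinations(columns, 2)}


# Hypothetical predictors; "new" stands in for the newly added variable.
columns = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 5.0],
    "new": [1.1, 2.1, 2.9, 4.2, 4.8],
}
for (u, v), r in pairwise_correlations(columns).items():
    print(f"{u:>4} {v:>4} {r: .3f}")
```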

In any case, treat the inclusion of the variable as you would any other predictor: test it using the likelihood ratio test. It likely does not contribute to the model. If so, exclude it, and search for other reasons why there may be underdispersion. It may be that the data is simply NB-underdispersed (as distinct from Poisson-overdispersed), and adjustments can be made to the SEs, e.g., robust SEs. I suggest not scaling in this type of case, for reasons discussed in the book.
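For a single added predictor, the likelihood ratio test compares LR = 2*(ll_full - ll_reduced) with a chi-squared distribution on 1 degree of freedom; in Stata this is what -lrtest- reports after both fits are saved with -estimates store-. The sketch below assumes the two models are nested, fitted to the same observations, and differ by exactly one parameter; the log-likelihood values are hypothetical.

```python
import math


def lr_test_1df(ll_reduced, ll_full):
    """Likelihood-ratio test for nested models differing by one parameter.

    Returns (LR statistic, p-value) against a chi-squared(1) reference.
    """
    lr = max(2.0 * (ll_full - ll_reduced), 0.0)
    # For 1 df: P(chi2_1 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Hypothetical log-likelihoods from two nested NB-2 fits (not real output).
lr, p = lr_test_1df(ll_reduced=-1234.5, ll_full=-1232.5)
print(f"LR chi2(1) = {lr:.2f}, p = {p:.4f}")
```

A large p-value here would support dropping the variable, as suggested above; a predictor entering as several parameters (e.g., a multi-level factor) needs a chi-squared reference with more degrees of freedom.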

Perhaps my explanation was overkill, but I thought it important to clarify the relationships involved, and to show why the discussion of the missing predictor is not relevant to the solution of your query.

If you have additional questions, you can contact me directly at [email protected]

Joseph Hilbe

============================================
Date: Wed, 17 Dec 2008 10:36:00 +0000
From: "Ada Ma" <[email protected]>
Subject: st: overdispersion and underdispersion in nbreg / glm models

Dear Statalisters,

I'd been following Joseph Hilbe's book "Negative Binomial Regression"
(2007) and using some of my own data to try out methods laid out in
the book.

The book suggested that one can look at the Pearson's dispersion
output from the -glm- command to check if one's negative binomial
model is affected by underdispersion or overdispersion.

In the book it says that if one's model is affected by overdispersion,
it could be caused by a missing explanatory variable.  But my model
seems to be suggesting quite the opposite and I am not sure what to
do.

When I added an explanatory variable to the model the Pearson's stats
went from being underdispersed to overdispersed.  Both models are
estimated using the -glm- command with the "family(nb XXX)" option
specified, XXX being the alpha value taken from the -nbreg- command
output.  Although the AIC and BIC of the model with the additional
variable look better (lower), I really don't know which is worse.
What should I do in order to resolve the dispersion problem? And,
frankly speaking, are there other things that would tell me which
model is better?  Shall I bootstrap and jackknife???

All suggestions welcomed.

Regards,
Ada

- --
Ada Ma
Research Fellow
Health Economics Research Unit
University of Aberdeen, UK.
http://www.abdn.ac.uk/heru/
Tel: +44 (0) 1224 553863
Fax: +44 (0) 1224 550926
*

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
