Statalist



Re: st: how can I test my NBREG model?


From   Steven Samuels <sjhsamuels@earthlink.net>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: how can I test my NBREG model?
Date   Tue, 17 Jun 2008 20:40:02 -0400

Cynthia:

I advise you to look at examples of negative binomial regression in a good text. But, to briefly answer your questions:


1. Negative binomial regression (-nbreg-) and its extensions -zinb- (zero-inflated) and -ztnb- (positive counts only) fit a model to the log of the mean, not to the mean. So, the signs and relative magnitudes of coefficients should be comparable. Wherever they differ, I would believe the count data model. Unlike multiple regression, -nbreg- accommodates differences in the potential size of observations through an "exposure" or "offset" variable; a more populous census tract would have more physicians than a smaller tract, for example, so one would standardize for population size.
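As a rough sketch of the exposure/offset idea, the lines below use -rod93-, one of Stata's example datasets fetched with -webuse- (an assumption for illustration; it is not Cynthia's data). The variables deaths, cohort, and exposure come with that dataset; logexp is a name made up here. The factor-variable notation i.cohort assumes Stata 11 or later.

```stata
* Illustration only: rod93 is a Stata example dataset, not the poster's data
webuse rod93, clear

* exposure() enters ln(exposure) with its coefficient constrained to 1
nbreg deaths i.cohort, exposure(exposure)

* Equivalent fit: supply the logged exposure yourself as an offset
generate double logexp = ln(exposure)   // logexp is a hypothetical name
nbreg deaths i.cohort, offset(logexp)
```

The two -nbreg- calls should give the same coefficients, since exposure() is shorthand for an offset equal to the log of the exposure variable.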


2. In the learning sample, you can use all of Stata's facilities for choosing a subset of "best" models. Compare the fits of ordinary -nbreg- to -zinb-. Check goodness of fit by using -linktest- (especially useful with continuous variables) and by comparing the observed to predicted counts. Use robust standard errors for inference. Choose transformations of continuous predictors with -fracpoly- or -mfp-. Select from "all possible combinations" of sets of predictors with a command like Nick Cox's -cp- (available from SSC). Compare alternative models by BIC using the -estat- command.
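A minimal sketch of part of that workflow, assuming Stata's example dataset -fish- (fetched with -webuse-) stands in for the learning sample. The model specifications and the stored names nb and zinb are chosen only for illustration, and -estimates stats- is used here as one convenient way to list BIC for several stored fits side by side.

```stata
* Illustration only: fish is a Stata example dataset, not the poster's data
webuse fish, clear

* Ordinary negative binomial fit; store it before running anything else
nbreg count persons livebait
estimates store nb

* Specification check: a significant _hatsq hints at misspecification
linktest

* Zero-inflated alternative; inflate() models the excess zeros
zinb count persons livebait, inflate(child camper)
estimates store zinb

* AIC and BIC for both stored fits (smaller BIC is preferred)
estimates stats nb zinb

* For inference on the chosen model, refit with robust standard errors
nbreg count persons livebait, vce(robust)
```

The fits are stored before -linktest- because -linktest- refits the model and replaces the active estimation results.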

3. To apply your best models to the validation sample, predict for observations not used to create the estimates. Here's an example:

sysuse auto
reg mpg weight if foreign
predict yhat if !foreign // predicts for the other observations

Search also for "esample" to see another way of getting out-of-sample predictions.
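The "esample" route presumably refers to the e(sample) function, which marks the observations used by the most recent estimation command. A short sketch along the lines of the example above (yhat2 is a made-up variable name):

```stata
sysuse auto, clear
regress mpg weight if foreign

* e(sample) is 1 for observations used in the fit and 0 otherwise,
* so !e(sample) selects the out-of-sample observations
predict yhat2 if !e(sample)
```

Unlike repeating the original -if- condition in reverse, !e(sample) also picks up any observations that the fit dropped because of missing values, provided the predictors are available for them.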

4. Compare observed counts for your validation sample to those predicted by the learning sample. As a measure of "closeness" you might use a chi square statistic, divided by sample size. A rank correlation could also work, but others may suggest better approaches.
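One possible reading of these two measures, sketched on Stata's example dataset -fish- with a random 60/40 split. The dataset, the seed, the model, and the variable names learn, nhat, and chi_i are all assumptions for illustration. The chi-square shown is the simple (observed - predicted)^2 / predicted form; other definitions are possible.

```stata
* Illustration only: fish is a Stata example dataset, not the poster's data
webuse fish, clear
set seed 12345
generate byte learn = runiform() < 0.6     // about 60% learning sample

* Fit on the learning sample only
nbreg count persons livebait if learn

* Predicted counts for the validation sample
predict double nhat if !learn, n

* Chi-square statistic divided by the validation sample size
generate double chi_i = (count - nhat)^2 / nhat if !learn
summarize chi_i, meanonly
display "chi-square / n = " r(mean)

* Rank correlation between observed and predicted counts
spearman count nhat if !learn
```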

You don't say much about your data (whether they were weighted, clustered, or in panel form), so I haven't covered all bases. Still, I hope this gives you a start.


-Steve

On Jun 17, 2008, at 2:20 PM, Cynthia Lokker wrote:


Hi,
I have a set of data with my dependent variable being a count and with 19
independent variables. I originally performed a multiple regression on a 60%
subset (n=757) and validated the model on the remaining 40% (n=504).
It has since been brought to my attention to use a negative binomial
regression since this fits my data better. I would now like to repeat the
analysis and compare the general findings of the nbreg with the former
multiple regression (magnitude of coefficients, etc.).
I have the following questions:
1. Is it feasible to compare (generally) the 2 types of analysis?
2. Can I validate my nbreg model in the same way as I did with the multiple
regression?
3. What Stata commands would I need to use to do #2?

As ever, any assistance or guidance would be appreciated.
Thanks
Cynthia Lokker


*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*


