Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: constraints on number of observations in nbreg not working

From	"Cale, Grace E" <[email protected]>
To	"[email protected]" <[email protected]>
Subject	st: constraints on number of observations in nbreg not working
Date	Sun, 2 Mar 2014 21:03:00 +0000

Good evening!
I'm having an odd difficulty with some model testing and have finally run out of ideas. I'd be deeply grateful to anyone willing to help! If the details below are insufficient or unclear, I would be glad to discuss the issue further by personal email (so as not to spam other list members).

I have a set of four nested models. All are regression models using the -nbreg- command (though I have also tried -glm- with the -family(nbinomial)- option).
I am attempting to compare these models with likelihood ratio tests (the -lrtest- command) and with AIC and BIC scores from -fitstat-, both of which require the models to be fit to the same observations.
However, nothing I do to restrict the observations in the models and store the results seems to work. Stata reports no errors until I run -lrtest- or -fitstat-, which then report that the numbers of observations differ. This happens whether I use -est store-, use -fitstat, saving()-, or restrict the sample with -nbreg varlist if e(sample)-. I've tried several combinations of these, but here is an example of what I am seeing:
----------------------------------------------------
Model 2 
-quietly nbreg polact rlgatnd scva2 lrscale NEIVMMB unemp3m selfeff selfeff2 mistrst age2 urban partyid INCOMEC EDUC2 AGE GENDER2 race1 race2 race3 race5 if high_cooksdm2 != 1, vce(robust)-
-fitstat, saving(m2)-

Measures of Fit for nbreg of polact
Log-Lik Intercept Only:    -1635.619     Log-Lik Full Model:        -1478.949
D(774):                     2957.897     LR(19):                      313.341
 Prob > LR:                     0.000
McFadden's R2:                 0.096     McFadden's Adj R2:             0.083
Maximum Likelihood R2:         0.326     Cragg & Uhler's R2:            0.326
AIC:                           3.773     AIC*n:                      2999.897
BIC:                       -2211.140     BIC':                       -186.453

(Indices saved in matrix fs_m2)


Model 1
-quietly nbreg polact rlgatnd scva2 lrscale NEIVMMB unemp3m urban partyid INCOMEC EDUC2 AGE GENDER2 race1 race2 race3 race5 if high_cooksdm1 != 1 & e(sample)-
-fitstat, using(m2)-

Measures of Fit for nbreg of polact

                             Current            Saved       Difference
Model:                         nbreg            nbreg
N:                               792              795               -3

Error: N's do not match. To make the comparisons, use the force option.
--------------------------------------
Notes: In both models, I used Cook's D to identify influential cases and omitted them with an -if- condition (-if high_cooksdm2 != 1- in Model 2, -if high_cooksdm1 != 1- in Model 1). These conditions do reduce the number of observations in each model, but they do not bring the difference down to zero.
I have also tried removing the Cook's D conditions, but saw no improvement.
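Saving each model's e(sample) flag as a variable makes it possible to see exactly which observations account for the three-observation gap. A minimal sketch, assuming the variables from the commands above are in memory; the macro names rhs1/rhs2 and the indicator names s1/s2 are hypothetical:

```stata
* Covariate lists taken from the two commands above (Model 1 is nested in Model 2)
local rhs1 rlgatnd scva2 lrscale NEIVMMB unemp3m urban partyid INCOMEC EDUC2 AGE GENDER2 race1 race2 race3 race5
local rhs2 `rhs1' selfeff selfeff2 mistrst age2

* Model 2: copy its estimation-sample flag into a variable right away,
* because e(sample) is replaced by the next estimation command
quietly nbreg polact `rhs2' if high_cooksdm2 != 1, vce(robust)
generate byte s2 = e(sample)

* Model 1: Model 2's sample plus Model 1's own Cook's D exclusion
quietly nbreg polact `rhs1' if high_cooksdm1 != 1 & s2
generate byte s1 = e(sample)

* Cross-tabulate the two flags and list the observations that differ
tabulate s1 s2
list polact high_cooksdm1 high_cooksdm2 if s1 != s2
```

Given the two -if- conditions, any observation in Model 2's sample but not in Model 1's should be one with high_cooksdm1 == 1, that is, a case flagged as influential by Model 1's Cook's D but not by Model 2's.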
If I use the -fitstat, using(m2) force- options, I get contradictory BIC and AIC results:
-------------------------------------
-fitstat, using(m2) force-

Measures of Fit for nbreg of polact
                             Current            Saved       Difference
Model:                         nbreg            nbreg
N:                               792              795               -3
Log-Lik Intercept Only:    -1623.391        -1635.619           12.228
Log-Lik Full Model:        -1480.660        -1478.949           -1.711
D:                          2961.320(775)    2957.897(774)       3.423(1)
LR:                          285.462(15)      313.341(19)      -27.879(-4)
Prob > LR:                     0.000            0.000            0.000
McFadden's R2:                 0.088            0.096           -0.008
McFadden's Adj R2:             0.077            0.083           -0.005
Maximum Likelihood R2:         0.303            0.326           -0.023
Cragg & Uhler's R2:            0.303            0.326           -0.023
AIC:                           3.782            3.773            0.009
AIC*n:                      2995.320         2999.897           -4.577
BIC:                       -2211.465        -2211.140           -0.326
BIC':                       -185.344         -186.453            1.109

Primary question: Does anyone know why e(sample) might fail to restrict nested regressions to the same sample? I have also tried generating an indicator variable that, used in an -if- qualifier, should have the same effect, but I got the same result.
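For reference, a minimal sketch of one common way to put nested models on an identical sample: build a single marker with -mark- and -markout- that applies both Cook's D exclusions and drops observations missing on any variable in the largest model, then fit every model -if touse-. It assumes the variables from the post are in memory; the names touse, rhs1/rhs2, and m1/m2 are hypothetical, and -fitstat- (the user-written SPost command) is called with the saving()/using() options exactly as in the post. vce(robust) is left out because a likelihood-ratio test assumes the ordinary ML likelihood; as far as I know, -lrtest- will not compare robust fits without its force option.

```stata
* Covariate lists: Model 1 is nested in Model 2
local rhs1 rlgatnd scva2 lrscale NEIVMMB unemp3m urban partyid INCOMEC EDUC2 AGE GENDER2 race1 race2 race3 race5
local rhs2 `rhs1' selfeff selfeff2 mistrst age2

* One sample marker for both models: apply both Cook's D exclusions,
* then drop any observation missing on the outcome or a Model 2 covariate
mark touse if high_cooksdm1 != 1 & high_cooksdm2 != 1
markout touse polact `rhs2'

* Larger model first
quietly nbreg polact `rhs2' if touse
estimates store m2
fitstat, saving(m2)

* Nested model on exactly the same observations
quietly nbreg polact `rhs1' if touse
estimates store m1
fitstat, using(m2)

* Likelihood-ratio test of the nested pair
lrtest m2 m1
```

Excluding cases flagged by either Cook's D variable is only one design choice; another is to compute influence from a single model and apply that one exclusion to all four models. Either way, the point is that every model in the nested set is fit with the same -if touse- condition.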


Many thanks to anyone willing to assist with this quandary!
Grace Cale
Teaching Assistant
MA Program: Sociology
University of Kentucky

Email: [email protected]


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
