Statalist



st: RE: Testing nested models using logistic regression with robust standard errors


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Testing nested models using logistic regression with robust standard errors
Date   Mon, 28 Apr 2008 19:49:10 +0100

I can't answer your deeper question about nested models. 

The simpler question here is about the decision rule to drop variables
during the stepwise procedure. Stata is using precisely the decision
rule you specified in your command, -pr(0.2)-. That is the significance
level for removal, as shown in action in your output. 

If you specify robust standard errors, what does and does not satisfy
this rule may well change, as with different standard errors different
significance levels will be calculated, but again you get what you ask
for. 
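To make that concrete, here is a minimal sketch of how a p-value of that kind can be recovered by hand from a reported odds ratio and its robust standard error. The numbers are copied from the livecat row of the first output in the quoted message. The scalar names are made up for illustration, and the assumption is that the test behind the removal rule is a Wald test on the log-odds coefficient, not something confirmed in this thread.

```stata
* Values copied from the livecat row of the first output quoted below
scalar or_hat = .5896426
scalar se_or  = .1786052

* Delta method: se(OR) = OR * se(b), so se(b) = se(OR) / OR
scalar b_hat  = ln(or_hat)
scalar se_b   = se_or / or_hat

* Wald z statistic and its two-sided normal p-value
scalar z_stat = b_hat / se_b
scalar p_val  = 2 * normal(-abs(z_stat))

display "z = " %6.2f z_stat "   p = " %6.3f p_val
```

This should reproduce the reported z of about -1.74 and P>|z| of about 0.081. A different standard error in se_or changes z_stat and p_val, and so can change which side of 0.2 a variable falls on.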

On what you should do, that depends on how seriously you take the advice
of Frank Harrell and others that stepwise methods are generally a bad
idea. (Google for sources.) Similarly, every expert has a different way
to balance parsimony and goodness of fit, and I would not want to try
to add another.

Nick
n.j.cox@durham.ac.uk 

John LeBlanc, reporting a query from Magda Szumilas

I'm a graduate student who is new to Stata. For my thesis, I'm trying to
figure out how I can test nested models when I'm forced to use robust
standard errors. Stata tells me that I can't use lrtest and I understand
that, since it depends on maximum likelihood estimates. So what does one
use?

Here's what I did. Having done an initial backwards stepwise logistic
regression at pr(0.2), I would like to manually create a parsimonious
model with the best possible fit. I assume that Stata is using some
decision rule to drop variables during the stepwise procedure; is this
what I should use when I try to drop them manually? What is Stata's
decision rule for stepwise logistic regression using robust standard
errors?

I found nothing in the manual and nothing helpful after extensive
searching on the web.

**************************************

An example below:

. xi: sw logistic usemh3 i.grade sexorcat markcat partcat livecat edumomcat edudadcat sexriskcat anysmoke if sex==1, cluster(site) pr(0.2)
i.grade           _Igrade_10-12       (naturally coded; _Igrade_10 omitted)
                      begin with full model
p = 0.6664 >= 0.2000  removing markcat
p = 0.6006 >= 0.2000  removing edumomcat
p = 0.5856 >= 0.2000  removing _Igrade_12
p = 0.2054 >= 0.2000  removing sexorcat
p = 0.2113 >= 0.2000  removing _Igrade_11
p = 0.2592 >= 0.2000  removing partcat

Logistic regression                               Number of obs   =        580
                                                  Wald chi2(1)    =          .
                                                  Prob > chi2     =          .
Log pseudolikelihood = -266.26595                 Pseudo R2       =     0.0691

                                   (Std. Err. adjusted for 3 clusters in site)
------------------------------------------------------------------------------
             |               Robust
      usemh3 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     livecat |   .5896426   .1786052    -1.74   0.081      .325654    1.067631
   edudadcat |   1.602875   .1808557     4.18   0.000     1.284863    1.999597
  sexriskcat |   .4266733   .0246379   -14.75   0.000     .3810162    .4778014
    anysmoke |   2.502815    .266854     8.60   0.000     2.030824    3.084503
------------------------------------------------------------------------------

. estimates store full

. xi: sw logistic usemh3 i.grade sexorcat markcat partcat livecat edumomcat edudadcat anysmoke if sex==1, cluster(site) pr(0.2)
i.grade           _Igrade_10-12       (naturally coded; _Igrade_10 omitted)
                      begin with full model
p = 0.6856 >= 0.2000  removing markcat
p = 0.5475 >= 0.2000  removing _Igrade_12
p = 0.2756 >= 0.2000  removing sexorcat
p = 0.2803 >= 0.2000  removing partcat
p = 0.2756 >= 0.2000  removing _Igrade_11

Logistic regression                               Number of obs   =        600
                                                  Wald chi2(1)    =          .
                                                  Prob > chi2     =          .
Log pseudolikelihood =  -284.2349                 Pseudo R2       =     0.0489

                                   (Std. Err. adjusted for 3 clusters in site)
------------------------------------------------------------------------------
             |               Robust
      usemh3 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     livecat |   .7364448   .1749249    -1.29   0.198     .4623359    1.173067
   edudadcat |   1.366027   .2559876     1.66   0.096     .9461231     1.97229
   edumomcat |    1.35079    .278478     1.46   0.145      .901788     2.02335
    anysmoke |   2.571288   .1562286    15.54   0.000     2.282615    2.896468
------------------------------------------------------------------------------

. lrtest full
LR test likely invalid for models with robust vce
r(498);
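
Since -lrtest- is refused here, one option sometimes used with robust standard errors is a Wald test carried out inside the larger model, using -test- or -testparm-. The lines below are only a sketch under that assumption, not a recommendation from this thread. The variable names are taken from the commands above. The two stepwise fits above also ended on different samples (580 and 600 observations), so the sketch fits a single larger model and tests terms within it.

```stata
* Sketch only: fit the larger candidate model once, with cluster-robust standard errors
xi: logistic usemh3 i.grade sexorcat markcat partcat livecat edumomcat edudadcat sexriskcat anysmoke if sex==1, cluster(site)

* Wald test of the one term that separates the two candidate lists above
test sexriskcat

* Joint Wald test of a group of terms, here the grade indicators created by xi
testparm _Igrade_*
```

With only 3 clusters reported in the output, joint tests of several terms at once may be limited or unreliable, so any such result would need checking.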

