Statalist


RE: st: RE: Testing nested models using logistic regression with robust standard errors


From   "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: Testing nested models using logistic regression with robust standard errors
Date   Mon, 28 Apr 2008 14:47:43 -0700

There may be a greater horror in store for us when we try to develop
models: if there are missing values, the number of observations in each
of the runs will likely differ.  The variables you select will depend on
the order in which you drop them, and there are no good solutions for
this.  I've tried one possibility, which is to require e(sample)==1 from
the full model and then continue; this is equivalent (I think) to a
backward stepping model, which is yukky.  Another possibility is to use
multiple imputation and then drop the least significant variables. You
can't use backward stepping, but it's a simple process with ice and mim.
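
The e(sample) approach might look something like the following. This is
only a sketch: the variable names are borrowed from the example quoted
further down this thread, the indicator name "insample" is made up for
illustration, and the reduced model shown is just one possible choice.
The idea is that every smaller model is fitted on the observations used
by the full model, so the number of observations cannot drift between
runs (as it does in the quoted output, 580 versus 600).

```stata
* Sketch only; run from a do-file, since /// line continuation
* is not understood at the interactive prompt.

* Fit the largest model first so that e(sample) marks its estimation sample.
xi: logistic usemh3 i.grade sexorcat markcat partcat livecat ///
    edumomcat edudadcat sexriskcat anysmoke if sex==1, cluster(site)

* Save the estimation sample as a 0/1 indicator ("insample" is a made-up name).
generate byte insample = e(sample)

* Refit a smaller model on exactly the same observations.
logistic usemh3 livecat edudadcat edumomcat anysmoke if insample, cluster(site)
```

Note that this only fixes the sample; it does nothing about any bias
from dropping the incomplete observations in the first place.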

Anyway, beware the missing value.

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of John LeBlanc
Sent: Monday, April 28, 2008 1:56 PM
To: Nick Cox
Subject: Re: st: RE: Testing nested models using logistic regression
with robust standard errors

Thanks for the reply. I take your point about the limitations of sw
regression and I will be more hesitant in using it. However, whether
one uses sw or a more appropriate theory-driven approach with
thoughtful removal of variables, there is still the problem of testing
whether a more parsimonious model fits the data differently from
its more saturated counterpart.

Is there any alternative to lrtest that is appropriate for robust SE? Is
the problem that one can't really specify the error distributions of
these models when robust SE are used?
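
One commonly suggested alternative, offered here only as a sketch and
not as a recommendation from anyone in this thread, is a Wald test of
the dropped terms after fitting the larger model with robust standard
errors. The variable names come from the example quoted below; the
particular terms tested are an arbitrary illustration.

```stata
* Sketch only; run from a do-file, since /// line continuation
* is not understood at the interactive prompt.

* Fit the larger model once with cluster-robust standard errors.
xi: logistic usemh3 i.grade sexorcat markcat partcat livecat ///
    edumomcat edudadcat sexriskcat anysmoke if sex==1, cluster(site)

* Joint Wald test that the candidate terms all have zero coefficients.
test markcat edumomcat sexorcat partcat

* testparm takes a varlist, so a wildcard picks up the xi-generated dummies.
testparm _Igrade_*
```

A caveat for this particular data set: with only 3 clusters, the
cluster-robust variance estimate has very low rank, so Stata may drop
constraints from a joint test of several coefficients. The missing Wald
chi2 in the quoted output is a symptom of the same limitation.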


On Mon, 28 Apr 2008 19:49:10 +0100, Nick Cox wrote:
I can't answer your deeper question about nested models.

The simpler question here is about the decision rule to drop variables
during the stepwise procedure. Stata is using precisely the decision
rule you specified in your command, -pr(0.2)-. That is the significance
level for removal, as shown in action in your output.

If you specify robust standard errors, what does and does not satisfy
this rule may well change, because different standard errors yield
different significance levels, but again you get what you ask for.
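
To make the rule concrete, one backward step could be reproduced by hand
roughly as follows. This is a sketch under assumptions: it takes the
removal test to be a Wald test (as far as I understand, the default for
sw unless the lr option is given), and it uses the variable names from
the example quoted below.

```stata
* Sketch only; run from a do-file, since /// line continuation
* is not understood at the interactive prompt.

* Fit the current model with cluster-robust standard errors.
xi: logistic usemh3 i.grade sexorcat markcat partcat livecat ///
    edumomcat edudadcat sexriskcat anysmoke if sex==1, cluster(site)

* Wald test for one candidate term; -test- leaves its p-value in r(p).
test markcat
display "p = " %6.4f r(p)

* Repeat -test- for each term. The term with the largest p-value is
* removed if that p-value is >= 0.2; then refit without it and start over.
```

Because the model is refitted after every removal, the p-values of the
remaining terms change from step to step, which is why the sequence of
removal p-values in the log need not be monotone.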

On what you should do, that depends on how seriously you take the advice
of Frank Harrell and others that stepwise methods are generally a bad
idea. (Google for sources.)
Similarly, every expert has a different way to balance parsimony and
goodness of fit, and I would not want to try to add another.

Nick
n.j.cox@durham.ac.uk

John LeBlanc, reporting a query from Magda Szumilas

I'm a graduate student who is new to Stata. For my thesis, I'm trying to
figure out how I can test nested models when I'm forced to use robust
standard errors. Stata tells me that I can't use lrtest and I understand
that, since it depends on maximum likelihood estimates. So what does one
use?

Here's what I did. Having done an initial backwards stepwise logistic
regression at pr(0.2), I would like to manually create a parsimonious
model with the best possible fit. I assume that Stata is using some
decision rule to drop variables during the stepwise procedure; is this
what I should use when I try to drop them manually? What is Stata's
decision rule for stepwise logistic regression using robust standard
errors?

I found nothing in the manual and nothing helpful after extensive
searching on the web.

**************************************

An example below:

. xi: sw logistic usemh3 i.grade sexorcat markcat partcat livecat edumomcat edudadcat sexriskcat anysmoke if sex==1, cluster(site) pr(0.2)
i.grade           _Igrade_10-12       (naturally coded; _Igrade_10 omitted)
                      begin with full model
p = 0.6664 >= 0.2000  removing markcat
p = 0.6006 >= 0.2000  removing edumomcat
p = 0.5856 >= 0.2000  removing _Igrade_12
p = 0.2054 >= 0.2000  removing sexorcat
p = 0.2113 >= 0.2000  removing _Igrade_11
p = 0.2592 >= 0.2000  removing partcat

Logistic regression                               Number of obs   =        580
                                                  Wald chi2(1)    =          .
                                                  Prob > chi2     =          .
Log pseudolikelihood = -266.26595                 Pseudo R2       =     0.0691

                                   (Std. Err. adjusted for 3 clusters in site)
------------------------------------------------------------------------------
             |               Robust
      usemh3 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     livecat |   .5896426   .1786052    -1.74   0.081      .325654    1.067631
   edudadcat |   1.602875   .1808557     4.18   0.000     1.284863    1.999597
  sexriskcat |   .4266733   .0246379   -14.75   0.000     .3810162    .4778014
    anysmoke |   2.502815    .266854     8.60   0.000     2.030824    3.084503
------------------------------------------------------------------------------

. estimates store full

. xi: sw logistic usemh3 i.grade sexorcat markcat partcat livecat edumomcat edudadcat anysmoke if sex==1, cluster(site) pr(0.2)
i.grade           _Igrade_10-12       (naturally coded; _Igrade_10 omitted)
                      begin with full model
p = 0.6856 >= 0.2000  removing markcat
p = 0.5475 >= 0.2000  removing _Igrade_12
p = 0.2756 >= 0.2000  removing sexorcat
p = 0.2803 >= 0.2000  removing partcat
p = 0.2756 >= 0.2000  removing _Igrade_11

Logistic regression                               Number of obs   =        600
                                                  Wald chi2(1)    =          .
                                                  Prob > chi2     =          .
Log pseudolikelihood =  -284.2349                 Pseudo R2       =     0.0489

                                   (Std. Err. adjusted for 3 clusters in site)
------------------------------------------------------------------------------
             |               Robust
      usemh3 | Odds Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     livecat |   .7364448   .1749249    -1.29   0.198     .4623359    1.173067
   edudadcat |   1.366027   .2559876     1.66   0.096     .9461231     1.97229
   edumomcat |    1.35079    .278478     1.46   0.145      .901788     2.02335
    anysmoke |   2.571288   .1562286    15.54   0.000     2.282615    2.896468
------------------------------------------------------------------------------

. lrtest full
LR test likely invalid for models with robust vce
r(498);


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



