Statalist The Stata Listserver



Re: st: stepwise


From   Timothy.Mak@iop.kcl.ac.uk
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: stepwise
Date   Fri, 1 Sep 2006 16:15:23 +0100

Hi Richard, 

I specified 'Forward' selection in both SPSS and Stata, and my 
understanding of 'Forward' is: 

You start with the empty model, i.e. with just the constant, then you add 
each candidate variable in turn and examine its p value. You include in the 
model the one with the lowest p value, provided it also meets the criterion, 
say p<.15. Then you repeat, retaining that variable, until the variable 
with the lowest p value is no longer below .15. I learnt this procedure 
from Hosmer and Lemeshow: Applied Logistic Regression. 
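
The loop described above can be sketched in a few lines of Python. This is 
only an illustration of the control flow, not what SPSS or Stata run 
internally: `forward_select` and the `p_value_of` callable are hypothetical 
names, and the p values in `toy` are made up purely to exercise the loop. 

```python
def forward_select(candidates, p_value_of, p_enter=0.15):
    """Greedy forward selection on p values.

    candidates : list of variable names.
    p_value_of : hypothetical callable (selected, candidate) -> p value of
                 `candidate` when added to the model containing `selected`.
    p_enter    : entry criterion; a variable enters only if p < p_enter.
    """
    selected = []
    remaining = list(candidates)
    while remaining:
        # Try each remaining variable alongside those already retained.
        p_values = {v: p_value_of(selected, v) for v in remaining}
        best = min(p_values, key=p_values.get)
        if p_values[best] >= p_enter:
            break  # the lowest p value no longer meets the criterion
        selected.append(best)
        remaining.remove(best)
    return selected


# Made-up p values, keyed by (variables already in the model, candidate).
toy = {
    ((), "x1"): 0.01, ((), "x2"): 0.40, ((), "x3"): 0.10,
    (("x1",), "x2"): 0.60, (("x1",), "x3"): 0.12,
    (("x1", "x3"), "x2"): 0.30,
}
chosen = forward_select(["x1", "x2", "x3"], lambda s, v: toy[(tuple(s), v)])
print(chosen)
```

With these toy values x1 enters first, x3 second, and the loop stops 
because x2 never gets below .15. 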

I don't see why you can't work with all available data at each try. 
Arguably there is the downside that you're comparing models with 
different numbers of observations. But it bothers me that at the end of 
the day I have a final model that doesn't give the same results as when I 
simply enter the selected variables directly. Moreover, if there are lots 
of variables, we may end up running the procedure on only half of the 
data, which is a bit stupid I think. Alan, thanks for the suggestion of 
multiple imputation, but that's not my concern at the moment: I won't be 
using it, simply because it's too complicated. In any case, how do you run 
stepwise regression on several different imputed datasets and decide on 
one final model at the end? 

Tim




Richard Williams <Richard.A.Williams.5@ND.edu>
Sent by: owner-statalist@hsphsun2.harvard.edu
01/09/2006 16:49
Please respond to: statalist@hsphsun2.harvard.edu
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: stepwise






At 09:28 AM 9/1/2006, Timothy.Mak@iop.kcl.ac.uk wrote:
>Hi Stata list,
>
>When it comes to stepwise regression, both SPSS and Stata do something I
>don't understand. Take the made-up dataset below, where y and x1 each have
>8 observations and x2 has 7.
>
>        y      x1      x2
>     4.00    1.00     .00
>     5.00    1.00     .00
>     6.00    1.00     .00
>     7.00    1.00    1.00
>     8.00     .00     .00
>     9.00     .00     .00
>    10.00     .00       .
>    11.00     .00     .00
>
>
>If I run a stepwise regression of y on x1 and x2, using the Forward
>procedure, then the final 'selected' model, which is equivalent to the y
>on x1 regression, only uses 7 observations. Is there any statistical
>principle that says models should be selected this way? If not, why does
>Stata not at least provide an option to use all available observations
>in the selection process?

I believe both SPSS and Stata start by doing listwise deletion of
missing data.  When choosing a model, the comparisons would get distorted
if different cases were being analyzed at different steps, i.e. you
shouldn't compare a model with 8 cases and x1 to a model with 7 cases
and x1 and x2.
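
To make the difference concrete, here is a minimal Python sketch (not
Stata output) that fits y on x1 twice using the made-up data from the
original post: once on all 8 available cases, and once on the 7 cases
left after listwise deletion on x2. The helper `ols_slope_intercept` is
a hypothetical name introduced for this illustration.

```python
# Made-up data from the original post; None marks the missing x2 value.
y = [4, 5, 6, 7, 8, 9, 10, 11]
x1 = [1, 1, 1, 1, 0, 0, 0, 0]
x2 = [0, 0, 0, 1, 0, 0, None, 0]


def ols_slope_intercept(x, y):
    """Simple least-squares fit of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# All cases with y and x1 observed (8 cases).
full = ols_slope_intercept(x1, y)

# Listwise deletion: keep only rows where every candidate variable is observed.
keep = [i for i, v in enumerate(x2) if v is not None]
listwise = ols_slope_intercept([x1[i] for i in keep], [y[i] for i in keep])

print(len(y), full)
print(len(keep), listwise)
```

On 8 cases the slope for x1 is -4; on the 7 complete cases it is about
-3.83, so the "same" model gives different estimates depending on which
sample the selection procedure used.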


-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX:    (574)288-4373
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu
WWW (personal):    http://www.nd.edu/~rwilliam
WWW (department):    http://www.nd.edu/~soc

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



