Hi Richard,
I specified 'Forward' selection in both SPSS and Stata, and my
understanding of 'Forward' is:
You start with the empty model, i.e. with just the constant. You then add
each candidate variable in turn and examine its p-value. You include in the
model the one with the lowest p-value, provided it also meets the entry
criterion, say p<.15. You then repeat, retaining that variable, until the
candidate with the lowest p-value is no longer below .15. I learnt this
procedure from Hosmer and Lemeshow, Applied Logistic Regression.
I don't see why you can't work with all available data at each step.
Arguably there is the downside that you're comparing models fitted to
different numbers of observations. But it bothers me that I end up with
a final model that doesn't give the same results as when I simply enter
the selected variables myself. Moreover, if there are lots of variables, we
may end up running the procedure on only half of the data, which is a bit
stupid I think. Alan, thanks for the suggestion of multiple imputation,
but that's not my concern at the moment; I won't be using it
because it's too complicated. In any case, how do you run stepwise
regression on several different imputed datasets and decide on one final
model at the end?
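
The forward procedure described above can be sketched as follows. This is a minimal illustration in Python, not Stata or SPSS syntax, and not a claim about what either package does internally. It uses the made-up dataset quoted further down, drops the incomplete row first (listwise deletion), and uses p<.15 as the entry criterion. The helper names (solve, t_two_sided_p, p_last, complete_cases, forward_select) are invented for this sketch, and the p-value comes from a simple numerical integration of the t density rather than a library routine.

```python
import math

# Made-up dataset from the thread; None marks the missing x2 value.
DATA = {
    "y":  [4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0],
    "x1": [1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0, 0.0],
    "x2": [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, None, 0.0],
}

def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[p] = m[p], m[c]
        for r in range(n):
            if r != c:
                f = m[r][c] / m[c][c]
                m[r] = [v - f * w for v, w in zip(m[r], m[c])]
    return [m[i][n] / m[i][i] for i in range(n)]

def t_two_sided_p(t, df):
    """Two-sided p-value for Student's t via Simpson integration of the density."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    pdf = lambda u: c * (1 + u * u / df) ** (-(df + 1) / 2)
    n, b = 2000, abs(t)
    h = b / n
    inner = sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, n))
    area = (pdf(0) + pdf(b) + inner) * h / 3   # integral from 0 to |t|
    return max(0.0, 1 - 2 * area)

def p_last(rows, names):
    """OLS of y on a constant plus `names`; p-value of the last-named term."""
    X = [[1.0] + [r[v] for v in names] for r in rows]
    y = [r["y"] for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    df = len(y) - k
    rss = sum((yi - sum(b * v for b, v in zip(beta, x))) ** 2 for x, yi in zip(X, y))
    unit = [0.0] * (k - 1) + [1.0]
    var_last = rss / df * solve(xtx, unit)[-1]  # last diagonal of s^2 (X'X)^-1
    return t_two_sided_p(beta[-1] / math.sqrt(var_last), df)

def complete_cases(data):
    """Listwise deletion: keep only rows with no missing value in any variable."""
    rows = [dict(zip(data, vals)) for vals in zip(*data.values())]
    return [r for r in rows if None not in r.values()]

def forward_select(rows, candidates, p_enter=0.15):
    """Add the candidate with the lowest p-value until none is below p_enter."""
    selected = []
    while len(selected) < len(candidates):
        remaining = [v for v in candidates if v not in selected]
        pvals = {v: p_last(rows, selected + [v]) for v in remaining}
        best = min(pvals, key=pvals.get)
        if pvals[best] >= p_enter:
            break
        selected.append(best)
    return selected

rows = complete_cases(DATA)
selected = forward_select(rows, ["x1", "x2"])
print(len(rows), selected)  # prints: 7 ['x1']
```

On these data x1 enters at the first step, x2 then fails the .15 criterion, and every step has been run on the same 7 rows.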
Tim
Richard Williams <Richard.A.Williams.5@ND.edu>
Sent by: owner-statalist@hsphsun2.harvard.edu
01/09/2006 16:49
Please respond to: statalist@hsphsun2.harvard.edu
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: stepwise
At 09:28 AM 9/1/2006, Timothy.Mak@iop.kcl.ac.uk wrote:
>Hi Stata list,
>
>When it comes to stepwise regression, both SPSS and Stata do something
>that I don't understand. Take the made-up dataset below, where y and x1
>each have 8 observations and x2 has 7.
>
> y x1 x2
> 4.00 1.00 .00
> 5.00 1.00 .00
> 6.00 1.00 .00
> 7.00 1.00 1.00
> 8.00 .00 .00
> 9.00 .00 .00
> 10.00 .00 .
> 11.00 .00 .00
>
>
>If I run a stepwise regression of y on x1 and x2, using the Forward
>procedure, then the final 'selected' model, which is equivalent to the y
>on x1 regression, only uses 7 observations. Is it based on any statistical
>principle that models should be selected this way? If not, why does Stata
>not at least provide an option where you can use all available
>observations in the selection process?
I believe both SPSS and Stata start by doing listwise deletion of
missing data (MD). When choosing a model, the comparisons would get
distorted if different cases were being analyzed at different steps,
i.e. you shouldn't compare a model with 8 cases and x1 to a model with
7 cases and x1 and x2.
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX: (574)288-4373
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW (personal): http://www.nd.edu/~rwilliam
WWW (department): http://www.nd.edu/~soc
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/