Once you have the model, refit the regression using only the "real"
data. If you come up with a different "best" model each time you do the
imputation, then probably any of these "best" ones will do. The
difference between them is noise.
Al F.
-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
Timothy.Mak@iop.kcl.ac.uk
Sent: Friday, September 01, 2006 10:15 AM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: stepwise (
Hi Richard,
I specified 'Forward' selection in both SPSS and Stata, and my
understanding of 'Forward' is:
You start with the empty model, i.e. with just the constant, then you add
each candidate variable in turn and examine the p values. You include in
the model the one with the lowest p value, provided it also meets the
entry criterion, say p<.15. Then you repeat, retaining that variable,
until the remaining variable with the lowest p value no longer has
p<.15. I learnt this procedure from Hosmer and Lemeshow, Applied
Logistic Regression.
I don't see why you can't work with all available data at each try.
Arguably there is the downside that you're comparing models with
different numbers of observations. But it just bothers me that at the end
of the day I have a final model that doesn't have the same results as
when I simply enter the variables. Moreover, if there are lots of
variables, we may end up running the procedure on only half of the data,
which is a bit stupid I think. Alan, thanks for the suggestion of
multiple imputation, but that's not my concern at the moment, because I
won't be using it simply because it's too complicated. In any case, how
do you run stepwise regression on several different imputed datasets and
decide on one final one at the end?
Tim
Richard Williams <Richard.A.Williams.5@ND.edu>
Sent by: owner-statalist@hsphsun2.harvard.edu
01/09/2006 16:49
Please respond to: statalist@hsphsun2.harvard.edu
To: statalist@hsphsun2.harvard.edu
cc:
Subject: Re: st: stepwise
At 09:28 AM 9/1/2006, Timothy.Mak@iop.kcl.ac.uk wrote:
>Hi Stata list,
>
>When it comes to stepwise regression, both SPSS and Stata do something I
>don't understand. Take the made-up dataset below, where y has 8
>observations, x1 has 8 and x2 has 7.
>
> y x1 x2
> 4.00 1.00 .00
> 5.00 1.00 .00
> 6.00 1.00 .00
> 7.00 1.00 1.00
> 8.00 .00 .00
> 9.00 .00 .00
> 10.00 .00 .
> 11.00 .00 .00
>
>
>If I run a stepwise regression of y on x1 and x2, using the Forward
>procedure, then the final 'selected' model, which is equivalent to the y
>on x1 regression, only uses 7 observations. Is it based on any statistical
>principle that models should be selected this way? If not, why does Stata
>not at least provide an option where you can use all available
>observations in the selection process?
I believe both SPSS and Stata start by doing listwise deletion of
missing data. When choosing a model, the comparisons would get distorted
if different cases were being analyzed at different steps, i.e. you
shouldn't compare a model with 8 cases and x1 to a model with 7 cases
and x1 and x2.
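
To make the effect of listwise deletion concrete, here is a minimal
Python sketch (not Stata or SPSS output) using the made-up dataset
quoted above. The names `rows`, `listwise` and `simple_ols` are
hypothetical, chosen for illustration. The sketch only refits y on x1
with and without listwise deletion; it does not run a stepwise
procedure.

```python
# Made-up dataset from the original post: (y, x1, x2).
# None marks the single missing x2 value.
rows = [
    (4.0, 1.0, 0.0),
    (5.0, 1.0, 0.0),
    (6.0, 1.0, 0.0),
    (7.0, 1.0, 1.0),
    (8.0, 0.0, 0.0),
    (9.0, 0.0, 0.0),
    (10.0, 0.0, None),
    (11.0, 0.0, 0.0),
]


def listwise(data):
    """Keep only rows with no missing value in any candidate variable."""
    return [r for r in data if all(v is not None for v in r)]


def simple_ols(data):
    """Least-squares fit of y on x1 with a constant.

    Returns (intercept, slope). Only y and x1 are used, so a row with a
    missing x2 is still usable here.
    """
    n = len(data)
    mean_y = sum(r[0] for r in data) / n
    mean_x = sum(r[1] for r in data) / n
    sxy = sum((r[1] - mean_x) * (r[0] - mean_y) for r in data)
    sxx = sum((r[1] - mean_x) ** 2 for r in data)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


complete = listwise(rows)            # the 7 cases left after listwise deletion
fit_all = simple_ols(rows)           # y on x1 using all 8 cases
fit_complete = simple_ols(complete)  # y on x1 using the 7 complete cases

print(len(rows), fit_all)
print(len(complete), fit_complete)
```

With all 8 cases the slope on x1 is -4 (intercept 9.5). After listwise
deletion on x2 only 7 cases remain, and the slope is about -3.83
(intercept about 9.33). That is the gap between the stepwise 'selected'
model and the plain y on x1 regression.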
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX: (574)288-4373
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW (personal): http://www.nd.edu/~rwilliam
WWW (department): http://www.nd.edu/~soc
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/