Statalist



R: st: AW: estimation of series of OLS regressions based on t-values from previous regression


From   "Carlo Lazzaro" <[email protected]>
To   <[email protected]>
Subject   R: st: AW: estimation of series of OLS regressions based on t-values from previous regression
Date   Sat, 31 Jan 2009 11:52:28 +0100


Dear sdm1,
as Maarten and Nick pointed out, stepwise procedures have reliability
problems, which are listed in the following FAQ (please search
http://www.stata.com/support/statalist/faq for this item):


Title    Problems with stepwise regression  
Author  Bill Sribney, StataCorp  
Date  May 1998  

Besides, as far as the health economics issue is concerned, I agree with
Richard Williams' advice: a researcher's experience, grounded in substantive
theoretical assumptions, is a better guide to selecting the best hospital
cost predictors than any stepwise procedure. In fact, in cost analysis some
regressors may behave in strange ways: some have p<0.05 but are probably
meaningless from a strictly economic point of view, while others have p>0.05
against all expectations and theoretical assumptions.
Last but not least, your research report will eventually land on a
decision-maker's table, and that reader may find it hard to follow the
technicalities behind your work. Again, selectivity will be a great gift
(this is a lesson that I have recently learnt from three referees' comments!).

Kind Regards,
Carlo



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Nick Cox
Sent: Friday, 30 January 2009 20:15
To: [email protected]
Subject: RE: st: AW: estimation of series of OLS regressions based on
t-values from previous regression

In addition to the very serious methodological issues quite rightly
raised, I'm wondering what kind of performance this would be expected to
produce. If I throw 200 noise predictors at a response, I expect to get
a pretty good R-square, for example. (They shouldn't create much
difficulty over multicollinearity....) Conversely, if 200 sensible
predictors aren't enough, why should one expect 200 more to do much
better? 

Calibrating any procedure against what happens with stochastic garbage
would seem essential. 
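
The calibration idea can be sketched with a small simulation. This is only
an illustration, in Python rather than Stata, and it rests on assumptions
that are not in the thread: the response and all predictors are independent
standard normal noise, the model includes an intercept, and the function name
`r_squared_from_noise` and the sample size of 250 are hypothetical choices
made for the example.

```python
import random


def r_squared_from_noise(n_obs, n_predictors, seed=0):
    """R-squared from an OLS fit (with intercept) of noise on noise predictors.

    Hypothetical helper for illustration; assumes n_obs > n_predictors + 1.
    """
    if n_obs <= n_predictors + 1:
        raise ValueError("need more observations than predictors plus intercept")

    rng = random.Random(seed)
    y = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    cols = [[rng.gauss(0.0, 1.0) for _ in range(n_obs)]
            for _ in range(n_predictors)]

    def centered(v):
        mean = sum(v) / len(v)
        return [a - mean for a in v]

    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # Centering the response and every predictor accounts for the intercept.
    y_c = centered(y)
    cols = [centered(c) for c in cols]
    total_ss = dot(y_c, y_c)

    # Modified Gram-Schmidt: build an orthonormal basis of the predictor
    # space and accumulate the squared projection of y onto each direction.
    explained_ss = 0.0
    for j in range(n_predictors):
        norm = dot(cols[j], cols[j]) ** 0.5
        if norm < 1e-12:  # skip a numerically collinear column
            continue
        q = [a / norm for a in cols[j]]
        explained_ss += dot(q, y_c) ** 2
        for k in range(j + 1, n_predictors):
            proj = dot(q, cols[k])
            cols[k] = [a - proj * b for a, b in zip(cols[k], q)]

    return explained_ss / total_ss


if __name__ == "__main__":
    # 200 noise predictors, 250 observations (the sample size is an assumption).
    print(r_squared_from_noise(n_obs=250, n_predictors=200, seed=1))
```

Under these assumptions the expected R-square is k/(n-1) for k predictors and
n observations, so 200 noise predictors on 250 observations average about
0.80. That is the baseline any real set of predictors would have to beat.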

P.S. anyone contemplating stepwise who hasn't read Frank Harrell,
Regression modeling strategies, Springer, New York 2001, should seek it
out straight away. 

Nick 
[email protected] 

Maarten buis
Sent: 30 January 2009 16:55
To: [email protected]
Subject: RE: st: AW: estimation of series of OLS regressions based on
t-values from previous regression

--- sdm1 <[email protected]> wrote:
> I'm afraid that it will have to be done (considerably) more than
> once!  If anyone could offer an idea of how to get started with a
> program, that would be much appreciated.

You seem to be aware that applying this method on that number of
variables will mean that your results cannot in any way be generalized
outside your data. Just out of curiosity, could you give a bit of
substantive background behind your problem, showing that
generalizability is not of interest in your case? It would be nice to
have a real life example of an exception to the rule that
stepwise/datamining/datasnooping is evil. 


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/





