Statalist



RE: st: time efficient way to choose variables


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: time efficient way to choose variables
Date   Wed, 4 Feb 2009 18:12:45 -0000

I'd add another reference. I am currently looking at a more recent book
by Izenman. All the details are in 

<http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-78188-4>

The same website promises a second edition of Hastie et al. for next
month! 

I think Jay is right. There is not much by way of implementation of
these methods in Stata.  

Nick 
n.j.cox@durham.ac.uk 

jverkuilen wrote:

As others have noted, this is a variant of the long-discredited stepwise
regression.

There are better automatic variable selection procedures, developed by
the machine learning community, that go under colorful names like
bagging and boosting. These use some kind of cross-validation or
bootstrapping to protect against capitalization on chance, to which
older stepwise procedures are very susceptible. I don't think they are
implemented in Stata, but perhaps someone has written them. See, e.g.,
T Hastie, R Tibshirani, J Friedman. 2000. The Elements of Statistical
Learning. Springer.
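
To make the bootstrapping idea concrete, here is a minimal sketch of
bagging in Python with NumPy. It is an illustration only, not something
from Hastie et al. or from any Stata command: the function name
bagged_ols_predict and its parameters are hypothetical, and ordinary
least squares stands in for whatever base procedure one would actually
use. Each bootstrap resample gets its own fit, and the predictions are
averaged, so no single chance-driven fit dominates.

```python
import numpy as np

def bagged_ols_predict(X, y, X_new, n_boot=200, seed=0):
    """Average OLS predictions over bootstrap resamples (bagging).

    Hypothetical helper for illustration; X and X_new are 2-D arrays
    of predictors without an intercept column.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Add an intercept column to both design matrices.
    Xd = np.column_stack([np.ones(n), X])
    Xd_new = np.column_stack([np.ones(X_new.shape[0]), X_new])
    preds = np.zeros((n_boot, X_new.shape[0]))
    for b in range(n_boot):
        # Draw n row indices with replacement: one bootstrap resample.
        idx = rng.integers(0, n, size=n)
        # Fit OLS on the resample and predict at the new points.
        beta, *_ = np.linalg.lstsq(Xd[idx], y[idx], rcond=None)
        preds[b] = Xd_new @ beta
    # The bagged prediction is the mean over all resample fits.
    return preds.mean(axis=0)
```

In practice the base learner would be something less stable than plain
OLS (a selection procedure or a tree, say), which is where averaging
over resamples helps most.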

Model averaging is another approach. It pools predictions from several
candidate models, using weights derived from goodness-of-fit measures,
and again protects against capitalization on chance by using
bootstrapping of some sort. See, e.g., KA Burnham and D Anderson. 2003.
Model Selection and Multimodel Inference, 2nd Ed. Springer.
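
As a rough illustration of the weighting step, the sketch below uses
Akaike weights, one common choice of goodness-of-fit weight and the one
Burnham and Anderson discuss. The function names akaike_weights and
model_average are hypothetical, the AIC values are assumed to have been
computed elsewhere for each candidate model, and the bootstrapping step
is left out for brevity.

```python
import numpy as np

def akaike_weights(aic):
    """Turn a list of AIC values into weights that sum to 1."""
    aic = np.asarray(aic, dtype=float)
    # Differences from the best (smallest) AIC.
    delta = aic - aic.min()
    # Relative likelihood of each model given the data.
    rel = np.exp(-0.5 * delta)
    return rel / rel.sum()

def model_average(predictions, aic):
    """Weighted average of per-model predictions.

    predictions: array of shape (n_models, n_points)
    aic:         one AIC value per model
    """
    w = akaike_weights(aic)
    return w @ np.asarray(predictions, dtype=float)
```

Models whose AIC is far above the best one receive weights close to
zero, so the averaged prediction is dominated by the models the data
support, without committing to a single "winner".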




