Statalist



RE: st: time efficient way to choose variables


From   jverkuilen <[email protected]>
To   <[email protected]>
Subject   RE: st: time efficient way to choose variables
Date   Thu, 5 Feb 2009 01:46:05 -0500

Izenman's book seems nice. I am the book review editor for the Journal of Educational and Behavioral Statistics, and I have this one in review. (BTW, if anyone is interested in reviewing a book that people working in behavioral statistics might want to read, and has the expertise to evaluate it, let me know.)

Most of the machine learning stuff seems to be implemented in other environments, e.g., R, but Stata seems like a solid environment in which to do it as well: it has a good bootstrapping facility, the ability to handle big datasets efficiently, etc.



-----Original Message-----
From: "Nick Cox" <[email protected]>
To: [email protected]
Sent: 2/4/2009 1:12 PM
Subject: RE: st: time efficient way to choose variables

I'd add another reference. I am currently looking at a more recent book
by Izenman. All the details are in 

<http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-78188-4>

The same website promises a second edition of Hastie et al. for next
month! 

I think Jay is right. There is not much by way of implementation of
these methods in Stata.  

Nick 
[email protected] 

jverkuilen

As others have noted, this is a variant of the long-discredited stepwise
regression.

There are better automatic variable selection procedures developed by
the machine learning people that go under colorful names like bagging
and boosting. These all use some kind of cross-validation or
bootstrapping to protect against the capitalization on chance to which
older stepwise procedures are very susceptible. I don't think they are
implemented in Stata, but maybe someone has done so. See, e.g., T Hastie,
R Tibshirani, J Friedman. 2000. Elements of statistical learning.
Springer.
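
To make the bootstrapping idea concrete, here is a minimal sketch in Python
(not Stata, and not bagging or boosting proper). It resamples the rows with
replacement, reruns a deliberately crude selection rule on each resample, and
tallies how often each candidate is chosen. A variable that wins only by
chance in the original sample should not win consistently across resamples.
The simulated data, the variable names x1 to x3, the selection rule (largest
absolute correlation with y), and all function names are assumptions made up
for illustration.

```python
import random
from collections import Counter


def correlation(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def select_best(rows, names):
    """Hypothetical selection rule: pick the predictor most correlated with y."""
    ys = [r["y"] for r in rows]
    return max(names, key=lambda v: abs(correlation([r[v] for r in rows], ys)))


def selection_frequencies(rows, names, n_boot=200, seed=1):
    """Rerun the selection rule on bootstrap resamples and report how often
    each candidate variable is chosen."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_boot):
        # Draw len(rows) rows with replacement from the original sample.
        sample = [rng.choice(rows) for _ in rows]
        counts[select_best(sample, names)] += 1
    return {v: counts[v] / n_boot for v in names}


# Simulated data (an assumption): y depends on x1 only; x2 and x3 are noise.
data_rng = random.Random(0)
names = ["x1", "x2", "x3"]
rows = []
for _ in range(100):
    row = {v: data_rng.gauss(0, 1) for v in names}
    row["y"] = 2.0 * row["x1"] + data_rng.gauss(0, 1)
    rows.append(row)

freqs = selection_frequencies(rows, names)
print(freqs)
```

In this simulated setup x1 should be selected in nearly every resample. With
real data, selection frequencies well below 1 for the "winning" variable are
a warning that the choice is unstable.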

Model averaging is another approach. This pools predictions from several
candidate models using weights derived from goodness-of-fit measures,
again protecting against capitalization on chance by using bootstrapping
of some sort. See, e.g., KA Burnham and D Anderson. 2003. Model selection
and multimodel inference, 2nd Ed. Springer.
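
As a rough illustration of pooling predictions by fit-based weights, here is
a minimal sketch in Python (not Stata). It assumes the goodness-of-fit
measure is AIC and uses Akaike weights, which is one common choice rather
than the only one. It leaves out the bootstrapping step mentioned above. The
AIC values, the predictions, and the function names are hypothetical.

```python
import math


def akaike_weights(aic_values):
    """Convert AIC values into weights that sum to 1.

    Each weight is proportional to exp(-delta / 2), where delta is the
    model's AIC minus the smallest AIC among the candidates.
    """
    best = min(aic_values)
    relative = [math.exp(-0.5 * (aic - best)) for aic in aic_values]
    total = sum(relative)
    return [r / total for r in relative]


def averaged_prediction(predictions, weights):
    """Weighted average of the candidate models' predictions for one case."""
    return sum(p * w for p, w in zip(predictions, weights))


# Hypothetical AIC values and predictions from three candidate models.
aics = [100.0, 102.0, 110.0]
preds = [2.0, 3.0, 5.0]

weights = akaike_weights(aics)
avg = averaged_prediction(preds, weights)
print(weights, avg)
```

The model with the smallest AIC gets the largest weight, but the others still
contribute, so the pooled prediction does not hinge on a single selected
model.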


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




