
From: Steven Samuels <sjhsamuels@earthlink.net>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: time efficient way to choose variables
Date: Wed, 4 Feb 2009 18:27:03 -0500

A google search on "austin tu bootstrap stepwise" turned up this:

-Steve

On Feb 4, 2009, at 3:04 PM, Hardy, Dale S wrote:

Tony,

Can you send me the reference to this paper?

Thanks.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Lachenbruch, Peter
Sent: Wednesday, February 04, 2009 1:53 PM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: time efficient way to choose variables

The lasso and LARS methods are also possible for this purpose. Stata has a LARS ado written by Adrian Mander; it also does the lasso.

A recent paper (2004) by Austin and Tu discusses using bootstrapping in conjunction with stepwise regression. The sense of their article is that the variables selected give a hint at the frequency of the selection distribution. An interesting variant is to combine this with missing values...

Tony

Peter A. Lachenbruch
Department of Public Health
Oregon State University
Corvallis, OR 97330
Phone: 541-737-3832
FAX: 541-737-4001

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of jverkuilen
Sent: Wednesday, February 04, 2009 6:13 AM
To: statalist@hsphsun2.harvard.edu
Subject: RE: st: time efficient way to choose variables

As others have noted, this is a variant of the long discredited stepwise regression. There are better automatic variable selection procedures developed by the machine learning people that go under colorful names like bagging and boosting. These all use some kind of cross-validation or bootstrapping to protect against capitalization on chance, to which older stepwise procedures are very susceptible. I don't think they are implemented in Stata, but maybe someone has. See, e.g., T Hastie, R Tibshirani, J Friedman. 2000. Elements of statistical learning. Springer.

Model averaging is another approach. This pools predictions from models using weights derived from goodness of fit measures, again protecting against capitalization on chance by using bootstrapping of some sort. See, e.g., KA Burnham and D Anderson. 2003. Model selection and multimodel inference, 2nd Ed. Springer.

-----Original Message-----
From: "Hardy, Dale S" <Dale.S.Hardy@uth.tmc.edu>
To: statalist@hsphsun2.harvard.edu
Sent: 2/3/2009 10:21 PM
Subject: st: time efficient way to choose variables

I have data in which I want to pick out variables associated with developing a disease. Each time I run the foreach command with the covariates, I cut out the one variable with the highest Z value with p value <0.05, and I put this variable in the second equation (stcox), until I have no variables with p value <0.05 left when I run the models with the foreach command. Here is an example below:

    foreach var of varlist agegrp racecode1 s_sex1 ses_pov ajcc6seer6_1 sizeband pnnumb grade_s lung4 comorbid treat2r xrt3 seer1 dxyear_cate {
        stcox PAC1 `var'
    }

Then I choose the variable with the highest z score with p value <0.05. Then I run the model again. Comorbid is taken out because of its highest Z score and placed in the second equation.

    foreach var of varlist agegrp racecode1 s_sex1 ses_pov ajcc6seer6_1 sizeband pnnumb grade_s lung4 treat2r xrt3 seer1 dxyear_cate {
        stcox PAC1 comorbid `var'
    }

Third run: Sizeband was chosen because of the highest Z score with p value <0.05. This was placed in the second model:

    foreach var of varlist agegrp racecode1 s_sex1 ses_pov ajcc6seer6_1 pnnumb grade_s lung4 treat2r xrt3 seer1 dxyear_cate {
        stcox PAC1 comorbid sizeband `var'
    }

I do this until there are no more variables with p value <0.05 to choose from.

1. My question is how can I do this process quickly and time-efficiently. Do I use an array? Can you show me how to do this?

2. Is there also a time-efficient process for looking for effect modifiers using several variables (one at a time in separate models) using the likelihood ratio test?

Thanks.
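The manual procedure described in the original question could be looped along the following lines. This is only a sketch of that forward-selection routine, not a recommendation: the cautions raised in the replies about stepwise selection capitalizing on chance apply to it in full. It assumes the data have already been -stset-, that each candidate enters as a single coefficient (no factor-variable expansions), and that the candidate list shown, which is abbreviated from the one in the question, is adjusted to the actual variable names. The local macro names (selected, candidates, best, bestz, done) are illustrative.

```stata
* Sketch only: assumes -stset- has been run and that each candidate
* variable contributes exactly one coefficient to the model.
local selected                          // variables chosen so far
local candidates agegrp racecode1 s_sex1 ses_pov sizeband pnnumb ///
    grade_s lung4 comorbid treat2r xrt3 seer1 dxyear_cate   // abbreviated list

local done 0
while !`done' & "`candidates'" != "" {
    local best
    local bestz 0
    * fit one model per remaining candidate, keep the largest |z|
    foreach var of local candidates {
        quietly stcox PAC1 `selected' `var'
        local z = abs(_b[`var'] / _se[`var'])
        if `z' > `bestz' & `z' < . {
            local best  `var'
            local bestz `z'
        }
    }
    * two-sided p-value for the largest |z| found in this pass
    local p = 2 * normal(-`bestz')
    if "`best'" != "" & `p' < 0.05 {
        local selected   `selected' `best'
        local candidates : list candidates - best
        display as text "added `best' (|z| = " %5.2f `bestz' ", p = " %6.4f `p' ")"
    }
    else local done 1
}

* final model with PAC1 plus everything that was selected
stcox PAC1 `selected'
```

Stata's -stepwise- prefix performs a similar forward selection in one line, for example `stepwise, pe(0.05) lockterm1: stcox PAC1 agegrp racecode1 ...`, where lockterm1 keeps the first term (here PAC1) in every model; its selection rule may differ in detail from the loop above, so the two need not pick identical variables.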
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

**References**:

- **RE: st: time efficient way to choose variables** *From:* jverkuilen <jverkuilen@gc.cuny.edu>
- **RE: st: time efficient way to choose variables** *From:* "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
- **RE: st: time efficient way to choose variables** *From:* "Hardy, Dale S" <Dale.S.Hardy@uth.tmc.edu>


© Copyright 1996–2015 StataCorp LP