"Hardy, Dale S" <Dale.S.Hardy@uth.tmc.edu>

<statalist@hsphsun2.harvard.edu>

st: time efficient way to choose variables

Tue, 3 Feb 2009 21:21:32 -0600

I have data in which I want to pick out variables associated with developing a disease. Each time I run the foreach command over the covariates, I take out the one variable with the highest z value among those with p value <0.05 and put it in the second equation (stcox). I repeat this until no variable with p value <0.05 is left when I run the models with the foreach command. Here is an example:

    foreach var of varlist agegrp racecode1 s_sex1 ses_pov ajcc6seer6_1 sizeband pnnumb grade_s lung4 comorbid treat2r xrt3 seer1 dxyear_cate {
        stcox PAC1 `var'
    }

Then I choose the variable with the highest z score among those with p value <0.05 and run the models again. Here comorbid is taken out because it has the highest z score, and it is placed in the second equation:

    foreach var of varlist agegrp racecode1 s_sex1 ses_pov ajcc6seer6_1 sizeband pnnumb grade_s lung4 treat2r xrt3 seer1 dxyear_cate {
        stcox PAC1 comorbid `var'
    }

Third run: sizeband was chosen because it had the highest z score with p value <0.05, so it was also placed in the second model:

    foreach var of varlist agegrp racecode1 s_sex1 ses_pov ajcc6seer6_1 pnnumb grade_s lung4 treat2r xrt3 seer1 dxyear_cate {
        stcox PAC1 comorbid sizeband `var'
    }

I do this until there are no more variables with p value <0.05 to choose from.

1. How can I do this process quickly and in a time-efficient way? Do I use an array? Can you show me how to do this?

2. Is there also a time-efficient way to look for effect modifiers among several variables (one at a time, in separate models) using the likelihood ratio test?

Thanks.
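The manual procedure described above could perhaps be written as a single loop. What follows is only a sketch, under some assumptions: the data are already stset, PAC1 stays in every model, and the z value is taken as the coefficient divided by its standard error from the stcox fit. The local macro names (candidates, chosen, best, bestz, done) are made up for illustration.

```stata
* Sketch of the manual forward selection; assumes the data are already -stset-.
* PAC1 is kept in every model. All local macro names here are hypothetical.
local candidates agegrp racecode1 s_sex1 ses_pov ajcc6seer6_1 sizeband pnnumb grade_s lung4 comorbid treat2r xrt3 seer1 dxyear_cate
local chosen
local done 0

while !`done' & "`candidates'" != "" {
    local best
    local bestz 0

    * Try each remaining candidate on top of the variables chosen so far.
    foreach var of local candidates {
        quietly stcox PAC1 `chosen' `var'
        local z = abs(_b[`var'] / _se[`var'])   // Wald z for the candidate
        local p = 2 * normal(-`z')              // two-sided p value
        * A dropped (collinear) candidate gives a missing p, which fails this test.
        if `p' < 0.05 & `z' > `bestz' {
            local best `var'
            local bestz `z'
        }
    }

    if "`best'" == "" {
        * No remaining candidate has p < 0.05, so stop.
        local done 1
    }
    else {
        * Move the winner from the candidate list into the model.
        local chosen `chosen' `best'
        local candidates : list candidates - best
        display "added `best' (|z| = " %6.2f `bestz' ")"
    }
}

* Final model with everything that was selected.
stcox PAC1 `chosen'
```

No array is needed for this; the local macros hold the lists. Stata's stepwise prefix, with its pe() and lockterm1 options, may do something close to this in one command, though its entry rule may not match the highest-z rule used here exactly.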

