
st: RE: do file: t-score, dfuller, to sw regress


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   st: RE: do file: t-score, dfuller, to sw regress
Date   Thu, 9 Dec 2010 22:12:40 -0500


Here are just a few references, which in turn cite others, culled from a quick Google search for "stepwise selection problems bootstrap". If I recall correctly, Gail Gong studied a strategy very much like yours, although for logistic regression. Frank Harrell's book "Regression Modeling Strategies" is a good resource for alternative approaches. A sketch of one way to check the stability of such a selection follows the references.

Steve


B. Efron and G. Gong (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician 37, 36-48.

Gail Gong (1986). Cross-validation, the jackknife, and the bootstrap: excess error estimation in forward logistic regression. JASA 81, 108-113.

Peter C. Austin and Jack V. Tu (2004). Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality. Journal of Clinical Epidemiology 57, 1138-1146. http://uncwddas.googlecode.com/files/article2.pdf

Derksen, S. and Keselman, H. J. (1992). Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology 45, 265-282.

Frank E. Harrell Jr., Kerry L. Lee, and Daniel B. Mark (1996). Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 15, 361-387. http://www.unt.edu/rss/class/Jon/MiscDocs/Harrell_1996.pdf
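
As a rough illustration of the instability that Austin and Tu document, here is a minimal sketch (not code from this thread; the outcome dhealth comes from the posted do-file, while x1-x5 and the 200 replications are placeholders) that wraps a stepwise fit in an r-class program and bootstraps it to count how often each candidate variable is retained:

capture program drop swstab
program define swstab, rclass
    * re-run the stepwise selection on the current (resampled) data
    stepwise, pe(0.05): regress dhealth x1 x2 x3 x4 x5
    * record, for each candidate, whether it survived selection
    tempname b
    matrix `b' = e(b)
    local kept : colnames `b'
    foreach v in x1 x2 x3 x4 x5 {
        return scalar sel_`v' = strpos(" `kept' ", " `v' ") > 0
    }
end

bootstrap sel1=r(sel_x1) sel2=r(sel_x2) sel3=r(sel_x3) ///
    sel4=r(sel_x4) sel5=r(sel_x5), reps(200): swstab

The reported bootstrap means are the estimated selection frequencies, essentially the quantity Austin and Tu tabulate; candidates selected in only a fraction of the resamples are a warning sign.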


On Dec 9, 2010, at 3:13 PM, steven quattry wrote:

Thank you, Nick, for your comments, and apologies to all for being unclear. I fully understand if this leads many to ignore my original post. To re-attempt an explanation: I have a do-file, created with the help of Statalist contributors, that performs bivariate regressions, sorts the independent variables by t-score, and removes those below a certain threshold. It then runs a -dfuller- test and removes variables that do not pass the critical level, and finally there is code that removes any variables with missing values. I would like to learn how to take this output, sort the resulting variables by t-score, keep only the 72 variables with the highest t-scores, and run a -sw regress- with those variables. My current code is below. Again, I sincerely apologize for being unclear and would appreciate any feedback, but I understand if I do not receive any.
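
A minimal sketch of that last step, assuming t_score.dta still holds the str20 variable var and the double t saved in section 2.1 below, and that the full analysis data set is back in memory (the locals top72 and finalvars are names made up for this sketch):

preserve
    use t_score, clear
    gsort -t
    keep in 1/`=min(72, _N)'           // the 72 largest t-scores (or all, if fewer)
    levelsof var, local(top72) clean   // space-separated list of variable names
restore

* keep only names still present in the data (some may have been dropped
* by the dfuller or missing-data filters), then run the stepwise regression
local finalvars
foreach v of local top72 {
    capture confirm variable `v'
    if !_rc local finalvars `finalvars' `v'
}
stepwise, pe(0.05): regress dhealth `finalvars'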

Also, Nick, I assume you do not have the time to go into the spuriousness of the above process, but if you could direct me to a chapter in a well-known statistics text, or even an online resource, I would be quite thankful; I fully understand it is not your role.

Thank you for your consideration,
-Steven


I am using Stata/SE 11.1 for Windows

* 2.1 T-test and Dickey-Fuller Filter
**************************************
preserve    // matches the -restore- at the end of this section
   drop if n<61

   tsset n
   tempname memhold
   tempname memhold2
   postfile `memhold'  str20 var  double t         using t_score, replace
   postfile `memhold2' str20 var2 double df_pvalue using df_pvalue, replace

   * bivariate regressions: drop predictors with |t| below 1.7
   foreach var of varlist swap1m-allocglobal uslib1m-infdify ///
           dswap1m-dallocglobal6 {
       qui reg dhealth `var'
       matrix b = e(b)
       matrix v = e(V)
       local t = abs(b[1,1]/sqrt(v[1,1]))
       if `t' < 1.7 {
           drop `var'
       }
       else {
           local mylist "`mylist' `var'"
           post `memhold' ("`var'") (`t')
       }
   }
   postclose `memhold'

   * Dickey-Fuller test: drop predictors whose approximate p-value exceeds .01
   foreach l of local mylist {
       qui dfuller `l', lag(1)
       local p = r(p)
       if `p' > .01 {
           drop `l'
       }
       else {
           local mylist2 "`mylist2' `l'"
           post `memhold2' ("`l'") (`p')
       }
   }
   postclose `memhold2'
   keep `mylist2'

log on
   use t_score, clear
   gsort -t
   l
   use df_pvalue, clear
   l
log off
restore

* 2.2 Missing Data Filter
**************************
preserve
   drop if n<61

   * drop variables with fewer than 72 nonmissing observations
   foreach x of varlist `mylist2' {
       qui sum `x'
       if r(N) < 72 {
           di in red "`x'"
           drop `x'
       }
       else {
           local myvar "`myvar' `x'"
       }
   }

   sum date
   keep if date==r(max)

   * drop variables that are missing on the latest date
   * (-if- here tests the first of the remaining observations)
   foreach x of varlist `myvar' {
       if missing(`x') {
           drop `x'
       }
       else {
           local myvar2 "`myvar2' `x'"
       }
   }
log on
d `myvar2'
log off
restore


* 2.3 Stepwise Regressions
***************************

preserve
   drop if n<61

* Simultaneous model: paste in (or build, as sketched above) the local
* holding the 72 variables with the highest t-scores
   local x ""

log on
   stepwise, pe(0.05): regress dhealth `x'

   estat vif
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

