
st: RE: do file: t-score, dfuller, to sw regress


From   Steven Samuels <[email protected]>
To   [email protected]
Subject   st: RE: do file: t-score, dfuller, to sw regress
Date   Thu, 9 Dec 2010 22:12:40 -0500


Here are just a few references, which in turn cite others, culled from a quick Google search for "stepwise selection problems bootstrap". If I recall correctly, Gail Gong studied a strategy very much like yours, although for logistic regression. Frank Harrell's book "Regression Modeling Strategies" is a good resource for alternative approaches. A sketch of one way to check the stability of such a selection follows the references.

Steve


B. Efron and G. Gong (1983). A leisurely look at the bootstrap, the jackknife, and cross-validation. The American Statistician 37, 36-48.

Gail Gong (1986). Cross-validation, the jackknife, and the bootstrap: excess error estimation in forward logistic regression. JASA 81, 108-113.

Peter C. Austin and Jack V. Tu (2004). Automated variable selection methods for logistic regression produced unstable models for predicting acute myocardial infarction mortality. Journal of Clinical Epidemiology 57, 1138-1146. http://uncwddas.googlecode.com/files/article2.pdf

Derksen, S. and Keselman, H. J. (1992). Backward, forward and stepwise automated subset selection algorithms: frequency of obtaining authentic and noise variables. British Journal of Mathematical and Statistical Psychology 45, 265-282.

Frank E. Harrell Jr., Kerry L. Lee, and Daniel B. Mark (1996). Tutorial in biostatistics: multivariable prognostic models: issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 15, 361-387. http://www.unt.edu/rss/class/Jon/MiscDocs/Harrell_1996.pdf
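
As a rough illustration of the instability that Austin and Tu document, here is a minimal sketch (not code from this thread; the outcome dhealth comes from the posted do-file, while x1-x5 and the 200 replications are placeholders) that wraps a stepwise fit in an r-class program and bootstraps it to count how often each candidate variable is retained:

capture program drop swstab
program define swstab, rclass
    * re-run the stepwise selection on the current (resampled) data
    stepwise, pe(0.05): regress dhealth x1 x2 x3 x4 x5
    * record, for each candidate, whether it survived selection
    tempname b
    matrix `b' = e(b)
    local kept : colnames `b'
    foreach v in x1 x2 x3 x4 x5 {
        return scalar sel_`v' = strpos(" `kept' ", " `v' ") > 0
    }
end

bootstrap sel1=r(sel_x1) sel2=r(sel_x2) sel3=r(sel_x3) ///
    sel4=r(sel_x4) sel5=r(sel_x5), reps(200): swstab

The reported bootstrap means are the estimated selection frequencies, essentially the quantity Austin and Tu tabulate; candidates selected in only a fraction of the resamples are a warning sign.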


On Dec 9, 2010, at 3:13 PM, steven quattry wrote:

Thank you, Nick, for your comments, and apologies to all for being unclear. I fully understand if this leads many to ignore my original post. To re-attempt an explanation: I have a do-file, created with the help of Statalist contributors, that performs bivariate regressions, sorts the independent variables by t-score, and removes those below a certain threshold. It then runs a -dfuller- test and removes variables that do not pass the critical level, and finally there is code that removes any variables with missing values. I would like to learn how to take this output, sort the resulting variables by t-score, keep only the 72 variables with the highest t-scores, and run a -sw regress- with those variables. My current code is below. Again, I sincerely apologize for being unclear and would appreciate any feedback, but I understand if I do not receive any.
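
A minimal sketch of that last step, assuming t_score.dta still holds the str20 variable var and the double t saved in section 2.1 below, and that the full analysis data set is back in memory (the locals top72 and finalvars are names made up for this sketch):

preserve
    use t_score, clear
    gsort -t
    keep in 1/`=min(72, _N)'           // the 72 largest t-scores (or all, if fewer)
    levelsof var, local(top72) clean   // space-separated list of variable names
restore

* keep only names still present in the data (some may have been dropped
* by the dfuller or missing-data filters), then run the stepwise regression
local finalvars
foreach v of local top72 {
    capture confirm variable `v'
    if !_rc local finalvars `finalvars' `v'
}
stepwise, pe(0.05): regress dhealth `finalvars'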

Also, Nick, I assume you do not have the time to go into the spuriousness of the above process, but if you could direct me to a chapter in a well-known statistics text, or even an online resource, I would be quite thankful; I fully understand it is not your role.

Thank you for your consideration,
-Steven


I am using Stata/SE 11.1 for Windows

* 2.1 T-test and Dickey-Fuller Filter
**************************************
preserve    // matches the -restore- at the end of this section
   drop if n<61

   tsset n
   tempname memhold
   tempname memhold2
   postfile `memhold'  str20 var  double t         using t_score, replace
   postfile `memhold2' str20 var2 double df_pvalue using df_pvalue, replace

   * bivariate regressions: drop predictors with |t| below 1.7
   foreach var of varlist swap1m-allocglobal uslib1m-infdify ///
           dswap1m-dallocglobal6 {
       qui reg dhealth `var'
       matrix b = e(b)
       matrix v = e(V)
       local t = abs(b[1,1]/sqrt(v[1,1]))
       if `t' < 1.7 {
           drop `var'
       }
       else {
           local mylist "`mylist' `var'"
           post `memhold' ("`var'") (`t')
       }
   }
   postclose `memhold'

   * Dickey-Fuller test: drop predictors whose approximate p-value exceeds .01
   foreach l of local mylist {
       qui dfuller `l', lag(1)
       local p = r(p)
       if `p' > .01 {
           drop `l'
       }
       else {
           local mylist2 "`mylist2' `l'"
           post `memhold2' ("`l'") (`p')
       }
   }
   postclose `memhold2'
   keep `mylist2'

log on
   use t_score, clear
   gsort -t
   l
   use df_pvalue, clear
   l
log off
restore

* 2.2 Missing Data Filter
**************************
preserve
   drop if n<61

   * drop variables with fewer than 72 nonmissing observations
   foreach x of varlist `mylist2' {
       qui sum `x'
       if r(N) < 72 {
           di in red "`x'"
           drop `x'
       }
       else {
           local myvar "`myvar' `x'"
       }
   }

   sum date
   keep if date==r(max)

   * drop variables that are missing on the latest date
   * (-if- here tests the first of the remaining observations)
   foreach x of varlist `myvar' {
       if missing(`x') {
           drop `x'
       }
       else {
           local myvar2 "`myvar2' `x'"
       }
   }
log on
d `myvar2'
log off
restore


* 2.3 Stepwise Regressions
***************************

preserve
   drop if n<61

* Simultaneous model: paste in (or build, as sketched above) the local
* holding the 72 variables with the highest t-scores
   local x ""

log on
   stepwise, pe(0.05): regress dhealth `x'

   estat vif
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

