Nick Cox <njcoxstata@gmail.com>

Re: st: tuples, stepwise and counting types of variables

Tue, 14 Aug 2012 10:24:47 +0100

I've been posting against stepwise on this list for about a decade, I guess.

For once I can add to Cameron's sample from his countably infinite list of references with

Harrell, F. 2001. Regression modeling strategies. New York: Springer.

There are plenty of very technical objections to stepwise, e.g. what the heck it means inferentially, but my main reservations are more pragmatic.

1. If you use stepwise, how can you honestly report it? Fortunately, there are still fields in which "I used stepwise regression to choose predictors" is regarded as a smokescreen for "I declined the obligation to think about which predictors make most scientific sense in this problem". For people not working in science, substitute "econometric" or some other suitable adjective.

2. It is sad in some ways, and very droll in others, that some people run scared away from situations in which they might be accused of subjective or arbitrary decisions. Not making scientific judgments but making a judgment about precisely which forward or backward selection rule to use still looks pretty arbitrary to me.

I don't have an easy answer for anyone who must run lots of regressions, is obliged therein to choose a subset of predictors, and doesn't have time to think about each regression carefully. I wouldn't willingly choose such a problem. These problems were around long before anyone started talking about "data mining".

Nick

On Tue, Aug 14, 2012 at 2:56 AM, Cameron McIntosh <cnm100@hotmail.com> wrote:

> Nice to see that Nick is an active member of the "anti-stepwise regression club." In that regard, I might strongly suggest taking a look at:
>
> Flom, P.L., & Cassell, D.L. (2007). Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use. NESUG 2007: Statistics and Data Analysis.
> http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf
>
> Huberty, C. J. (1989). Problems with stepwise methods—Better alternatives. In B.
> Thompson (Ed.), Advances in social science methodology (Vol. 1, pp. 43–70). Greenwich, CT: JAI Press.
> http://education.gsu.edu/coshima/EPRS8550/Oshima%20Problem.pdf
>
> Thompson, B. (2001). Significance, Effect Sizes, Stepwise Methods, and Other Issues: Strong Arguments Move the Field. The Journal of Experimental Education, 70(1), 80-93.
> http://web.me.com/rsbalkin/Site/Research_Methods_and_Statistics_files/Strong%20arguments%20move%20the%20field--Thompson.pdf
>
> Thompson, B. (1995). Stepwise Regression and Stepwise Discriminant Analysis Need Not Apply here: A Guidelines Editorial. Educational and Psychological Measurement, 55(4), 525-534.
>
> Thompson, B. (1989). Why won't stepwise methods die? Measurement and Evaluation in Counseling and Development, 21(4), 146-148.
> http://web.me.com/rsbalkin/Site/Research_Methods_and_Statistics_files/why%20won't%20stepwise%20methods%20die.pdf
>
> Some additional references are in the FAQ Nick mentioned. To be sure, I'm not against data mining in general.
>
> Cam
>
>> Date: Mon, 13 Aug 2012 21:33:01 -0400
>> Subject: Re: st: tuples, stepwise and counting types of variables
>> From: sohnesen@gmail.com
>> To: statalist@hsphsun2.harvard.edu
>>
>> Thanks Nick
>>
>> My question is how do I generate the "used" list after using stepwise
>> regression? Stepwise (or another automated variable selection method)
>> decides which variables stay in the model. I've counted the number of
>> variables in e(df_m), but I believe I need to save the actual names of
>> the variables that stay in the regression to use your suggested
>> approach.
>>
>> thanks again
>> Thomas
>> On Mon, Aug 13, 2012 at 8:36 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> I can't comment on analogues to MAXR as I am not familiar with SAS.
>>>
>>> For counting how many of a list are in another list, you can find the
>>> intersection of two lists using
>>>
>>> : list a & b
>>>
>>> as documented at -help macrolists-, and then count them.
>>>
>>> For example,
>>>
>>> local availablex "x1 x2 x3"
>>> local usedx "x2"
>>> local inter : list availablex & usedx
>>> di `: word count `inter''
>>>
>>> Nick
>>>
>>> On Tue, Aug 14, 2012 at 1:24 AM, Thomas Sohnesen <sohnesen@gmail.com> wrote:
>>>> Thanks Nick
>>>>
>>>> For this exercise I'm not interested in the coefficients or their
>>>> meaning, I'm looking to find a parsimonious model for predictions.
>>>> Any advice on a better alternative than stepwise? Doing it manually
>>>> is not really an option as we will be running a lot of different
>>>> models. Further, though my data is organized in blocks I would like to
>>>> keep single variables if they are highly correlated with my dependent
>>>> variable. I believe SAS has an alternative in MAXR. Do you know if
>>>> Stata has a similar alternative?
>>>>
>>>> Finally, no matter which alternative we end up using, I still have the
>>>> challenge of counting the number of variables from each block in the final
>>>> model. Any insights on that?
>>>>
>>>> thanks and best
>>>>
>>>> Thomas
>>>>
>>>> On Mon, Aug 13, 2012 at 5:30 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>> I belong to a club which is dedicated to advising people against using
>>>>> -stepwise-. A -search- will find an FAQ on this question.
>>>>>
>>>>> I'd look at -nestreg- instead.
>>>>>
>>>>> Nick
>>>>>
>>>>> On Mon, Aug 13, 2012 at 10:18 PM, Thomas Sohnesen <sohnesen@gmail.com> wrote:
>>>>>
>>>>>> I have a number of "groups" of variables as exemplified below.
>>>>>>
>>>>>> local gr1 x1 x2 x3 x4
>>>>>>
>>>>>> local gr2 x5 x6 x7 x8
>>>>>>
>>>>>> local gr3 x9 x10 x11 x12 x13 x14 x15
>>>>>>
>>>>>> local gr4 x16 x17
>>>>>>
>>>>>> I run stepwise regressions for all the combinations of these groups
>>>>>> using tuples.
>>>>>>
>>>>>> tuples "`gr1'" "`gr2'" "`gr3'" "`gr4'" , display
>>>>>>
>>>>>> forval i = 1/`ntuples' {
>>>>>>
>>>>>> qui stepwise, pr(0.05): regress y `tuple`i''
>>>>>>
>>>>>> }
>>>>>>
>>>>>> Now I would like to count how many variables from each group
>>>>>> stayed in the stepwise model.
>>>>>>
>>>>>> For instance, in the stepwise regression of gr1 and gr2 (i.e. x1 x2 x3
>>>>>> x4 x5 x6 x7 x8) only x3 x4 x5 were included in the regression.
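
A minimal sketch of one way to recover the "used" list asked about in this thread, offered as an illustration and not as anything posted by the participants. It assumes that after -stepwise- the column names of e(b) are the terms left in the final model plus _cons, so that they can be intersected with each group using the macro list operator mentioned above. The auto dataset, the two groups, and the local names used, cons, inter and k are assumptions chosen only so the example runs as is.

```stata
* Sketch only: auto.dta and the two groups below are stand-ins for the real data
sysuse auto, clear

local gr1 mpg weight length
local gr2 headroom trunk turn

quietly stepwise, pr(0.05): regress price `gr1' `gr2'

* names of the coefficients in the final model (assumed: retained terms plus _cons)
local used : colnames e(b)

* drop the constant so that only predictor names remain
local cons _cons
local used : list used - cons

forval g = 1/2 {
    * variables from group g that are also in the final model
    local inter : list gr`g' & used
    local k : word count `inter'
    di "group `g': `k' retained (`inter')"
}
```

Inside a -tuples- loop the same lines would go right after the -stepwise- call, with gr1 to gr4 in place of the two illustrative groups.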