Re: st: tuples, stepwise and counting types of variables


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: tuples, stepwise and counting types of variables
Date   Tue, 14 Aug 2012 10:24:47 +0100

I've been posting against stepwise on this list for about a decade, I
guess. For once I can add to Cameron's sample from his countably
infinite list of references with

Harrell, F. 2001. Regression modeling strategies. New York: Springer.

There are plenty of very technical objections to stepwise, e.g. what
the heck it means inferentially, but my main reservations are more
pragmatic.

1. If you use stepwise, how can you honestly report it? Fortunately,
there are still fields in which "I used stepwise regression to choose
predictors" is regarded as a smokescreen for "I declined the
obligation to think about which predictors make most scientific sense
in this problem". For people not working in science, substitute
"econometric" or some other suitable adjective.

2. It is sad in some ways, and very droll in others, that some people
run scared away from situations in which they might be accused of
subjective or arbitrary decisions. Not making scientific judgments but
making a judgment about precisely which forward or backward selection
rule to use still looks pretty arbitrary to me.

I don't have an easy answer for anyone who must run lots of
regressions, is obliged therein to choose a subset of predictors, and
doesn't have time to think about each regression carefully. I wouldn't
willingly choose such a problem.

These problems were around long before anyone started talking about
"data mining".

Nick

On Tue, Aug 14, 2012 at 2:56 AM, Cameron McIntosh <cnm100@hotmail.com> wrote:
> Nice to see that Nick is an active member of the "anti-stepwise regression club." In that regard, I might strongly suggest taking a look at:
>
> Flom, P.L., & Cassell, D.L. (2007). Stopping stepwise: Why stepwise and similar selection methods are bad, and what you should use. NESUG 2007: Statistics and Data Analysis.
> http://www.nesug.org/proceedings/nesug07/sa/sa07.pdf
>
> Huberty, C. J. (1989). Problems with stepwise methods—Better alternatives. In B. Thompson (Ed.), Advances in social science methodology (Vol. 1, pp. 43–70). Greenwich, CT: JAI Press.
> http://education.gsu.edu/coshima/EPRS8550/Oshima%20Problem.pdf
>
> Thompson, B. (2001). Significance, Effect Sizes, Stepwise Methods, and Other Issues: Strong Arguments Move the Field. The Journal of Experimental Education, 70(1), 80-93.
> http://web.me.com/rsbalkin/Site/Research_Methods_and_Statistics_files/Strong%20arguments%20move%20the%20field--Thompson.pdf
>
> Thompson, B. (1995). Stepwise Regression and Stepwise Discriminant Analysis Need Not Apply here: A Guidelines Editorial. Educational and Psychological Measurement, 55(4), 525-534.
>
> Thompson, B. (1989). Why won't stepwise methods die? Measurement and Evaluation in Counseling and Development, 21(4), 146-148.
> http://web.me.com/rsbalkin/Site/Research_Methods_and_Statistics_files/why%20won't%20stepwise%20methods%20die.pdf
>
> Some additional references are in the FAQ Nick mentioned. To be sure, I'm not against data mining in general.
>
> Cam
>
>> Date: Mon, 13 Aug 2012 21:33:01 -0400
>> Subject: Re: st: tuples, stepwise and counting types of variables
>> From: sohnesen@gmail.com
>> To: statalist@hsphsun2.harvard.edu
>>
>> Thanks Nick
>>
>> My question is how do i generate the "used" list after using stepwise
>> regression? Stepwise (or another automated variable selection method)
>> decides which variables stay in the model. I've counted the number of
>> variables in e(df_m), but i believe i need to save the actual names of
>> the variables that stay in the regression to use your suggested
>> approach.
>>
>> thanks again
>> Thomas
>> On Mon, Aug 13, 2012 at 8:36 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> I can't comment on analogues to MAXR as I am not familiar with SAS.
>>>
>>> For counting how many of a list are in another list, you can find the
>>> intersection of two lists using
>>>
>>> : list a & b
>>>
>>> as documented at -help macrolists-, and then count them.
>>>
>>> For example,
>>>
>>> local availablex "x1 x2 x3"
>>> local usedx "x2"
>>> local inter : list availablex & usedx
>>> di `: word count `inter''
>>>
>>> Nick
>>>
>>> On Tue, Aug 14, 2012 at 1:24 AM, Thomas Sohnesen <sohnesen@gmail.com> wrote:
>>>> Thanks Nick
>>>>
>>>> For this exercise i'm not interested in the coefficients or their
>>>> meaning, i'm looking to find a parsimonious model for predictions.
>>>> Any advice on a better alternative than stepwise? Doing it manually
>>>> is not really an option as we will be running a lot of different
>>>> models. Further, though my data is organized in blocks i would like to
>>>> keep single variables if they are highly correlated with my dependent
>>>> variable. I believe SAS has an alternative in MAXR. Do you know if
>>>> Stata has a similar alternative?
>>>>
>>>> Finally, no matter which alternative we end up using, i still have the
>>>> challenge of counting the number of variables from each block in the final
>>>> model. Any insights on that?
>>>>
>>>> thanks and best
>>>>
>>>> Thomas
>>>>
>>>>
>>>> On Mon, Aug 13, 2012 at 5:30 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>>> I belong to a club which is dedicated to advising people against using
>>>>> -stepwise-. A -search- will find an FAQ on this question.
>>>>>
>>>>> I'd look at -nestreg- instead.
>>>>>
>>>>> Nick
>>>>>
>>>>> On Mon, Aug 13, 2012 at 10:18 PM, Thomas Sohnesen <sohnesen@gmail.com> wrote:
>>>>>
>>>>>> I have a number of "groups" of variables as exemplified below.
>>>>>>
>>>>>>
>>>>>> local gr1 x1 x2 x3 x4
>>>>>>
>>>>>> local gr2 x5 x6 x7 x8
>>>>>>
>>>>>> local gr3 x9 x10 x11 x12 x13 x14 x15
>>>>>>
>>>>>> local gr4 x16 x17
>>>>>>
>>>>>>
>>>>>>
>>>>>> I run stepwise regressions for all the combinations of these groups
>>>>>> using tuples.
>>>>>>
>>>>>> tuples "`gr1'" "`gr2'" "`gr3'" "`gr4'" , display
>>>>>>
>>>>>> forval i = 1/`ntuples' {
>>>>>>
>>>>>> qui stepwise, pr(0.05): regress y `tuple`i''
>>>>>>
>>>>>> }
>>>>>>
>>>>>>
>>>>>>
>>>>>> Now i would like to count how many variables from each group
>>>>>> stayed in the stepwise model.
>>>>>>
>>>>>>
>>>>>>
>>>>>> For instance, in the stepwise regression of gr1 and gr2 (i.e. x1 x2 x3
>>>>>> x4 x5 x6 x7 x8) only x3 x4 x5 were included in the regression. I
>>>>>> would then like an output along the lines of:
>>>>>>
>>>>>> Model     num_var_gr1  num_var_gr2  num_var_gr3  num_var_gr4
>>>>>>
>>>>>> gr2 gr3   1            2            0            0
>>>>>>
>>>>>> gr2 gr4
>>>>>>
>>>>>> gr1 gr2
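
The counting question raised in this thread can be sketched as follows. This is only an illustration, not code posted by anyone on the list. It assumes that y and x1-x17 exist in memory, and that after -stepwise- the column names of e(b) are exactly the retained predictors plus _cons. The names usedx, cons, inter and n1-n4 are hypothetical. It combines the macro list intersection suggested above with the names stored in e(b), for a single fit on gr1 and gr2.

```stata
* Groups as defined in the original post
local gr1 x1 x2 x3 x4
local gr2 x5 x6 x7 x8
local gr3 x9 x10 x11 x12 x13 x14 x15
local gr4 x16 x17

* One stepwise fit on gr1 and gr2; inside the -tuples- loop
* the predictor list would be `tuple`i'' instead
quietly stepwise, pr(0.05): regress y `gr1' `gr2'

* Assumption: the column names of e(b) are the retained terms plus _cons
local usedx : colnames e(b)
local cons _cons
local usedx : list usedx - cons

* Intersect each group with the retained list and count the words
forvalues g = 1/4 {
    local inter : list gr`g' & usedx
    local n`g' : word count `inter'
}

* One output row: model label, then the count for each group
display "gr1 gr2" _col(12) `n1' _col(20) `n2' _col(28) `n3' _col(36) `n4'
```

Because local macros vanish when a do-file ends, the whole block needs to be run in one go. Groups that were not offered to the model simply get a count of 0.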

