
From: Nick Cox <n.j.cox@stata.com>
To: statalist@hsphsun2.harvard.edu, icanette@stata.com
Subject: Re: st: Find all subsets of variables
Date: Thu, 25 Sep 2008 10:38:56 -0500

I agree with Alan and also with Tony, and disagree with Scott.

How is that possible, when Scott supports Tony?

Others in this thread have kindly recommended my -allpossible- and my -selectvars- from SSC. No one recommended my -tuples- from SSC, a lower-level tool that I prefer, although these programs do not tackle precisely the same problem.

The original stimulus for -allpossible- was the thesis problem of a Ph.D. student of mine, who was looking at the predictability of a ground-measured response from 6 LANDSAT spectral bands. Neighbouring bands not surprisingly are often highly correlated, and exploring the question thoroughly could be done by looking at all 2^6 - 1 subsets of predictors. That is 63, and manageable with the right tools.

This limit of 6 predictors in my problem explains the limit built in to -allpossible-, a program written for one project only.
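As a rough illustration of the count, here is a minimal Python sketch (not the code of -allpossible- or -tuples-) that enumerates every non-empty subset of six predictors. The band names are placeholders assumed for the example.

```python
from itertools import combinations


def all_subsets(variables):
    """Yield every non-empty subset of `variables`, smallest subsets first."""
    for size in range(1, len(variables) + 1):
        yield from combinations(variables, size)


# Hypothetical predictor names standing in for the 6 spectral bands.
bands = ["band1", "band2", "band3", "band4", "band5", "band6"]

subsets = list(all_subsets(bands))
print(len(subsets))  # 2**6 - 1 = 63
```

Each tuple could then be passed to whatever model-fitting routine is in use, one model per subset.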

In making the program public as something others might find useful too, I was very queasy given (1) the combinatorial explosion of possibilities and (2) the predilection of many to hope or believe that the best model can or should be found automatically. Although I dislike stepwise modelling for all the standard reasons, it seems to me that looking at all the possible models can be a reasonable thing to do in some problems.

The help file for -allpossible- carries this "Warning: This hot drink is hot" caveat:

"Naturally, this command does not purport to replace the detailed scrutiny of individual models or to offer an unproblematic way of finding "best" models. Its main use may lie in demonstrating that several models exist within many projects possessing roughly equal merit as measured by omnibus statistics."

When others asked similar questions I revisited the issue with -selectvars- and -tuples-.

Nick

n.j.cox@durham.ac.uk

Feiveson, Alan H.

One situation where you might want to consider all subsets (possibly of a given size) is where you are trying to approximate a deterministic function with as few terms as is "reasonable". In this case, there is no "true" model or statistical inference to be made. For example, I may have a table of values of predictors and a function of these predictors obtained by some proprietary software and I am just trying to find a cheap approximation to the function using a linear combination of a small number of the predictors (or transformations of the predictors).
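A minimal Python sketch of that idea, under stated assumptions: the data are made up, the fit is ordinary least squares without an intercept, and the helper names (`solve`, `rss`, `best_subset`) are hypothetical. It tries every subset of a given size and keeps the one with the smallest residual sum of squares.

```python
from itertools import combinations


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is square and nonsingular (no collinear predictors).
    """
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def rss(columns, y):
    """Residual sum of squares of a no-intercept least-squares fit."""
    k = len(columns)
    xtx = [[sum(u * v for u, v in zip(columns[i], columns[j]))
            for j in range(k)] for i in range(k)]
    xty = [sum(u * v for u, v in zip(columns[i], y)) for i in range(k)]
    beta = solve(xtx, xty)  # normal equations: (X'X) beta = X'y
    fitted = [sum(b * col[t] for b, col in zip(beta, columns))
              for t in range(len(y))]
    return sum((yt - ft) ** 2 for yt, ft in zip(y, fitted))


def best_subset(data, y, size):
    """Return the subset of `size` predictor names with the smallest RSS."""
    names = sorted(data)
    return min(combinations(names, size),
               key=lambda s: rss([data[n] for n in s], y))


# Made-up table of predictor values; the "function" is 2*x1 + 3*x3.
data = {
    "x1": [1.0, 2.0, 3.0, 4.0, 5.0],
    "x2": [2.0, 1.0, 4.0, 3.0, 6.0],
    "x3": [1.0, 0.0, 1.0, 0.0, 2.0],
    "x4": [5.0, 3.0, 1.0, 4.0, 2.0],
}
y = [2 * a + 3 * c for a, c in zip(data["x1"], data["x3"])]

best = best_subset(data, y, 2)
print(best)
```

Because nothing here is inferential, the residual sum of squares serves only as a measure of approximation error, not as a test statistic.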

SR Millis

I agree. I can't imagine why anyone would want to use all-subsets. Bayesian model averaging may be another alternative worth considering.

Lachenbruch, Peter

I think the same problem exists - you get a billion-line output (with 50 vars and subset size of 10). I think SAS had something like this, but displayed only the 'best' one.

This suggests to me a) know a lot about your data before doing this; b) look for small subsets; or c) use some sort of stepwise (and penalized) procedure (AIC or BIC or Mallows' Cp).

We're talking the art of statistical analysis now.
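For a sense of scale, the counts behind the 50-variable example can be checked with a short Python sketch. It only does arithmetic and assumes nothing about how any particular package lists its results.

```python
from math import comb

# Number of distinct subsets of exactly 10 variables chosen from 50.
n_size10 = comb(50, 10)

# Number of non-empty subsets of any size from 50 variables.
n_all = 2**50 - 1

print(f"{n_size10:,}")  # 10,272,278,170
print(f"{n_all:,}")     # 1,125,899,906,842,623
```

So the subsets of size 10 alone run to roughly ten billion models, before any other subset size is considered.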

junin

I want to find all subsets of a given set of variables for model testing. As an example: A set of variables var1 var2 var3 var4 should give me: var1 var2 var3 var4 var1 var2 var3 var1 var2 var4 var1 var3 var4 var1 var4 var1 var2 var3 var4 and so forth. I would like to test all possible model configurations. Is there a command in Stata that would be convenient to use?
