The Stata listserver

RE: st: RE: All-possible-regressions procedure

From   "Nick Cox" <>
To   <>
Subject   RE: st: RE: All-possible-regressions procedure
Date   Sat, 20 Sep 2003 17:34:03 +0100

> Thanks for your response.  I'm looking for something analogous to the
> SAS command (I forget what it is exactly), which selects the "best" #
> (you specify the #) of models using 1 covariate, 2 covariates, etc.
> The investigator then explores the resulting models.  It's not really
> an automatic procedure in the sense of forward or backward selection.
> Is that what -allpossible- does?

No. For a description of what -allpossible- does, install
and read the help, or type

. ssc type allpossible.hlp

Al Feiveson posted about his -tryem- command (sorry
I had forgotten about that) which sounds much closer
to this SAS command you mention.

> I know it's not feasible to give me a short course in multiple
> regression, but what is your basic philosophy when whittling down
> potential explanatory variables when doing an explanatory model (as
> opposed to a predictive model)?

I don't have a distinctive attitude here. Although the idea
of explanation is always a little elusive, in the kind
of applications (usually to environmental data) I am engaged
in, it is usually relatively clear which variables have strong links
to the underlying processes in nature and which variables are
at best contextual. As I prefer to choose models which
I or colleagues can link to scientific knowledge, I am
sceptical of all attempts to formalise selection by
one or more formal statistical criteria. At the same

But there can be all sorts of compromises. For example, in
ecological applications, say to lots and lots of islands,
area often features as a predictor. Area perhaps
doesn't have much of a direct role in environmental processes,
but it has all sorts of relevance to diversity of habitat,
etc., and it is much, much easier to measure than
many of the variables one would like to have in its
place. In short, area often features as a proxy
or surrogate for other predictors.

I'm perhaps fortunate that I don't _have_ to try to
automate variable selection: e.g. to produce
predictions in real time or extremely rapidly.
Nor is prediction alone usually among my goals.

