st: -allpossible- available on SSC


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: -allpossible- available on SSC
Date   Tue, 1 Oct 2002 19:30:36 +0100

Thanks to Kit Baum, an -allpossible- module 
is now available on SSC. 

Setting aside inappropriate and hubristic 
overtones of omnicompetence, the creation of
this beast was driven by a particular need. 
An outside critic of a current project 
urged the merits of trying all possible 
subsets of predictors, given a response variable measured on 
the ground and reflectances measured 
by satellite in several spectral bands. 
We have many reservations about shotgun-assisted 
model building, but we need the evidence
before we can discuss. 

That aside, -allpossible- is best understood 
by example, though not with our data, which you don't have: 

. use auto, clear
(1978 Automobile Data)

. gen gpm = 1 / mpg 

. allpossible reg gpm head-displ, s(r2_a rmse) 

----------------------------------------------------
    model |   predictors          r2_a          rmse
----------+-----------------------------------------
        1 |       (none)         0.000         0.013
        2 |            1         0.170         0.012
        3 |            2         0.392         0.010
        4 |            3         0.726         0.007
        5 |            4         0.667         0.007
        6 |            5         0.561         0.008
        7 |            6         0.589         0.008
        8 |          1 2         0.383         0.010
        9 |          1 3         0.723         0.007
       10 |          1 4         0.663         0.007
       11 |          1 5         0.569         0.008
       12 |          1 6         0.588         0.008
       13 |          2 3         0.729         0.007
       14 |          2 4         0.666         0.007
       15 |          2 5         0.607         0.008
       16 |          2 6         0.627         0.008
       17 |          3 4         0.724         0.007
       18 |          3 5         0.724         0.007
       19 |          3 6         0.723         0.007
       20 |          4 5         0.671         0.007
       21 |          4 6         0.688         0.007
       22 |          5 6         0.645         0.008
       23 |        1 2 3         0.726         0.007
       24 |        1 2 4         0.661         0.007
       25 |        1 2 5         0.602         0.008
       26 |        1 2 6         0.624         0.008
       27 |        1 3 4         0.720         0.007
       28 |        1 3 5         0.720         0.007
       29 |        1 3 6         0.719         0.007
       30 |        1 4 5         0.666         0.007
       31 |        1 4 6         0.684         0.007
       32 |        1 5 6         0.642         0.008
       33 |        2 3 4         0.725         0.007
       34 |        2 3 5         0.726         0.007
       35 |        2 3 6         0.725         0.007
       36 |        2 4 5         0.670         0.007
       37 |        2 4 6         0.687         0.007
       38 |        2 5 6         0.663         0.007
       39 |        3 4 5         0.721         0.007
       40 |        3 4 6         0.720         0.007
       41 |        3 5 6         0.720         0.007
       42 |        4 5 6         0.687         0.007
       43 |      1 2 3 4         0.722         0.007
       44 |      1 2 3 5         0.723         0.007
       45 |      1 2 3 6         0.722         0.007
       46 |      1 2 4 5         0.666         0.007
       47 |      1 2 4 6         0.684         0.007
       48 |      1 2 5 6         0.659         0.007
       49 |      1 3 4 5         0.717         0.007
       50 |      1 3 4 6         0.716         0.007
       51 |      1 3 5 6         0.716         0.007
       52 |      1 4 5 6         0.683         0.007
       53 |      2 3 4 5         0.722         0.007
       54 |      2 3 4 6         0.721         0.007
       55 |      2 3 5 6         0.722         0.007
       56 |      2 4 5 6         0.686         0.007
       57 |      3 4 5 6         0.717         0.007
       58 |    1 2 3 4 5         0.719         0.007
       59 |    1 2 3 4 6         0.718         0.007
       60 |    1 2 3 5 6         0.719         0.007
       61 |    1 2 4 5 6         0.683         0.007
       62 |    1 3 4 5 6         0.713         0.007
       63 |    2 3 4 5 6         0.718         0.007
       64 |  1 2 3 4 5 6         0.715         0.007
----------------------------------------------------

  1      headroom
  2      trunk
  3      weight
  4      length
  5      turn
  6      displacement

More generally, -allpossible- by default (1) computes all 
possible models fitted by a model command to a response 
and subsets of up to 6 predictors and (2) tabulates a list 
of statistics for each model fitted.
Alternatively, (1') the maximum number of predictors fitted may be
specified as a number less than 6. The model command must be a
command fitting a model to a single response variable.
In the example above, it is -regress-; in our project, 
it is -glm-. 
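
The numbering in the table above follows a simple pattern: the 
empty model first, then every subset of size 1, size 2, and so on, 
each group in the order of the predictor list. A minimal sketch of 
that enumeration, in Python rather than Stata, is given here. It is 
not the -allpossible- source; the function name enumerate_models 
and its max_size argument are made up for illustration.

```python
from itertools import combinations

def enumerate_models(predictors, max_size=6):
    """List predictor subsets by size, then in predictor-list order.

    Hypothetical helper: it mimics the ordering seen in the table,
    not the internals of -allpossible-.
    """
    limit = min(max_size, len(predictors))
    models = []
    for size in range(limit + 1):
        # combinations() yields subsets in lexicographic position order
        models.extend(combinations(predictors, size))
    return models

predictors = ["headroom", "trunk", "weight", "length", "turn", "displacement"]
models = enumerate_models(predictors)

for number, subset in enumerate(models, start=1):
    # Label each model by predictor positions, as in the table
    label = " ".join(str(predictors.index(name) + 1) for name in subset) or "(none)"
    print(f"{number:>3} | {label}")
```

With the six predictors of the example this lists 64 models, 
with "1 2" as model 8 and the full set as model 64, as in the table.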

The list of statistics must include one or more names of
e-class results, as would be displayed by -estimates list-
after fitting an individual model. 

Naturally, this command does not purport
to replace the detailed scrutiny of individual models or to offer
an unproblematic way of finding "best" models. Its main use may lie
in demonstrating that, within many projects, several models exist
that possess roughly equal merit as measured by omnibus statistics.
In fact, I can see this featuring in my own teaching 
together with suitable homilies and injunctions. 
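
To make the point about roughly equal merit concrete, here is a 
small Python sketch that picks out the models whose adjusted 
R-squared lies within some tolerance of the best. The values are 
copied from a few rows of the table above; the tolerance of 0.005 
and the helper name near_best are arbitrary choices for 
illustration, not a recommended criterion.

```python
# Adjusted R-squared for a few models, copied from the table above.
# Keys are the predictor labels shown in the "predictors" column.
r2_a = {
    "3": 0.726,
    "2 3": 0.729,
    "1 2 3": 0.726,
    "4": 0.667,
    "1 2 3 4 5 6": 0.715,
}

def near_best(scores, tolerance):
    """Return labels of models within `tolerance` of the best score.

    Hypothetical helper; the tolerance is an assumption, not part
    of -allpossible-.
    """
    best = max(scores.values())
    return sorted(label for label, value in scores.items()
                  if best - value <= tolerance)

print(near_best(r2_a, tolerance=0.005))
```

Among these rows, weight alone, trunk with weight, and headroom 
with trunk and weight all come out as near-ties on this measure.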

The magic number 6 does not reflect any principle; it is 
as far as I got given that we have 6 spectral bands in 
our specific satellite data. Having been brought up 
on the idea that with seven parameters you can fit 
an elephant, I have some inhibitions about going 
further. In any case, looking at all 2^7 = 128 
fits with 7 predictors creates a longer table 
than might be wished. Let me stress that 
the limit of 6 applies to the number of predictors 
included in any one model; you can 
specify more candidate predictors if you like, 
so long as the total number of models fitted
does not exceed the number of observations. 
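
How quickly the number of models grows can be checked with a 
little arithmetic. The Python sketch below assumes the count 
includes the empty model, as in the table above, and that every 
subset of up to 6 of the candidates is fitted; the function names 
are hypothetical.

```python
from math import comb

def models_fitted(candidates, max_size=6):
    """Count subsets of `candidates` predictors with at most `max_size` members.

    Assumes the empty model is counted, as in the example table.
    """
    limit = min(max_size, candidates)
    return sum(comb(candidates, size) for size in range(limit + 1))

def fits_within(candidates, n_obs, max_size=6):
    """Check the stated condition: models fitted must not exceed observations."""
    return models_fitted(candidates, max_size) <= n_obs

for p in (6, 7, 10):
    print(p, models_fitted(p))

# The auto data used in the example have 74 observations.
print(fits_within(6, n_obs=74))
print(fits_within(7, n_obs=74))
```

Under these assumptions, 6 candidates give 64 models, 7 give 127 
(all subsets except the full set of 7) and 10 give 848, so the 
condition on the number of observations soon starts to bite.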

Stata 7 required.

In searching for earlier work in this direction, 
I was able to draw upon ideas in the -rsquare- 
program of Philip Ender and Rie von Eyben of UCLA, 
which has different but overlapping aims. It
saved me a lot of time. Phil tells me there is
something similar in SAS. 

Nick
n.j.cox@durham.ac.uk 

