Re: st: RE: selectvars and factor variables
Nick Cox <email@example.com>
Fri, 28 Jan 2011 10:01:47 +0000
Antonis Loumiotis asked about -selectvars- (SSC). My last reply was
more positive about the utility of my -tuples- (also SSC).
This thread reminded me of a bug report on -tuples- in my files and
still not attended to.
With as ever the help of Kit Baum:
1. I have now found and fixed the bug in -tuples-, and also in
-selectvars-, which used the same offending code. The bug would only
bite for problems larger than most people are likely to encounter, or
so I guess and hope.
2. The revised programs and help files are now downloadable from SSC.
3. Although I believe -selectvars- does what it claims, both its help
file and package description now flag it as superseded by -tuples-,
which isn't restricted to handling varlists and produces more
manageable results.
Although combinatorics is a mathematical and computational science
with many applications, as far as I know interest in such programs in
Stata is focused on the same problem as that addressed by Antonis:
cycling through all possible subsets of predictors, given a
particular response and model form. (We're setting aside whether some
or all should be transformed, combined into interaction terms, lagged,
differenced, ... .)
My most substantial experience of such problems arose in an
application with just 6 predictors in which it was important to
underline that several possible models were about equally good. No
doubt at least some people want rather to identify the "best buy" or
optimal model in some sense. As is well known, the problem is of
explosive character. Given k predictors, there are 2^k possible
predictor subsets (you might want to subtract the null model with no
predictors). I always think concretely of 2^10 ~ 10^3 and 2^20 ~ 10^6.
With a bit of attention to how you process the results, it is often
manageable to try out a thousand models, although few analysts would
want to inspect a thousand sets of diagnostic plots, say. But a
million? However, this may seem conservative to some.
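
As a rough illustration of that count, here is a minimal sketch in Python (not Stata, and not the code inside -tuples- or -selectvars-). The predictor names x1 to x6 and the function predictor_subsets are hypothetical, made up for this example. It enumerates every non-empty subset of k predictors and checks that there are 2^k - 1 of them, all distinct.

```python
from itertools import combinations


def predictor_subsets(predictors, min_size=1, max_size=None):
    """Yield each subset of `predictors` with between min_size and max_size members."""
    if max_size is None:
        max_size = len(predictors)
    for size in range(min_size, max_size + 1):
        # combinations() never repeats a subset and preserves the input order
        for subset in combinations(predictors, size):
            yield subset


# Hypothetical predictor names, for illustration only.
predictors = ["x1", "x2", "x3", "x4", "x5", "x6"]
subsets = list(predictor_subsets(predictors))

# 2^k subsets in all, minus the null model with no predictors.
assert len(subsets) == 2 ** len(predictors) - 1
# No subset should appear twice.
assert len(set(subsets)) == len(subsets)

print(len(subsets), "non-empty subsets of", len(predictors), "predictors")
```

Restricting the subset size, as with min(2) max(2), cuts the count sharply: predictor_subsets(predictors, 2, 2) yields only the 15 pairs. With no restriction the total doubles with each added predictor.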
Of course, this is why we have wonderful alternatives such as stepwise
My bug was sensed by a user wanting 2^18 subsets. He got them, but
some were identical! That was a precision problem. Those who recall
that I have written about precision problems will note that I was
bitten by one myself. As they say, good judgment comes from
experience, and a lot of that comes from bad judgment.
> A more positive take is that -tuples- (SSC) can be instructed to take any list "as is", so should not be broken by factor variables.
> @ Nick : Your guess is right! That's exactly what I want to do after I
> use selectvars. I will take a look at your suggestions. Thanks a lot
> for your help!
> @ Maarten: I have looked at unab but I haven't yet figured out how
> I can use it to solve my problem. I will try again. Thanks.
>> I think that distinct issues are in danger of being confused here.
>> 1. In any recent Stata, you can apply -xi:- or any other procedure to create indicator (a.k.a. dummy) variables and then feed those names to -selectvars- (SSC) as part or whole of a varlist.
>> 2. Stata 11 (only!) introduced factor variables. It is not quite that -selectvars-, which was written for Stata 8, does not allow factor variables: the situation is rather that it is ignorant of them and cannot make sense of them. (-xi:- did some of what is now possible in 11, but is in principle quite distinct from this functionality.)
>> Your question seems to mix elements of 1 and 2. As far as -selectvars- is concerned, sorry, but I have no impulse to update it. -tuples- (SSC) is nearer my idea of a tool that should be provided, but it's not smart about factor variables either. You're welcome to copy part or all of the code for either, subject to the usual courtesies.
>> The deeper question is why you want to do this. I guess it is because you want to generate lots of different models, the models differing on which predictors are offered. It seems to me that such a procedure needs to be especially smart where indicators are concerned, as often (but not always) indicators are best offered together.
>> I'd look at -nestreg- and
>> SJ-10-4  st0213  Variable selection in linear regression
>>          (help vselect if installed)  C. Lindsey and S. Sheather
>>          Q4/10   SJ 10(4):650--669
>>          performs variable selection after a linear regression
>> which, if my guess is right, may help your purpose.
>> I would like to use the package selectvars written by Nick Cox but my
>> variable list contains factor variables and selectvars does not allow
>> factor variables. So I would like to substitute in my variable list
>> the factor variables with the dummy variables that "xi:" creates and
>> then run selectvars on my expanded variable list. How can I do that?
>> For example:
>> local myvarlist var1 var2 i.var3 // where var3 is categorical with
>> four categories
>> local myevarlist var1 var2 Ivar3_2-4
>> selectvars `myevarlist', min(2) max(2)
>> My question is how to create myevarlist from myvarlist?