From: khigbee@stata.com
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: About coliearity
Date: Tue, 04 May 2004 16:48:50 -0500

```Maoyong Fan <fan@berkeley.edu> asks:

> I am running some ordinary least regression now.  There are more
> than 50 explainary variables in my model.  Most of them are dummy
> variables.  The Stata will automatically drop the variables
> randomly if the variable cause colinearity.  However, I want to
> keep some important variables in my regression and drop only
> those dummy variables when the important variable and dummy
> variable cause colinearity.  How can I program to do this?

I hope that someone else will think up a clever solution
different from what I am about to suggest, and I hesitate to show
the following undocumented feature ... but I think it will solve
your problem.

I only ask that you (and anyone else venturing along the path I
am about to show) be careful when using the feature.

First some background information.  When detecting and
correcting for collinearity, Stata (by default) tries to pick one
of the collinear variables to drop based on which drop will give
the most stable numeric properties.  As Maoyong Fan points out,
for near ties in this criterion (not uncommon in this setting),
the choice may differ from one run to another.

Underneath the hood (or should I say bonnet) Stata is using the
sweep operator on the X'X matrix.  By default with -regress-
(etc.), Stata will (behind the scenes) reorder the rows and
columns of the matrix so that the most stable variables are
retained when it comes to a decision of collinearity (i.e., when
it encounters a near zero element on the diagonal during the
sweep).

There is an internal flag that turns off this behavior.  It is
used for -anova- and -manova-, because in those cases we want
higher order interaction indicators (often called dummies) to be
dropped before main effects and lower interactions.  We set up
the X'X with the constant first, the main effects next, then the
interactions with highest order interactions last.  We turn off
the flag, and then we sweep.  When done, we turn the flag back on
for whatever command might follow.

It sounds like this is similar to what you want to achieve.

If just before you call your estimation command you enter

. set debug on
. set tol r 0
. set debug off

then the reorder flag will be turned off.  Then you run your
estimation command placing the variables in importance order left
to right.  (The rightmost collinear vars will be dropped before
those further to the left.)

IMPORTANT ! -- turn the flag back on after you are done.

. set debug on
. set tol r 1
. set debug off

so that Stata will be back to normal behavior.  This is what I
meant when I asked that anyone doing this be careful.  Make sure
you get Stata back to its default behavior.

Ken Higbee    khigbee@stata.com
StataCorp     1-800-STATAPC

```
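
The warning about restoring the flag suggests wrapping the estimation command so that the restore step runs even if the command fails. The do-file fragment below is only a sketch under assumptions: the variable names (y, x1, x2, d1-d3) are hypothetical, and the -set tol r- commands are the undocumented ones quoted in the post, whose behavior in other Stata releases the post does not confirm.

```stata
* Sketch only: y, x1, x2, d1-d3 are hypothetical variable names.
* Turn off the column reordering (undocumented feature from the post).
set debug on
set tol r 0
set debug off

* List variables in importance order, left to right.
* -capture noisily- shows the output but keeps the do-file running
* if -regress- fails, so the restore step below is still reached.
capture noisily regress y x1 x2 d1 d2 d3
local rc = _rc

* Restore Stata's default behavior.
set debug on
set tol r 1
set debug off

* Re-raise any error from -regress- now that the flag is restored.
if `rc' exit `rc'
```

Saving the return code in a local macro before restoring the flag means an error in the regression is still reported, but only after Stata is back to its normal collinearity handling.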