# RE: st: multicollinearity

 From: jverkuilen | Subject: RE: st: multicollinearity | Date: Wed, 19 Nov 2008 17:03:11 -0500

```Stata drops perfectly multicollinear variables. It won't drop variables that aren't perfectly collinear.

There are many silly examples one could make, e.g., a variable x and another -x both included in the regression, but if you want a slightly less obvious one, take a trichotomous variable and make three dummies from it. With a constant in the model, one is redundant; it doesn't matter which.

Back in the old days multicollinearity was a big numerical problem because many cheap computing algorithms are inherently ill-conditioned and thus more unstable in the presence of collinearity than might otherwise be the case. It still is a problem, but now that much better algorithms such as QR decomposition are used, the effect on estimates is mitigated.

The substantive problem with multicollinearity is that you can't untangle the effect of collinear variables from each other.

-----Original Message-----
From: "Chris Witte" <eljefespeaks@yahoo.com>
To: statalist@hsphsun2.harvard.edu
Sent: 11/19/2008 3:06 PM
Subject: st: multicollinearity

Is there another way to get the following module (the link isn't working for me)?

Example .  Stata learning module on regression diagnostics:  Multicollinearity
. . . . . . . . . . . . . . . . . .  UCLA Academic Technology Services
12/03   http://www.ats.ucla.edu/stat/stata/modules/reg/multico.htm

Also, I have read that -anova- and -regress- will drop variables that have collinearity problems, but I have never had Stata drop variables on me.  For example:

sysuse auto
reg price headroom trunk weight length turn displacement gear_ratio

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  7,    66) =    8.18
       Model |   295089440     7  42155634.3           Prob > F      =  0.0000
    Residual |   339975956    66  5151150.85           R-squared     =  0.4647
       Total |   635065396    73  8699525.97           Root MSE      =  2269.6
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    headroom |  -788.1489   423.1895    -1.86   0.067    -1633.074    56.77608
       trunk |   109.2235   103.9332     1.05   0.297    -98.28582    316.7328
      weight |   5.300069   1.331056     3.98   0.000     2.642531    7.957607
      length |  -73.59571   42.42778    -1.73   0.087    -158.3055    11.11408
        turn |  -301.2525   124.9576    -2.41   0.019    -550.7384   -51.76669
displacement |    11.4282   7.622549     1.50   0.139    -3.790711    26.64711
  gear_ratio |   2236.615   1051.394     2.13   0.037     137.4391    4335.791
       _cons |   7795.908   6103.469     1.28   0.206    -4390.061    19981.88
------------------------------------------------------------------------------

and the correlation between weight and length is 0.9460.  Why isn't one of these variables dropped?  Does there have to be perfect correlation for a variable to be dropped?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
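The distinction drawn in the reply, between perfect and merely high collinearity, can be tried directly on the auto dataset from the question. What follows is a minimal sketch, not something posted in the thread: the variable names `negweight`, `size3`, and `size_1` to `size_3` are made up for illustration, and `estat vif` is one common diagnostic that the thread itself does not mention.

```stata
* Illustrative sketch only; negweight, size3, and size_* are hypothetical names.
sysuse auto, clear

* 1. Perfect collinearity: x and -x in the same regression.
*    Stata drops one of the pair and reports that it did so.
generate negweight = -weight
regress price weight negweight

* 2. Three dummies made from a three-level variable, plus the constant.
*    The dummies sum to 1, so one of them is redundant and is dropped.
egen size3 = cut(weight), group(3)
tabulate size3, generate(size_)
regress price size_1 size_2 size_3

* 3. High but imperfect collinearity: nothing is dropped.
correlate weight length
regress price headroom trunk weight length turn displacement gear_ratio

* Variance inflation factors for the regression just fit.
estat vif
```

In the third regression both weight and length should stay in the model, because their correlation is high but not perfect. The cost shows up in the precision of their coefficients rather than in a dropped variable.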