
# RE: st: multicollinearity

 From jverkuilen To Subject RE: st: multicollinearity Date Wed, 19 Nov 2008 22:29:58 -0500

```Variables with a correlation of 0.95 may be perfectly reasonable in some problems. Stata is *absolutely correct* to leave variable selection in those situations to you. Highly collinear predictors can be diagnosed in various ways, e.g., -collin-.

I tell my students whenever they see Stata kicking out a variable due to collinearity or perfect prediction, it is THEIR job to figure out why. Chances are good they are not fitting the model they thought they were fitting. Even if the model Stata chooses is statistically equivalent to the one they wanted, surely they have information that would lead them to pick a good reference variable?

-----Original Message-----
From: "Chris Witte" <eljefespeaks@yahoo.com>
To: statalist@hsphsun2.harvard.edu
Sent: 11/19/2008 8:11 PM
Subject: st: multicollinearity

I have read that -anova- and -regress- will drop variables that have collinearity problems, but I have never had Stata drop variables on me.  For example:

sysuse auto
reg price headroom trunk weight length turn displacement gear_ratio

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  7,    66) =    8.18
       Model |   295089440     7  42155634.3           Prob > F      =  0.0000
    Residual |   339975956    66  5151150.85           R-squared     =  0.4647
-------------+------------------------------           Adj R-squared =  0.4079
       Total |   635065396    73  8699525.97           Root MSE      =  2269.6
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    headroom |  -788.1489   423.1895    -1.86   0.067    -1633.074    56.77608
       trunk |   109.2235   103.9332     1.05   0.297    -98.28582    316.7328
      weight |   5.300069   1.331056     3.98   0.000     2.642531    7.957607
      length |  -73.59571   42.42778    -1.73   0.087    -158.3055    11.11408
        turn |  -301.2525   124.9576    -2.41   0.019    -550.7384   -51.76669
displacement |    11.4282   7.622549     1.50   0.139    -3.790711    26.64711
  gear_ratio |   2236.615   1051.394     2.13   0.037     137.4391    4335.791
       _cons |   7795.908   6103.469     1.28   0.206    -4390.061    19981.88
------------------------------------------------------------------------------

and the correlation between weight and length is 0.9460.  Why isn't one of these variables dropped?  Does there have to be perfect correlation before dropping variables occurs?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
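The contrast discussed in this thread, high but imperfect correlation versus exact linear dependence, can be tried directly on the auto data. The following is only a sketch: `weight_x2` is a hypothetical variable invented here for illustration (it is not in the dataset or the original posts), and `estat vif` is used as one built-in diagnostic alongside the user-written -collin- mentioned in the reply.

```stata
* Sketch only: near versus exact collinearity in the auto dataset
sysuse auto, clear

* Pairwise correlation between the two predictors in question
correlate weight length

* Model from the original post; both predictors should be retained,
* because their correlation is high but not exact
regress price headroom trunk weight length turn displacement gear_ratio

* Variance inflation factors for the model just fitted
estat vif

* weight_x2 is a hypothetical variable: an exact linear function of weight
generate weight_x2 = 2 * weight

* With an exact linear dependence, Stata is expected to omit one of the
* two variables and print a note saying so
regress price weight weight_x2 length
```

In the first regression the large VIF values flag the near-collinearity without removing anything, which leaves the choice of predictors to the analyst, as the reply argues. Only the second regression, where one predictor is an exact multiple of another, should trigger an automatic omission.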
