Statalist



RE: st: multicollinearity


From   jverkuilen <[email protected]>
To   <[email protected]>
Subject   RE: st: multicollinearity
Date   Wed, 19 Nov 2008 17:03:11 -0500

Stata drops perfectly multicollinear variables. It won't drop variables that aren't perfectly collinear. 

There are many silly examples one could make, e.g., a variable x and its negative -x both included in the regression, but if you want a slightly less obvious one, take a trichotomous variable and make three dummies from it. One is redundant; it doesn't matter which. 
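
Here is a minimal sketch of the dummy-variable case (not from the original post; it assumes the auto dataset that ships with Stata, with rep78 collapsed to three categories purely for illustration):

* collapse rep78 into three categories and build one dummy per category
sysuse auto, clear
generate byte rep3 = cond(rep78 <= 2, 1, cond(rep78 == 3, 2, 3)) if !missing(rep78)
generate byte d1 = rep3 == 1 if !missing(rep3)
generate byte d2 = rep3 == 2 if !missing(rep3)
generate byte d3 = rep3 == 3 if !missing(rep3)
* with a constant in the model, d1 + d2 + d3 = 1, so one dummy is perfectly
* collinear and -regress- reports it as dropped/omitted
regress price d1 d2 d3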

Back in the old days multicollinearity was a big numerical problem because many cheap computing algorithms are inherently ill-conditioned and thus more unstable in the presence of collinearity than they would otherwise be. It still is, but now that much better algorithms such as the QR decomposition are used, the effect on the estimates is mitigated. 

The substantive problem with multicollinearity is that you can't untangle the effects of collinear variables from each other.
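
As a rough illustration (not part of the original reply), the official -estat vif- and -correlate- commands give a quick sense of how badly predictors overlap after a -regress- fit:

sysuse auto, clear
regress price headroom trunk weight length turn displacement gear_ratio
* variance inflation factors flag predictors whose variance is inflated by
* near-collinearity (values above roughly 10 are a common informal warning sign)
estat vif
* pairwise correlations show which predictors overlap most
correlate weight length turn displacement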

-----Original Message-----
From: "Chris Witte" <[email protected]>
To: [email protected]
Sent: 11/19/2008 3:06 PM
Subject: st: multicollinearity

Is there another way to get the following module (the link isn't working for me)?

Example .  Stata learning module on regression diagnostics:  Multicollinearity
        . . . . . . . . . . . . . . . . . .  UCLA Academic Technology Services
        12/03   http://www.ats.ucla.edu/stat/stata/modules/reg/multico.htm


Also, I have read that -anova- and -regress- will drop variables that have collinearity problems, but I have never had Stata drop variables on me.  For example:

sysuse auto
reg price headroom trunk weight length turn displacement gear_ratio

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  7,    66) =    8.18
       Model |   295089440     7  42155634.3           Prob > F      =  0.0000
    Residual |   339975956    66  5151150.85           R-squared     =  0.4647
-------------+------------------------------           Adj R-squared =  0.4079
       Total |   635065396    73  8699525.97           Root MSE      =  2269.6
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    headroom |  -788.1489   423.1895    -1.86   0.067    -1633.074    56.77608
       trunk |   109.2235   103.9332     1.05   0.297    -98.28582    316.7328
      weight |   5.300069   1.331056     3.98   0.000     2.642531    7.957607
      length |  -73.59571   42.42778    -1.73   0.087    -158.3055    11.11408
        turn |  -301.2525   124.9576    -2.41   0.019    -550.7384   -51.76669
displacement |    11.4282   7.622549     1.50   0.139    -3.790711    26.64711
  gear_ratio |   2236.615   1051.394     2.13   0.037     137.4391    4335.791
       _cons |   7795.908   6103.469     1.28   0.206    -4390.061    19981.88
------------------------------------------------------------------------------


and the correlation between weight and length is 0.9460.  Why isn't one of these variables dropped?  Does the correlation have to be perfect for a variable to be dropped?
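
A minimal sketch of the distinction, not part of the original exchange and using the same auto data: an exact linear combination of regressors already in the model is dropped, while the highly (but not perfectly) correlated weight and length both stay.

* weight2 is an artificial, exact linear combination of weight and length,
* so -regress- drops it; weight and length themselves are both retained
sysuse auto, clear
generate double weight2 = 2*weight - 5*length
regress price headroom trunk weight length weight2 turn displacement gear_ratio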

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
