Statalist



RE: st: multicollinearity


From   jverkuilen <jverkuilen@gc.cuny.edu>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: multicollinearity
Date   Wed, 19 Nov 2008 22:29:58 -0500

Variables with a correlation of 0.95 may be perfectly reasonable in some problems. Stata is *absolutely correct* to leave variable selection in those situations to you. Highly collinear predictors can be diagnosed in various ways, e.g., -collin-.
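
As a rough sketch of such a check on the auto data from the original question: -correlate- gives the pairwise correlation, and -estat vif- after -regress- gives variance inflation factors. Both are official commands. -collin- is, as far as I know, user-written and has to be installed separately (try -findit collin-).

```stata
sysuse auto, clear

* pairwise correlation between the two suspect predictors
correlate weight length

* same model as in the original question
regress price headroom trunk weight length turn displacement gear_ratio

* variance inflation factors, one per predictor
estat vif
```

Large VIFs flag predictors whose coefficients are imprecisely estimated because of near collinearity, but they are not grounds for Stata to drop anything by itself.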

I tell my students that whenever they see Stata kicking out a variable due to collinearity or perfect prediction, it is THEIR job to figure out why. Chances are good they are not fitting the model they thought they were fitting. Even if the model Stata chooses is statistically equivalent to the one they wanted, surely they have information that would lead them to pick a good reference variable?
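
To see the kind of case where Stata does drop a variable, here is a minimal sketch with an exact linear dependence. The variable name size is made up for illustration; it is constructed as the sum of two other predictors, so the three cannot all be estimated together.

```stata
sysuse auto, clear

* hypothetical variable: an exact linear combination of weight and length
generate size = weight + length

* weight, length, and size are perfectly collinear, so Stata should
* drop one of them and print a note saying which
regress price weight length size
```

Which of the three gets dropped is Stata's choice, not yours, which is exactly why it pays to spot the dependence and decide for yourself which variable to leave out.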



-----Original Message-----
From: "Chris Witte" <eljefespeaks@yahoo.com>
To: statalist@hsphsun2.harvard.edu
Sent: 11/19/2008 8:11 PM
Subject: st: multicollinearity

I have read that -anova- and -regress- will drop variables that have collinearity problems, but I have never had Stata drop variables on me.  For example:

sysuse auto
reg price headroom trunk weight length turn displacement gear_ratio

      Source |       SS       df       MS              Number of obs =      74
-------------+------------------------------           F(  7,    66) =    8.18
       Model |   295089440     7  42155634.3           Prob > F      =  0.0000
    Residual |   339975956    66  5151150.85           R-squared     =  0.4647
-------------+------------------------------           Adj R-squared =  0.4079
       Total |   635065396    73  8699525.97           Root MSE      =  2269.6
------------------------------------------------------------------------------
       price |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
    headroom |  -788.1489   423.1895    -1.86   0.067    -1633.074    56.77608
       trunk |   109.2235   103.9332     1.05   0.297    -98.28582    316.7328
      weight |   5.300069   1.331056     3.98   0.000     2.642531    7.957607
      length |  -73.59571   42.42778    -1.73   0.087    -158.3055    11.11408
        turn |  -301.2525   124.9576    -2.41   0.019    -550.7384   -51.76669
displacement |    11.4282   7.622549     1.50   0.139    -3.790711    26.64711
  gear_ratio |   2236.615   1051.394     2.13   0.037     137.4391    4335.791
       _cons |   7795.908   6103.469     1.28   0.206    -4390.061    19981.88
------------------------------------------------------------------------------


and the correlation between weight and length is 0.9460.  Why isn't one of these variables dropped?  Does the correlation have to be perfect before variables are dropped?


      

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



