
Re: st: multicollinearity

From   "Michael S. Hanson" <>
Subject   Re: st: multicollinearity
Date   Thu, 20 Nov 2008 00:11:14 -0500


On Nov 19, 2008, at 3:06 PM, Chris Witte wrote:
On Nov 19, 2008, at 5:01 PM, Chris Witte wrote:
On Nov 19, 2008, at 8:11 PM, Chris Witte wrote:

1) The Statalist FAQ strongly suggests not posting the same message multiple times. You may find it useful to review the FAQ at <http://>.

Is there another way to get the following module (the link isn't working for me)?

Example. Stata learning module on regression diagnostics: Multicollinearity. UCLA Academic Technology Services, 12/03. multico.htm

2) I suspect that page just doesn't exist anymore. Not too surprising: it is from December 2003 -- almost 5 years ago, which was also a few versions of Stata ago. If you poke around the UCLA ATS web site, you might find related materials. Also, Google is your friend (TM).

Also, I have read that -anova- and -regress- will drop variables that have collinearity problems, but I have never had Stata drop variables on me. For example:

sysuse auto
reg price headroom trunk weight length turn displacement gear_ratio


and the correlation between weight and length is 0.9460. Why isn't one of these variables dropped? Does there have to be perfect correlation for dropping variables?

3) In a nutshell, yes. Multicollinearity and perfect collinearity are not the same thing. Indeed, they are conceptually rather different. (Your (sub)discipline may use slightly different terms for these two concepts.) Kennedy's "A Guide to Econometrics" (for example, the 5th edition, MIT Press, 2003) dedicates a whole chapter to multicollinearity, and has a decent discussion of this distinction.

The explanation I have often given to my students is that multicollinearity is a sample problem -- which in many cases could conceptually be avoided by collecting more or "better" data -- whereas perfect collinearity is a model or specification problem -- in which no amount of additional data will resolve your specification error.

Mathematically, with perfect collinearity, the (X'X) matrix is rank deficient and therefore not invertible: the OLS estimator simply does not exist in this case. Stata thus drops each collinear variable until (X'X) is of full rank, and the regression then can be estimated on the remaining variables. Other members of Statalist suggested to you a few synthetic examples in earlier replies. Multicollinearity inflates variances, thereby complicating inference, but it does not preclude estimation.
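
To make the distinction concrete, here is a minimal sketch using the same auto data. The variable wl is invented for this illustration: by construction it is an exact linear combination of weight and length, and nothing like it appears in the original regression.

```stata
sysuse auto, clear

* Multicollinearity: weight and length are highly but not perfectly correlated,
* so (X'X) is still invertible and every coefficient is estimated.
correlate weight length
regress price headroom trunk weight length turn displacement gear_ratio
estat vif        // variance inflation factors for the regression just run

* Perfect collinearity: wl (a hypothetical variable) equals weight + length
* exactly, so (X'X) is rank deficient when all three are included.
generate wl = weight + length
regress price weight length wl
```

In the second regression Stata should report that one of the three variables was dropped (newer versions say "omitted") because of collinearity. The first regression keeps all seven regressors; the VIFs for weight and length only show how much their correlation inflates the variances.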

In Wooldridge's "Introductory Econometrics" textbook (for example, pp. 102-4 of the 3rd edition, Thomson South-Western, 2006) there is a very informative discussion of multicollinearity, which contains the following useful insight:

"Worrying about high degrees of correlation among the independent variables in the sample is really no different from worrying about a small sample size: both work to increase [the variance of beta hat]. The famous University of Wisconsin econometrician Arthur Goldberger, reacting to econometricians' obsession with multicollinearity, has (tongue in cheek) coined the term MICRONUMEROSITY, which he defines as the 'problem of small sample size.'"
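
The point of that passage can be read off the usual textbook expression for the sampling variance of an OLS slope coefficient under homoskedasticity. The notation is supplied here as an aid and is not part of the quotation: sigma^2 is the error variance, SST_j is the total sample variation in regressor x_j, and R_j^2 is the R-squared from regressing x_j on all the other regressors.

```latex
\operatorname{Var}(\hat{\beta}_j) = \frac{\sigma^2}{SST_j \,(1 - R_j^2)},
\qquad
SST_j = \sum_{i=1}^{n} (x_{ij} - \bar{x}_j)^2
```

A small sample makes SST_j small, and strong correlation among the regressors pushes R_j^2 toward 1; either one shrinks the denominator and raises the variance. Only when R_j^2 equals exactly 1 (perfect collinearity) is the expression undefined.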

