Re: st: multicollinearity

From   Chris Witte <>
Subject   Re: st: multicollinearity
Date   Thu, 20 Nov 2008 07:04:31 -0800 (PST)

Thanks Michael.  I knew about the multiple posts, but after having waited for over 5 hours without my question being posted I thought that maybe the server did not like my inclusion of the link, which is why my third attempt did not include the question about the UCLA example.

Regarding the UCLA link: this link is listed in the Stata help search results for "multicollinearity".  Someone at Stata needs to remove this from the help database.

Thanks for the multicollinearity information!  It makes sense that Stata would take this approach, as the acceptable amount of multicollinearity seems rather subjective.  I've been taught that correlations > 0.70 are something to be concerned about, but I'm sure that many other people would suggest different values.  I'm in the field of fisheries biology, and usually deal with relatively small sample sizes.

----- Original Message ----
From: Michael S. Hanson <>
Sent: Wednesday, November 19, 2008 11:11:14 PM
Subject: Re: st: multicollinearity


On Nov 19, 2008, at 3:06 PM, Chris Witte wrote:
On Nov 19, 2008, at 5:01 PM, Chris Witte wrote:
On Nov 19, 2008, at 8:11 PM, Chris Witte wrote:

1) The Statalist FAQ strongly suggests not posting the same message multiple times.  You may find it useful to review the FAQ at <>.

> Is there another way to get the following module (the link isn't working for me)?
> Example .  Stata learning module on regression diagnostics:  Multicollinearity
>        . . . . . . . . . . . . . . . . . .  UCLA Academic Technology Services
>        12/03

2)  I suspect that page just doesn't exist anymore.  Not too surprising:  it is from December 2003 -- almost 5 years ago, which was also a few versions of Stata ago.  If you poke around the UCLA ATS web site, you might find related materials.  Also, Google is your friend (TM).

> Also, I have read that -anova- and -regress- will drop variables that have collinearity problems, but I have never had Stata drop variables on me.  For example:
> sysuse auto
> reg price headroom trunk weight length turn displacement gear_ratio


> and the correlation between weight and length is 0.9460.  Why isn't one of these variables dropped?  Does there have to be perfect correlation for dropping variables?

3) In a nutshell, yes.  Multicollinearity and perfect collinearity are not the same thing.  Indeed, they are conceptually rather different.  (Your (sub)discipline may use slightly different terms for these two concepts.)  Kennedy's "A Guide to Econometrics" (for example, the 5th edition, MIT Press, 2003) dedicates a whole chapter to multicollinearity, and has a decent discussion of this distinction.  The explanation I have often given to my students is that multicollinearity is a sample problem -- which in many cases could conceptually be avoided by collecting more or "better" data -- whereas perfect collinearity is a model or specification problem -- in which no amount of additional data will resolve your specification error.  Mathematically, with perfect collinearity, the (X'X) matrix is rank deficient and therefore not invertible:  the OLS estimator simply does not exist in this case.  Stata thus drops each collinear variable until (X'X) is of full rank, and the regression then can be estimated on the remaining variables.  Other members of Statalist suggested to you a few synthetic examples in earlier replies.  Multicollinearity inflates variances, thereby complicating inference, but it does not preclude estimation.
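
The distinction can be checked directly in Stata.  What follows is a minimal sketch and not part of the original exchange:  it reuses the auto dataset from the question, and the variable wl_sum is a hypothetical name introduced here only to manufacture an exact linear dependence among the regressors.

```stata
* Load the example dataset used in the question (shipped with Stata)
sysuse auto, clear

* High but imperfect correlation: both regressors stay in the model
correlate weight length
regress price weight length

* Variance inflation factors for the regression just estimated
estat vif

* wl_sum (hypothetical name) is an exact linear combination of two regressors
generate wl_sum = weight + length

* Perfect collinearity: X'X is rank deficient, so one regressor is left out
regress price weight length wl_sum
```

In the first regression both weight and length should be retained despite their high correlation.  In the second, Stata should print a note that one of the three regressors was left out because of collinearity; the exact wording of the note, and which variable is left out, may depend on the Stata version.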

In Wooldridge's "Introductory Econometrics" textbook (for example, pp. 102-4 of the 3rd edition, Thomson South-Western, 2006) there is a very informative discussion of multicollinearity, which contains the following useful insight:

"Worrying about high degrees of correlation among the independent variables in the sample is really no different from worrying about a small sample size: both work to increase [the variance of beta hat].  The famous University of Wisconsin econometrician Arthur Goldberger, reacting to econometricians' obsession with multicollinearity, has (tongue in cheek) coined the term MICRONUMEROSITY, which he defines as the 'problem of small sample size.'"
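
The parallel with small samples can be explored with a rough sketch along the following lines.  It is an illustration added here, not something from Wooldridge or from the original post, and the subsample size of 25 and the seed are arbitrary assumptions.  The idea is to estimate the same model on the full auto dataset and on a random subsample, then compare the reported standard errors.

```stata
* Full sample: 74 cars
sysuse auto, clear
regress price weight length

* Same model on a random subsample (size and seed chosen arbitrarily)
preserve
set seed 20081120
sample 25, count
regress price weight length
restore
```

The standard errors in the subsample regression will often be larger than in the full-sample regression, although this is not guaranteed in any single draw, because the estimated error variance and the sample variation in the regressors also change with the subsample.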

