

Re: st: collinearity in categorical variables

From   Maarten Buis <>
Subject   Re: st: collinearity in categorical variables
Date   Fri, 26 Apr 2013 15:33:01 +0200

On Fri, Apr 26, 2013 at 2:58 PM, Mitchell F. Berman wrote:
> I see that for a single categorical variable
> broken into dummy variables, collinearity between the dummy variables would
> be zero.

That is incorrect: the correlations between these indicator variables
tend to be negative and can easily be non-trivial.
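This is not Maarten's code, just a quick illustration of the point in Python (with a hypothetical, exactly balanced categorical variable; in Stata you could run -correlate- on the indicators instead). Because the indicators of one categorical variable are mutually exclusive, whenever one is 1 the others must be 0, which forces a negative correlation:

```python
import numpy as np

# Hypothetical balanced 3-level categorical variable
# (levels 0, 1, 2, each appearing 100 times).
x = np.repeat([0, 1, 2], 100)

# Split it into indicator ("dummy") variables.
d1 = (x == 0).astype(float)
d2 = (x == 1).astype(float)

# Their correlation is negative, not zero.
r = np.corrcoef(d1, d2)[0, 1]
print(round(r, 3))  # -0.5 for three equally frequent levels
```

With k equally frequent levels the pairwise correlation between two indicators is -1/(k-1), so it shrinks toward zero only as the number of categories grows.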

> People mention standard VIF (which I know how to do), but the more thorough
> answers imply this is not correct.

Multicollinearity is all about correlation, so I see no problem with
using the VIF. The VIF is based on correlation, and though you want to
be careful using correlations between binary variables (or a
categorical variable split up into several binary variables) when
doing substantive research, it is perfectly OK to use them to diagnose
multicollinearity, because that linear association is the real problem
when it comes to multicollinearity.
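Again not from the original post, but a sketch of what the VIF measures, using the same hypothetical balanced example (in Stata, -estat vif- after -regress- does this for you). The VIF of a regressor is 1/(1 - R^2), where R^2 comes from regressing that variable on all the other regressors:

```python
import numpy as np

# Hypothetical balanced 3-level categorical variable, split into
# two indicators (one level omitted as the reference category).
x = np.repeat([0, 1, 2], 100)
d1 = (x == 0).astype(float)
d2 = (x == 1).astype(float)

# VIF of d1 = 1 / (1 - R^2), with R^2 from regressing d1 on the
# remaining regressors (here just d2, plus a constant).
X = np.column_stack([np.ones_like(d2), d2])
beta, *_ = np.linalg.lstsq(X, d1, rcond=None)
resid = d1 - X @ beta
r2 = 1 - resid.var() / d1.var()
vif = 1 / (1 - r2)
print(round(vif, 3))  # 1.333: R^2 = 0.25, so VIF = 4/3
```

So even though the two dummies are correlated at -0.5, the resulting VIF of about 1.33 is far below any of the usual rule-of-thumb cutoffs.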

> I was trying to get a
> sense of what the experts on the Stata List server use.

I tend to do nothing about it. Multicollinearity is not a problem; it
is just an accurate reflection of the fact that you have less
information in your data than you would have liked. That may be
unfortunate, but it certainly is not a problem that needs to be
addressed. There are always exceptions, but in those cases looking at
the patterns of linear association between the explanatory variables
is all that is needed, so the VIF would be perfectly fine.

-- Maarten

Maarten L. Buis
Reichpietschufer 50
10785 Berlin
