Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: collinearity in categorical variables
Date: Fri, 26 Apr 2013 11:14:20 -0400

The following technical report discusses multicollinearity and categorical variables.

Malte Wissmann, Helge Toutenburg & Shalabh
Role of Categorical Variables in Multicollinearity in the Linear Regression Model
Technical Report Number 008, 2007
Department of Statistics, University of Munich
http://epub.ub.uni-muenchen.de/2081/1/report008_statistics.pdf

Wissmann et al. has two references to the perturbation approach to detecting collinearity:

DA Belsley, Conditioning Diagnostics: Collinearity and Weak Data in Regression, 1st ed., John Wiley & Sons, Inc., New York, 1991.

CR Rao, H. Toutenburg, Shalabh, and C. Heumann, Linear Models and Generalizations - Least Squares and Alternatives, 3rd ed., Springer, 2008.

John Hendrickx, the author of -perturb- (and -coldiag2-), also wrote the 'perturb' package in R. The documentation contains a reference to a paper, but the link is broken. (The Wissmann et al. report refers to the paper differently and also gives a broken link.)

Hendrickx, John, Ben Pelzer. (2004). Collinearity involving ordered and unordered categorical variables. Paper presented at the RC33 conference in Amsterdam, August 17-20 2004.

Steve

On Apr 26, 2013, at 9:33 AM, Maarten Buis wrote:

On Fri, Apr 26, 2013 at 2:58 PM, Mitchell F. Berman wrote:
> I see that for a single categorical variable
> broken into dummy variables, collinearity between the dummy variables would
> be zero.

That is incorrect: the correlation between these indicator variables tends to be negative and can easily be non-trivial.

> People mention standard VIF (which I know how to do), but the more thorough
> answers imply this is not correct.

Multicollinearity is all about correlation, so I see no problem with using VIF. The VIF is based on correlation, and though you want to be careful using correlation between binary variables (or a categorical variable split up into different binary variables) when doing substantive research, it is perfectly OK to use it to diagnose multicollinearity, because that linear association is the real problem when it comes to multicollinearity.

> I was trying to get a
> sense of what the experts on the Stata List server use.

I tend to do nothing about it. Multicollinearity is not a problem; it is just an accurate representation of the fact that you have less information in your data than you would have liked. That may be unfortunate, but it certainly is not a problem that needs to be addressed. There are always exceptions, but in those cases looking at patterns of linear association between the explanatory variables is all that is needed, so VIF would be perfectly fine.

-- Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
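
To illustrate the point above that indicator variables built from one categorical variable are negatively correlated, here is a minimal Python sketch. It is not from the thread: the group sizes are made up, and the helper names `pearson` and `dummy_corr_and_vif` are hypothetical. It builds two indicators from a three-level variable (the first level is the omitted reference category), computes their Pearson correlation r, and then the VIF, which with only two regressors reduces to 1 / (1 - r^2).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def dummy_corr_and_vif(n_ref, n_b, n_c):
    """Correlation and VIF for the two indicators of a three-level variable.

    n_ref, n_b, n_c are the (hypothetical) numbers of observations in the
    reference level and in the two levels that get an indicator variable.
    """
    category = ["ref"] * n_ref + ["b"] * n_b + ["c"] * n_c
    d_b = [1 if level == "b" else 0 for level in category]
    d_c = [1 if level == "c" else 0 for level in category]
    r = pearson(d_b, d_c)
    # With exactly two regressors, the R-squared from regressing one on the
    # other is r squared, so both indicators share the same VIF.
    vif = 1 / (1 - r ** 2)
    return r, vif


# Assumed example data: balanced groups, then a small reference group.
r_bal, vif_bal = dummy_corr_and_vif(30, 30, 30)
r_small, vif_small = dummy_corr_and_vif(5, 45, 40)

print(f"balanced groups:      r = {r_bal:.3f}, VIF = {vif_bal:.3f}")
print(f"small reference group: r = {r_small:.3f}, VIF = {vif_small:.3f}")
```

With balanced groups the correlation is -0.5 and the VIF is about 1.33. When the reference category is small (5 of 90 observations here), the correlation moves to about -0.89 and the VIF rises to 5, so the choice of reference category alone can change the diagnostic substantially.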

**References**:

- **Re: st: collinearity in categorical variables** *From:* "Mitchell F. Berman" <mfb1@columbia.edu>
- **Re: st: collinearity in categorical variables** *From:* Maarten Buis <maartenlbuis@gmail.com>
