
# Re: st: collinearity in categorical variables

| Field | Value |
| --- | --- |
| From | "JVerkuilen (Gmail)" |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: collinearity in categorical variables |
| Date | Thu, 25 Apr 2013 14:29:15 -0400 |

```
On Thu, Apr 25, 2013 at 2:17 PM, Mitchell F. Berman <mfb1@columbia.edu> wrote:
> Stata Users:
>
> We are working on a logistic regression model with both continuous and
> categorical independent variables.
>
> I'm familiar with collin to generate VIF and condition index.  But my
> impression, and information on the internet, suggest that collin is not
> appropriate for categorical variables.
>
> What would people use to evaluate collinearity (probably not the correct
> term in this case) for categorical variables?

I'm not 100% sure what you mean. Are you checking for linear
dependence among the right-hand-side variables? If so, properly
coded categorical variables (e.g., dummies) are perfectly reasonable
to check with collinearity diagnostics; nothing in the math precludes
it. You can have two distinct problems in logistic regression:
separation and collinearity. Stata detects perfect prediction by a
covariate and reports it (logit notes the problem and drops the
variable and the affected observations), but near separation can be
really awful to track down. It usually shows up as a ludicrously large
standard error on a regression coefficient.

One common trick for collinearity is to fit the same model with regress
and then compute the VIFs (estat vif). VIFs depend only on the X
variables, so the fact that regress isn't the model you want to run
doesn't matter. It's not quite perfect, but it's a reasonable start.

The diagnostic information available after logit is quite nice.

Jay
--
JVVerkuilen, PhD
jvverkuilen@gmail.com

“He uses statistics as a drunken man uses lamp-posts – for support
rather than illumination.”--Andrew Lang

```
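The reply's point that VIFs depend only on the right-hand-side variables can be illustrated with a small sketch. The Python below is not from the thread and is not how Stata computes its diagnostics; it is a minimal stdlib-only illustration that uses the identity VIF_j = 1 / (1 - R_j^2), which equals the j-th diagonal element of the inverse of the predictors' correlation matrix. The data, the variable names (`age`, `group`, `d_b`, `d_c`), and the helper functions are all hypothetical. The three-level categorical variable is coded as two dummies with the base level omitted, which is the "properly coded" case the reply describes.

```python
# Sketch only: VIFs computed from the predictors alone (no outcome involved).
# All data and names here are made up for illustration.

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    centered = []
    for col in columns:
        m = sum(col) / len(col)
        centered.append([x - m for x in col])
    norms = [sum(v * v for v in col) ** 0.5 for col in centered]
    k = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(k)]
        for i in range(k)
    ]

def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting.

    Raises ValueError when the matrix is singular, i.e. when the
    predictors are exactly linearly dependent.
    """
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("predictors are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def vifs(columns):
    """Return one VIF per column: the diagonal of the inverse correlation matrix."""
    inv = invert(correlation_matrix(columns))
    return [inv[j][j] for j in range(len(columns))]

# Hypothetical predictors: one continuous variable and one 3-level factor.
age = [23.0, 35.0, 41.0, 29.0, 52.0, 47.0, 38.0, 61.0, 44.0, 30.0, 58.0, 33.0]
group = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "a", "b", "c"]

# Dummy coding with level "a" as the omitted base category.
d_b = [1.0 if g == "b" else 0.0 for g in group]
d_c = [1.0 if g == "c" else 0.0 for g in group]

for name, value in zip(["age", "d_b", "d_c"], vifs([age, d_b, d_c])):
    print(name, round(value, 3))
```

Including dummies for all three levels together with a constant would make the design exactly collinear, so the base level is left out here. Note also that dummies from the same factor are correlated with each other by construction, so their VIFs sit somewhat above 1 even when nothing is wrong.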