



Re: st: collinearity in categorical variables


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: collinearity in categorical variables
Date   Fri, 26 Apr 2013 14:29:57 +0100

I think you're mixing quotations from two or three debates that barely
overlap. Whether polychoric or Spearman correlations are better suited
to categorical data doesn't seem related to collinearity in
regression-type models. Even if (say) polychoric correlations appealed
more, how would that affect your choice of predictors in the latter
kind of model?

I tend to look directly at correlation and scatter plot matrices and
to think substantively about relationships. That doesn't rule out
specific tools being helpful.
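
As a minimal sketch of that kind of direct look, assuming Stata's bundled
auto data as a stand-in for the real predictors (the dataset and the
variable choices are illustrative assumptions, not a recommendation):

```stata
* Illustrative only: the bundled auto data stand in for your own predictors
sysuse auto, clear

* Pearson correlation matrix of the candidate predictors
correlate mpg weight length foreign

* Rank-based (Spearman) correlations, for comparison
spearman mpg weight length foreign

* Scatter plot matrix; -half- draws only the lower triangle
graph matrix mpg weight length foreign, half

* Cross-tabulation of two categorical variables, with Cramer's V
tabulate rep78 foreign, V
```

None of this decides which predictors to keep; it only puts the
relationships in view before any modelling.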

Nick
njcoxstata@gmail.com


On 26 April 2013 13:58, Mitchell F. Berman <mfb1@columbia.edu> wrote:
> Thank you for the reply.  Yes, I see that for a single categorical variable
> broken into dummy variables, collinearity between the dummy variables would
> be zero.
> But my question concerns correlation between related, similar, categorical
> variables.
>
> If I have multiple similar categorical variables, for example: homebound,
> uses a walker, home-health aide, lives in nursing home, these categorical
> variables will move together through the data. They won't be identical for
> all patients, but they will be correlated.
>
> People mention standard VIF (which I know how to do), but the more thorough
> answers imply this is not correct.
>
> This link suggests perturb (a module available for Stata, R, and SPSS) or
> polychoric correlation
> http://stats.stackexchange.com/questions/35233/how-to-test-for-and-remedy-multicollinearity-in-optimal-scaling-ordinal-regressi
>
> This link from talkstats suggests that polychoric correlations (available in
> R) are preferable, because Pearson product-moment correlations are invalid
> for categorical data.
> http://www.talkstats.com/showthread.php/22996-Collinearity-Among-Categorical-Variables-in-Regression
>
> Someone else suggested the Spearman correlation coefficient
> http://www.statisticsforums.com/showthread.php?t=802
>
> factor analysis
> http://www.talkstats.com/showthread.php/13264-Collinearity-in-Logistic-Regression
>
> This is beyond my level of theoretical understanding.  I was trying to get a
> sense of what the experts on the Stata List server use.
>
> Thank you for any additional input.
>
>
> Mitchell
>
>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

