



Re: st: Can Spearman's rho be used to measure the degree of association between two binary variables?


From   Richard Williams <richardwilliams.ndu@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Can Spearman's rho be used to measure the degree of association between two binary variables?
Date   Mon, 21 May 2012 07:55:05 -0500

At 01:56 AM 5/21/2012, Maarten Buis wrote:
On Mon, May 21, 2012 at 12:19 AM, Marcos Vinicius wrote:
> I was conducting a multicollinearity diagnostic analysis for a logistic regression using Spearman correlation and VIF. Important detail: all the covariates are binary variables.

Multicollinearity is never a problem, see e.g.:
<http://www.stata.com/statalist/archive/2010-07/msg00675.html>, so
there is nothing to diagnose. If you want to inspect the association
between binary covariates I would look at a table of odds ratios.

-- Maarten
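For a pair of binary covariates, a table of odds ratios boils down to the familiar cross-product ratio of a 2x2 table. A quick sketch of the arithmetic (in Python rather than Stata, with made-up counts purely for illustration):

```python
# Hypothetical 2x2 cross-tabulation of two binary covariates x1 and x2
# (counts invented for illustration):
#            x2=0   x2=1
#   x1=0      40     10
#   x1=1      15     35
a, b, c, d = 40, 10, 15, 35

# Odds ratio = (a*d) / (b*c); values far from 1 signal strong association.
odds_ratio = (a * d) / (b * c)
print(odds_ratio)  # (40*35)/(10*15) ≈ 9.33
```

In Stata itself one would get the same quantity from a cross-tabulation command; the point is just that the association measure Maarten suggests is this simple ratio.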

I would have to disagree with that a bit. Sometimes multicollinearity reflects a mistake on the researcher's part. For example, suppose your model includes education and income, and then you decide to also include an SES measure you find at the end of the codebook. If SES was computed from income and education, you may have extreme or even perfect multicollinearity.
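The SES example can be made concrete: if the index is an exact linear combination of income and education, the design matrix is rank-deficient, so the four coefficients are not separately identified. A hypothetical illustration (simulated data, Python for convenience):

```python
import numpy as np

# Hypothetical data: ses is an exact linear combination of income and
# education, as when a codebook's SES index was built from them.
rng = np.random.default_rng(0)
income = rng.normal(size=100)
education = rng.normal(size=100)
ses = 0.6 * income + 0.4 * education   # exact linear dependence (weights invented)

# Design matrix: constant, income, education, ses.
X = np.column_stack([np.ones(100), income, education, ses])

# Rank is 3, not 4: X'X is singular, so the regression cannot
# distinguish the effect of ses from the effects of its ingredients.
rank = np.linalg.matrix_rank(X)
print(rank)  # 3
```

This is the situation in which Stata would drop one of the offending variables automatically.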

Or, suppose you have a categorical variable, and you create dummies out of it. If some categories have extremely small Ns (e.g. 2 cases) you will get near-perfect collinearity. You may have to combine categories or else drop some cases.
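One way to see why a 2-case category is hopeless: in a dummy-coded design matrix, the diagonal of (X'X)^-1 scales the coefficient variances, and the entry for the rare category's dummy blows up. A hypothetical sketch with invented category sizes:

```python
import numpy as np

# Hypothetical 3-category factor: A (500 cases), B (498), C (only 2).
# A is the reference category; dummies for B and C plus a constant.
n_a, n_b, n_c = 500, 498, 2
d_b = np.r_[np.zeros(n_a), np.ones(n_b), np.zeros(n_c)]
d_c = np.r_[np.zeros(n_a + n_b), np.ones(n_c)]
X = np.column_stack([np.ones(n_a + n_b + n_c), d_b, d_c])

# Coefficient variances are sigma^2 times these diagonal entries.
# For dummy coding they equal 1/n_j + 1/n_ref, so the entry for the
# 2-case dummy is about 0.502, versus about 0.004 for d_b.
diag = np.diag(np.linalg.inv(X.T @ X))
print(diag)
```

Combining C with another category (or dropping those 2 cases) removes the instability, which is the remedy suggested above.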

Suppose, too, that you have several items that basically measure the same concept. You may be better off creating a scale from the items or constraining them all to have the same effects.
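A hypothetical simulation of that situation: three items that are noisy copies of one latent trait are strongly intercorrelated, and a single averaged scale carries their shared information in one predictor instead of three collinear ones:

```python
import numpy as np

# Hypothetical: three survey items that all tap the same latent concept,
# simulated as the trait plus independent measurement noise.
rng = np.random.default_rng(1)
latent = rng.normal(size=200)
items = np.column_stack(
    [latent + 0.5 * rng.normal(size=200) for _ in range(3)]
)

# The items are highly intercorrelated; a mean scale summarizes them.
corr = np.corrcoef(items, rowvar=False)
scale = items.mean(axis=1)
print(corr.round(2))
```

Entering `scale` rather than the three separate items sidesteps the collinearity while keeping the concept in the model.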

I don't think I have ever seen it happen with Stata, but there might be situations where multic makes it difficult for the model to converge. If so, doing things like centering a variable before you square it might help.
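The centering trick is easy to demonstrate: when x has a mean far from zero, x and x^2 are almost perfectly correlated, but centering x before squaring removes nearly all of that correlation without changing what the quadratic model can fit. A sketch with simulated data:

```python
import numpy as np

# Hypothetical variable with mean far from zero.
rng = np.random.default_rng(2)
x = rng.normal(loc=10, scale=1, size=500)

# Raw x and x^2 are nearly collinear...
r_raw = np.corrcoef(x, x**2)[0, 1]

# ...but after centering, the linear and squared terms are nearly orthogonal.
xc = x - x.mean()
r_centered = np.corrcoef(xc, xc**2)[0, 1]

print(r_raw, r_centered)  # first is close to 1, second is near 0
```

The fitted curve is identical either way; only the numerical conditioning of the estimation improves.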

If you happen to be at the design stage of the study and you are worried about multic, you may wish to collect a larger sample, since larger samples reduce the standard errors.
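The familiar square-root-of-n logic behind that advice can be checked by simulation: quadrupling the sample size roughly halves a standard error. A toy sketch (standard error of a simple mean, simulated data):

```python
import numpy as np

rng = np.random.default_rng(3)

def se_of_mean(n, reps=2000):
    # Empirical standard error: spread of sample means across replications.
    return rng.normal(size=(reps, n)).mean(axis=1).std()

se100 = se_of_mean(100)
se400 = se_of_mean(400)
print(se100 / se400)  # close to 2: quadrupling n halves the standard error
```

The same 1/sqrt(n) shrinkage applies to regression coefficient standard errors, collinear predictors included, which is why a bigger sample blunts the practical cost of multicollinearity.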

I do think the problem is exaggerated. But, the researcher should be aware that they may have done something stupid, that there may be better ways to set the problem up, and that they may be able to avoid the problem in the first place when they design their study.

Also, I would discourage simply dropping variables that seem to be causing you problems, as that could lead to specification error, which may be an even worse problem.


-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu
WWW:    http://www.nd.edu/~rwilliam

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

