Statalist



st: RE: Correlation of Dummy and Metric Variables?


From   "Verkuilen, Jay" <JVerkuilen@gc.cuny.edu>
To   "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Correlation of Dummy and Metric Variables?
Date   Tue, 22 Sep 2009 17:36:53 -0400

The point-biserial correlation is a correlation between a dichotomous variable and a continuous one. http://en.wikipedia.org/wiki/Point-biserial_correlation_coefficient

Peter Lachenbruch noted that this ends up being the same math as the t-test. 
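To make that equivalence concrete, here is a minimal sketch in plain Python. The data are made up and the helper names (pearson_r, pooled_t) are mine, not from any package. It computes the point-biserial correlation as an ordinary Pearson correlation with the dichotomous variable coded 0/1, converts it to a t statistic with t = r * sqrt((n - 2) / (1 - r^2)), and compares that with the pooled-variance two-sample t statistic computed directly.

```python
import math

def pearson_r(x, y):
    """Ordinary Pearson correlation; with x coded 0/1 this is the point-biserial r."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def pooled_t(group1, group0):
    """Two-sample t statistic with pooled (equal) variance."""
    n1, n0 = len(group1), len(group0)
    m1, m0 = sum(group1) / n1, sum(group0) / n0
    ss1 = sum((v - m1) ** 2 for v in group1)
    ss0 = sum((v - m0) ** 2 for v in group0)
    sp2 = (ss1 + ss0) / (n1 + n0 - 2)          # pooled variance
    return (m1 - m0) / math.sqrt(sp2 * (1 / n1 + 1 / n0))

# Made-up illustration data: d is the dummy, y the metric variable.
d = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
y = [2.1, 3.4, 2.9, 3.8, 2.5, 3.1, 4.0, 4.6, 3.9, 5.2]

n = len(y)
r = pearson_r(d, y)
t_from_r = r * math.sqrt((n - 2) / (1 - r * r))
t_direct = pooled_t([v for v, g in zip(y, d) if g == 1],
                    [v for v, g in zip(y, d) if g == 0])

print(f"r_pb = {r:.4f}, t from r = {t_from_r:.4f}, pooled t = {t_direct:.4f}")
```

The two t values agree up to floating-point error, so testing whether the correlation is zero and testing whether the group means differ (with equal variances assumed) are the same test.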

Unfortunately, skew in the dichotomous variable (a split far from 50/50) tends to reduce correlations. Thus methods such as the biserial correlation (a special case of the polyserial that Stas mentioned) "fix up" the correlation at the cost of making some assumptions about the dichotomous variable that may or may not be true in practice. In essence, if you are willing to assume that the dichotomous variable arises from cutting an underlying normally distributed variable at a threshold, you can boost the correlation. However, if that assumption is wrong, you may end up coming to the wrong conclusion.
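As a rough illustration of the size of that boost, the textbook conversion from point-biserial to biserial multiplies by sqrt(p*q) / h, where p is the proportion of 1s, q = 1 - p, and h is the standard normal density at the threshold that cuts the distribution into p and q. The sketch below uses Python's standard library; the function name biserial_factor and the value r_pb = 0.30 are hypothetical, chosen only for illustration.

```python
import math
from statistics import NormalDist

def biserial_factor(p):
    """Multiplier that turns a point-biserial r into a biserial r,
    ASSUMING the 0/1 variable is a standard normal cut at a threshold."""
    q = 1.0 - p
    z = NormalDist().inv_cdf(p)   # threshold splitting the normal into p and q
    h = NormalDist().pdf(z)       # normal density (ordinate) at that threshold
    return math.sqrt(p * q) / h

r_pb = 0.30  # hypothetical observed point-biserial correlation
for p in (0.5, 0.3, 0.1, 0.05):
    f = biserial_factor(p)
    print(f"p = {p:4.2f}  factor = {f:.3f}  biserial = {r_pb * f:.3f}")
```

The factor is smallest at an even split and grows as the split becomes more lopsided, which is the flip side of the attenuation described above. Nothing in the formula keeps the result below 1 in absolute value, so a biserial estimate outside [-1, 1] is one sign that the normality assumption is not holding.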

You can certainly define measures of dependence between a nominal and a continuous variable, but this is going to get tricky because a nominal variable isn't really a single variable (instead it is K-1 indicator variables, where K is the number of categories).
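One such measure, offered here only as an example and not as the recommended choice, is the correlation ratio (eta squared): the between-category sum of squares divided by the total sum of squares. It equals the R-squared from regressing the continuous variable on the K-1 indicators, and with K = 2 it reduces to the squared point-biserial correlation. A minimal sketch with made-up data and hypothetical helper names:

```python
import math

def eta_squared(groups, y):
    """Correlation ratio: between-group sum of squares over total sum of squares."""
    n = len(y)
    grand = sum(y) / n
    ss_total = sum((v - grand) ** 2 for v in y)
    by_group = {}
    for g, v in zip(groups, y):
        by_group.setdefault(g, []).append(v)
    ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand) ** 2
                     for vs in by_group.values())
    return ss_between / ss_total

def pearson_r(x, y):
    """Ordinary Pearson correlation, used for the two-category check below."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / math.sqrt(sum((a - mx) ** 2 for a in x) *
                           sum((b - my) ** 2 for b in y))

# Made-up three-category example (K = 3, so two indicators would be needed).
region = ["north", "north", "south", "south", "south", "west", "west", "west"]
y3 = [2.0, 3.0, 5.0, 6.0, 7.0, 3.5, 4.0, 4.5]
eta2_three = eta_squared(region, y3)

# Two-category check: eta squared equals the squared point-biserial r.
d = [0, 0, 0, 1, 1]
y2 = [1.0, 2.0, 4.0, 3.0, 6.0]
eta2_two = eta_squared(d, y2)
r2_two = pearson_r(d, y2) ** 2

print(f"eta^2 (3 categories) = {eta2_three:.4f}")
print(f"eta^2 (2 categories) = {eta2_two:.4f}, r_pb^2 = {r2_two:.4f}")
```

Note that eta squared has no sign, which is part of what makes the nominal case tricky: with more than two unordered categories there is no single direction for the association to have.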

Jay






