
From: "Sergiy Radyakin" <serjradyakin@gmail.com>

To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>

Subject: st: Correlation coefficient between discrete and continuous variables

Date: Thu, 20 Nov 2008 13:08:07 -0500

Dear All,

A colleague of mine has just hinted to me that it may not be straightforward to compute a correlation coefficient when one of the variables is discrete. Until now I never cared, and neither does the Stata manual. In particular, it does not require anywhere that the variables be continuous, and its example uses the -correlate- command to find correlations between such discrete variables as -state- and -region- and such continuous variables as -marriage rate- and -divorce rate- (which is also strange, since there is no logical ordering of -state- and -region-, but that is a different issue).

After looking into the literature, the following paper seems to be the most relevant:

N.R. Cox, "Estimation of the Correlation between a Continuous and a Discrete Variable", Biometrics, Vol. 30, No. 1 (Mar., 1974), pp. 171-178
www.jstor.org/stable/2529626

In particular, my case satisfies the assumption made in the paper that the discrete variable is derived from an underlying continuous variable (so there is an ordering: low, medium, or high). The method recommended in the paper seems very far from what Stata computes according to the manual; in particular, it calls for iterative maximum likelihood estimation.

Before I start writing any code myself, I would like to ask:

Q1: Does Stata make any adjustment to the way it computes the correlation coefficient based on the nature of the variable (discrete or continuous)?

Q2: Is the difference between the correlation coefficient as estimated by Stata in this case and the one computed in the recommended way practically important?

Q3: Is there any standard or user-written command to compute the correlation coefficient according to the method described in the paper above?

Q4: I am ultimately interested in the correlation between my observed continuous variable and the unobserved continuous variable that is represented by the discrete levels. Unfortunately the thresholds are not available to me, so I cannot be sure about the size of the intervals. Furthermore, a significant measurement error may be involved, since many interviewers may have eyeballed the continuous variable into different groups differently. Should I instead focus on different measures of correlation? Could you please suggest any that better fit the context?

Thank you,
Sergiy Radyakin
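As a rough way to gauge Q2, the following is a minimal simulation sketch. It is not part of the original post and is not the estimator from the paper cited above. It is written in Python rather than Stata, and the latent correlation of 0.6, the thresholds of -0.5 and 0.5, the sample size, and the function names `pearson` and `simulate` are all illustrative assumptions. It draws a continuous variable and a correlated latent normal variable, cuts the latent variable into three ordered levels (low, medium, high), and compares the ordinary Pearson correlation computed on the latent values with the one computed on the level codes.

```python
import math
import random


def pearson(x, y):
    """Ordinary product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def simulate(rho, cuts, n=20000, seed=12345):
    """Return (r with the latent variable, r with its discretized levels).

    rho  -- assumed correlation between x and the latent variable
    cuts -- assumed thresholds that turn the latent variable into levels
    """
    rng = random.Random(seed)
    x, latent, levels = [], [], []
    for _ in range(n):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        lat = rho * z1 + math.sqrt(1.0 - rho ** 2) * z2  # corr(z1, lat) = rho
        x.append(z1)
        latent.append(lat)
        # Level code = number of thresholds exceeded: 0 = low, 1 = medium, 2 = high
        levels.append(sum(lat > c for c in cuts))
    return pearson(x, latent), pearson(x, levels)


r_latent, r_discrete = simulate(rho=0.6, cuts=(-0.5, 0.5))
print(f"Pearson r using the latent variable:  {r_latent:.3f}")
print(f"Pearson r using the 3-level variable: {r_discrete:.3f}")
```

Under these assumptions the correlation computed on the level codes comes out smaller in magnitude than the one computed on the latent values. How much smaller depends on the number of levels and on where the thresholds fall, which is exactly what is unknown in Q4, so the sketch indicates the direction of the effect rather than its size in any real dataset.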

**Follow-Ups**:

- **Re: st: Correlation coefficient between discrete and continuous variables** *From:* David Airey <david.airey@vanderbilt.edu>
- **st: RE: Correlation coefficient between discrete and continuous variables** *From:* "Lachenbruch, Peter" <Peter.Lachenbruch@oregonstate.edu>
- **Re: st: Correlation coefficient between discrete and continuous variables** *From:* Steven Samuels <sjhsamuels@earthlink.net>
- **st: RE: Correlation coefficient between discrete and continuous variables** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
- **Re: st: Correlation coefficient between discrete and continuous variables** *From:* "Austin Nichols" <austinnichols@gmail.com>
