
Re: st: Spearman rank correlation


From   Roger Newson <roger.newson@kcl.ac.uk>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Spearman rank correlation
Date   Sun, 01 Sep 2002 13:42:02 +0100

At 11:24 30/08/02 +0200, Jens Lauritsen wrote:
Does anyone have good references for INTERPRETATION in relation to size of
the Spearman correlation coefficient?

The population Spearman correlation coefficient between X and Y is, by definition, the Pearson product-moment correlation between the cumulative distribution functions (CDFs) F_X(X) and F_Y(Y), where F_X(z) is the population CDF of X (ie Pr(X<=z)) and F_Y(z) is the population CDF of Y (ie Pr(Y<=z)). It is estimated by the sample Spearman correlation coefficient, ie the product-moment correlation between the sample ranks. Confidence intervals (CIs) around the sample Spearman correlation for the population Spearman correlation can be derived by the jackknife or bootstrap.
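As a rough illustration of the sample definition above (the product-moment correlation between the sample ranks), here is a minimal Python sketch. The function names (midranks, pearson, spearman) and the data are made up for this example and are not taken from any Stata command; tied values are given the average of their ranks, which is one common convention.

```python
from math import sqrt
from statistics import mean


def midranks(values):
    """Rank values from 1 to n; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Sample product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Sample Spearman correlation: Pearson correlation between the ranks."""
    return pearson(midranks(x), midranks(y))


# Made-up data: y increases with x, but far from linearly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 8.0, 27.0, 64.0, 1000.0]
print(spearman(x, y), pearson(x, y))
```

In this example the Spearman correlation is 1, because the ranks of y agree exactly with the ranks of x, whereas the Pearson correlation of the raw values is below 1 because of the outlying value of y.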

It is not easy (I think) to define, in plain language, an interpretation of the product-moment correlation between two CDFs. Most people, most of the time, think of the Spearman correlation only as a measure of positive or negative association on a scale from -1 to 1. This difficulty of interpretation is the reason why people like me prefer Kendall's tau-a, which is the difference between two probabilities, namely the probability of concordance and the probability of discordance. (If I am double-marking exam scripts with a colleague, and the Kendall's tau-a between our marks is 0.70, then this means that, given 2 exam scripts and asked which is the better, the probability that we agree exceeds the probability that we disagree by 0.70.) I wrote an article about Kendall's tau-a and its interpretation, in The Stata Journal (Newson, 2002).
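The "difference between two probabilities" reading of Kendall's tau-a can be sketched directly by counting concordant and discordant pairs. This is only an illustrative Python sketch: the function name kendall_tau_a and the marks below are invented for the example, and it gives a point estimate only, with no confidence interval.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Proportion of concordant pairs minus proportion of discordant pairs.

    Pairs tied on x or on y count as neither, but stay in the denominator,
    which is what distinguishes tau-a from tau-b.
    """
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    return (concordant - discordant) / pairs


# Invented marks given to the same five exam scripts by two markers.
marker_a = [55, 62, 48, 70, 66]
marker_b = [58, 60, 50, 65, 72]
print(kendall_tau_a(marker_a, marker_b))
```

Of the 10 possible pairs of scripts here, the two markers agree on the ordering of 9 and disagree on 1, so the sample tau-a is 0.9 - 0.1 = 0.8.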

However, Spearman's correlation (as well as Kendall's) can be interpreted by assuming that X and Y are derived, by a pair of monotonically increasing transformations, from two variables V=g(X) and W=h(Y) with a joint bivariate Normal distribution. For bivariate Normal variables, the Pearson correlation coefficient is related to the Spearman and Kendall coefficients by the equations

rho=sin((pi/2)*tau)=2*sin((pi/6)*rho_s)

where rho is the Pearson correlation, rho_s is the Spearman correlation and tau is the Kendall correlation. (See Kendall, 1949.) As the Spearman and Kendall correlations are preserved by monotonically increasing transformations such as g(.) and h(.), they are the same between X and Y as between V and W. Therefore, if you define a confidence interval for tau or rho_s between X and Y, you can transform that confidence interval (using the above equations) to get an outlier-resistant confidence interval for the Pearson correlation between V and W, without even having to know the form of the transformations g(.) and h(.).
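The end-point transformation described above might be sketched as follows in Python. The function names and the confidence limits are hypothetical, chosen only for illustration. The sketch relies on both equations being increasing functions on the interval from -1 to 1, so that the limits of the interval map to the limits of the transformed interval, and on the Normality assumption for V and W.

```python
from math import pi, sin


def pearson_from_tau(tau):
    """rho = sin((pi/2)*tau), assuming V and W are bivariate Normal."""
    return sin((pi / 2) * tau)


def pearson_from_spearman(rho_s):
    """rho = 2*sin((pi/6)*rho_s), assuming V and W are bivariate Normal."""
    return 2 * sin((pi / 6) * rho_s)


def transform_interval(lower, upper, transform):
    """Apply an increasing transformation to both confidence limits."""
    return transform(lower), transform(upper)


# Hypothetical 95% confidence limits for Kendall's tau between X and Y.
tau_lower, tau_upper = 0.5, 0.8
print(transform_interval(tau_lower, tau_upper, pearson_from_tau))

# The same idea, starting from hypothetical limits for Spearman's rho_s.
print(transform_interval(0.6, 0.9, pearson_from_spearman))
```

With these made-up limits for tau, the transformed interval for the Pearson correlation between V and W runs from about 0.71 to about 0.95.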

I hope this helps.

Roger

References

Kendall MG. Rank and product-moment correlation. Biometrika 1949; 36: 177-193.

Newson R. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences. The Stata Journal 2002; 2(1): 45-64.

--
Roger Newson
Lecturer in Medical Statistics
Department of Public Health Sciences
King's College London
5th Floor, Capital House
42 Weston Street
London SE1 3QD
United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648
Fax: 020 7848 6620 International +44 20 7848 6620
or 020 7848 6605 International +44 20 7848 6605
Email: roger.newson@kcl.ac.uk

Opinions expressed are those of the author, not the institution.
