
From |
Roger Newson <roger.newson@kcl.ac.uk> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: Spearman rank correlation |

Date |
Sun, 01 Sep 2002 13:42:02 +0100 |

At 11:24 30/08/02 +0200, Jens Lauritsen wrote:

Does anyone have good references for INTERPRETATION in relation to size of the Spearman correlation coefficient?

The population Spearman correlation coefficient between X and Y is, by definition, the Pearson product-moment correlation between the cumulative distribution functions (CDFs) F_X(X) and F_Y(Y), where F_X(z) is the population CDF of X (ie Pr(X<=z)) and F_Y(z) is the population CDF of Y (ie Pr(Y<=z)). It is estimated by the sample Spearman correlation coefficient, ie the product-moment correlation between the sample ranks. Confidence intervals (CIs) around the sample Spearman correlation for the population Spearman correlation can be derived by the jackknife or bootstrap.
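As an illustration of the estimation step described above, here is a minimal Python sketch (not the Stata implementation, and not taken from the original post). It computes the sample Spearman correlation as the product-moment correlation between sample ranks, using midranks for ties, and a simple percentile bootstrap interval. The function names (`midranks`, `pearson`, `spearman`, `bootstrap_ci`) and the defaults for `reps`, `level` and `seed` are assumptions chosen for the example; the percentile method is only one of several bootstrap intervals one might use.

```python
import random
from statistics import mean


def midranks(values):
    """Rank values from 1 to n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Product-moment correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Sample Spearman correlation: Pearson correlation between the ranks."""
    return pearson(midranks(x), midranks(y))


def bootstrap_ci(x, y, reps=2000, level=0.95, seed=1):
    """Percentile bootstrap interval for the Spearman correlation (a sketch)."""
    rng = random.Random(seed)
    n = len(x)
    stats = []
    for _ in range(reps):
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        ys = [y[i] for i in idx]
        # Skip degenerate resamples in which either variable is constant.
        if len(set(xs)) > 1 and len(set(ys)) > 1:
            stats.append(spearman(xs, ys))
    stats.sort()
    lo = stats[int((1 - level) / 2 * len(stats))]
    hi = stats[int((1 + level) / 2 * len(stats)) - 1]
    return lo, hi
```

Calling `spearman(x, y)` on two lists of equal length gives the point estimate, and `bootstrap_ci(x, y)` gives a rough interval around it. With small samples the percentile interval can be poorly calibrated, so treat it as a demonstration of the idea rather than a recommended procedure.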

It is not easy (I think) to define, in plain language, an interpretation of the product-moment correlation between two CDFs. Most people, most of the time, think of the Spearman correlation only as a measure of positive or negative association on a scale from -1 to 1. This difficulty of interpretation is why people like me prefer Kendall's tau-a, which is the difference between two probabilities, namely the probability of concordance and the probability of discordance. (If I am double-marking exam scripts with a colleague, and the Kendall's tau-a between our marks is 0.70, then this means that, given 2 exam scripts and asked which is the better, the probability that we agree exceeds the probability that we disagree by 0.70; in the absence of ties, we agree 85% of the time and disagree 15% of the time.) I wrote an article about Kendall's tau-a and its interpretation in The Stata Journal (Newson, 2002).
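To make the "difference between two probabilities" reading concrete, here is a small Python sketch that computes the sample Kendall's tau-a directly from its definition, as the proportion of concordant pairs minus the proportion of discordant pairs. The function name `kendall_tau_a` and the two lists of marks are hypothetical, invented purely for illustration.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Sample Kendall's tau-a: (concordant - discordant) / (all pairs)."""
    concordant = discordant = pairs = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        pairs += 1
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1   # both markers order the two scripts the same way
        elif s < 0:
            discordant += 1   # the markers order the two scripts oppositely
        # s == 0 is a tie: counted in the denominator only
    return (concordant - discordant) / pairs


# Hypothetical marks given to five exam scripts by two markers.
marks_a = [55, 62, 70, 48, 81]
marks_b = [58, 60, 75, 50, 72]
print(kendall_tau_a(marks_a, marks_b))  # 9 concordant, 1 discordant of 10 pairs -> 0.8
```

In this made-up example the two markers agree on the ordering of 9 of the 10 possible pairs of scripts and disagree on 1, so tau-a is 0.9 - 0.1 = 0.8.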

However, Spearman's correlation (as well as Kendall's) can be interpreted by assuming that X and Y can be transformed, by a pair of monotonically increasing transformations g(.) and h(.), to two variables V=g(X) and W=h(Y) with a joint bivariate Normal distribution. For bivariate Normal variables, the Pearson correlation coefficient is related to the Spearman and Kendall coefficients by the equations

rho=sin((pi/2)*tau)=2*sin((pi/6)*rho_s)

where rho is the Pearson correlation, rho_s is the Spearman correlation and tau is the Kendall correlation. (See Kendall, 1949.) As the Spearman and Kendall correlations are preserved by monotonic transformations such as g(.) and h(.), they are the same between X and Y as between V and W. Therefore, if you define a confidence interval for tau or rho_s between X and Y, you can transform that confidence interval (using the above equations) to get an outlier-resistant confidence interval for the Pearson correlation between V and W, without even having to know the form of the transformations g(.) and h(.).
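The end-point transformation described above can be sketched in a few lines of Python. Both functions are monotonically increasing on [-1, 1], so the confidence limits can be transformed one at a time. The function names and the example confidence limits are hypothetical, and the result is only meaningful under the bivariate Normal assumption for V and W.

```python
import math


def pearson_from_tau(tau):
    """rho = sin((pi/2) * tau), valid under bivariate Normality."""
    return math.sin(math.pi / 2 * tau)


def pearson_from_spearman(rho_s):
    """rho = 2 * sin((pi/6) * rho_s), valid under bivariate Normality."""
    return 2 * math.sin(math.pi / 6 * rho_s)


def transform_ci(ci, f):
    """Map a (lower, upper) interval through an increasing function f."""
    lower, upper = ci
    return f(lower), f(upper)


# Hypothetical 95% confidence limits for Kendall's tau between X and Y.
tau_ci = (0.50, 0.80)
print(transform_ci(tau_ci, pearson_from_tau))

# Hypothetical 95% confidence limits for Spearman's rho_s between X and Y.
rho_s_ci = (0.60, 0.90)
print(transform_ci(rho_s_ci, pearson_from_spearman))
```

Note that the forms of g(.) and h(.) never appear in the code: only the rank-based interval for X and Y is needed.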

I hope this helps.

Roger

References

Kendall MG. Rank and product-moment correlation. Biometrika 1949; 36: 177-193.

Newson R. Parameters behind "nonparametric" statistics: Kendall's tau, Somers' D and median differences. The Stata Journal 2002; 2(1): 45-64.

--

Roger Newson

Lecturer in Medical Statistics

Department of Public Health Sciences

King's College London

5th Floor, Capital House

42 Weston Street

London SE1 3QD

United Kingdom

Tel: 020 7848 6648 International +44 20 7848 6648

Fax: 020 7848 6620 International +44 20 7848 6620

or 020 7848 6605 International +44 20 7848 6605

Email: roger.newson@kcl.ac.uk

Opinions expressed are those of the author, not the institution.
