RE: st: difference between "Spearman" and "pwcorr / correlate"

From   "Newson, Roger B" <>
To   "''" <>
Subject   RE: st: difference between "Spearman" and "pwcorr / correlate"
Date   Wed, 7 Oct 2009 22:31:34 +0100

There IS an interpretation of the Spearman correlation for continuous variables in an infinite population. In that case, if the random variables are X and Y, then the Spearman rho(X,Y) is simply the Pearson correlation of F_X(X) and F_Y(Y), where F_X(.) and F_Y(.) are the population cumulative distribution functions of X and Y respectively. And a Pearson correlation, as always, is a measure of linearity.
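
As an illustrative sample analogue of the statement above (a sketch, not code from this thread): replacing the population CDFs F_X and F_Y by rank/n, which equals the empirical CDF when there are no ties, and taking the Pearson correlation of the results reproduces the sample Spearman rho. The helper names (average_ranks, pearson, spearman, empirical_cdf) and the toy data are assumptions made for the example, written in plain Python rather than Stata.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman(x, y):
    """Sample Spearman rho: the Pearson correlation of the (average) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def empirical_cdf(values):
    """rank / n for each observation (the empirical CDF when there are no ties)."""
    n = len(values)
    return [r / n for r in average_ranks(values)]


# Hypothetical toy data: y is a monotone but nonlinear function of x.
x = [0.5, 1.0, 1.5, 2.0, 4.0]
y = [math.exp(v) for v in x]

rho_from_ranks = spearman(x, y)
rho_from_cdfs = pearson(empirical_cdf(x), empirical_cdf(y))
r_raw = pearson(x, y)
print(rho_from_ranks, rho_from_cdfs, r_raw)
```

Dividing the ranks by n is a linear rescaling, so it leaves the Pearson correlation unchanged. That is why the two rho values agree, while the Pearson correlation of the raw values is lower for this nonlinear relationship.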

The two main problems with the Spearman rho are that (a) it is ONLY a measure of linearity between the 2 cumulative distribution functions F_X(X) and F_Y(Y) (with no interpretation as a difference between concordance and discordance probabilities), and that (b) the sampling distribution of the sample Spearman rho converges to normality a lot more slowly than that of the sample Kendall tau-a, especially under the null hypothesis of zero correlation (see Kendall and Gibbons, 1990).

Best wishes



Kendall, M. G., and J. D. Gibbons. 1990. Rank Correlation Methods. 5th ed. Oxford, UK: Oxford University Press.

Roger B Newson BSc MSc DPhil
Lecturer in Medical Statistics
Respiratory Epidemiology and Public Health Group
National Heart and Lung Institute
Imperial College London
Royal Brompton Campus
Room 33, Emmanuel Kaye Building
1B Manresa Road
London SW3 6LR
Tel: +44 (0)20 7352 8121 ext 3381
Fax: +44 (0)20 7351 8322
Web page:
Departmental Web page:

Opinions expressed are those of the author, not of the institution.

-----Original Message-----
From: [] On Behalf Of Stas Kolenikov
Sent: 07 October 2009 21:27
Subject: Re: st: difference between "Spearman" and "pwcorr / correlate"

>  >Inference for Pearson's moment correlation relies on normality of the
>  >data. Spearman rank correlation is free of any assumptions, but there
>  >is no population characteristic that it estimates, which makes
>  >interpretation and asymptotic inference somewhat weird. If one is
>  >significant and the other is not, you are making either type I or type
>  >II error somewhere.
>  In the angels on the head of a pin vein:
>  Of possible interest in this regard is that the Spearman coefficient is the
> same as the Pearson calculated on the ranked values of the variables (ties
> getting the average rank).  I would agree that this is not a terribly
> interesting population parameter, but isn't this nevertheless an
> estimable/testable population characteristic?

If you have a finite population, then of course you will have Spearman
correlation for it. Although if you want to set up any asymptotic
framework, you will be trying to hit a moving target. I don't think
there is a meaningful definition of Spearman correlation for infinite
populations/continuous variables, although I might be mistaken. On the
other hand, Kendall's tau, as Nick Cox quoted from Roger Newson, has
explicit population analogues in probabilities of concordant and
discordant pairs of observations.

The question is: if the correlation estimate is 0.5, what does it say?
For Pearson moment correlation, it means that the proportion of
explained variance in a bivariate regression is 0.25. For Kendall's
tau, it means that for every discordant pair of observations, there
are three concordant pairs (i.e., absent ties, Prob[ concordant ] = 3/4
and Prob[ discordant ] = 1/4 ). For Spearman rank correlation, you can only say
that the variables are positively associated, but not much more.
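
The arithmetic in the paragraph above can be checked with a small sketch (illustrative only, not code from this thread). It assumes no ties, so that Prob[concordant] + Prob[discordant] = 1 and tau = Prob[concordant] - Prob[discordant]. The function names and the eight-observation toy data set are hypothetical, chosen so that tau-a comes out at exactly 0.5.

```python
from itertools import combinations


def concordance_counts(x, y):
    """Count concordant and discordant pairs of observations (ties count as neither)."""
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        s = (x1 - x2) * (y1 - y2)
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    return concordant, discordant


def kendall_tau_a(x, y):
    """Kendall tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    c, d = concordance_counts(x, y)
    return (c - d) / (n * (n - 1) / 2)


# Hypothetical data without ties: 28 pairs, of which 7 are discordant.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [4, 3, 2, 1, 6, 5, 7, 8]

c, d = concordance_counts(x, y)
n_pairs = len(x) * (len(x) - 1) // 2
tau = kendall_tau_a(x, y)

print("concordant:", c, "discordant:", d)
print("tau-a:", tau)
print("P(concordant):", c / n_pairs, "P(discordant):", d / n_pairs)

# For comparison: a Pearson correlation of 0.5 corresponds to R^2 = 0.25
# in a bivariate linear regression.
r = 0.5
print("R^2:", r ** 2)
```

With tau-a = 0.5 the concordant pairs outnumber the discordant ones three to one, which is the 3/4 versus 1/4 split described above.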

Stas Kolenikov, also found at
Small print: I use this email account for mailing lists only.