# st: RE: Spearman rank correlation

From: "Nick Cox"; Subject: st: RE: Spearman rank correlation; Date: Fri, 30 Aug 2002 14:50:28 +0100

```
Jens M. Lauritsen

> Does anyone have good references for INTERPRETATION in relation to the
> size of the Spearman correlation coefficient?

The nonparametric books I have looked at are disappointing
in this respect. Once they have discussed the mechanics
of calculation and the use of Spearman rank as a test
statistic there is little or no attention to any use
of Spearman rank as a measure.

One answer is that as Spearman correlation is just
Pearson correlation on the ranks, you interpret
it just as you would Pearson correlation. However,
it is then a good idea to see the data as Spearman
sees them, i.e. look at scatter plots of ranked variables,
which seems to be very rarely done. In Stata a -foreach-
loop over variables producing a ranked version followed
by -graph, matrix- makes this possible
in a few lines.

The best discussion I know is in Harold Jeffreys'
"Theory of probability" (1939, 3rd edition 1961).
This reference is given in [R] spearman.

Broadly, Spearman rank can be thought of as a measure
of monotonicity, just as Pearson correlation is a
measure of linearity. However, the comparison can
challenge intuition, or at least what you take your intuition to be.

Jeffreys shows that if -1 < x < 1 and y = x^3 then
Pearson r = 0.917. This certainly holds if x is uniform
on that interval. And you can readily verify this
yourself by

. set obs <whatever>
. range x -1 1
. gen y = x^3
. corr x y

This may be higher than many
people using correlations would expect, if they
argued that as this relationship is clearly and strongly
nonlinear, then a measure of linearity would be
way below 1. Or if their impression was that as
the description "linear relationship" is qualitatively
quite wrong for a cubic, then a quantitative measure would be
very poor. However, Pearson correlation answers
the question you ask, namely to give a measure of linearity,
even when the question is ill-advised.

I.J. Good in Biometrics December 1972 independently
made a similar point, with more results.

Of course, these examples are for exact relationships
and real data show scatter as well. But even
experienced analysts of data have found them striking.

A quite different comment is that in some ways one
or other version of Kendall's tau is easier to
interpret, because it is a difference of probabilities,
even though that's not your question.
Roger Newson's expository article in Stata Journal 2(1),
2002 gives much more detail. My guess is that Roger
will himself expand this point very fully.

>
> For a certain project we had to do many tables of these estimates, and I
> produced a wrapper for the ci2 (by Paul Seed) and the spearman commands.
> With it one can get this output, which is easy to convert to a table in
> Word by copy and paste, plus select the output as a block and
> "convert to text".
>
> If you dislike the ";" (e.g. because you are not a Word user) then just
> omit that part.
>
> The same principle can be used for any procedure where you want only
> estimates (here r(N) etc.) and not the remaining text of that command.
> The reason for "version 6" is that ci2 works only in version 6.

[ ... ]

The procedure -biv- published in the STB some years
ago provides a wrapper for bivariate calculations
such as these.

It has been partly, but not completely, superseded
by using -foreach- loops. It is not too difficult
to initialise a matrix and then use two -foreach-
loops over variables to fill that matrix with
bivariate results. Then -matrix- can be used
to tune the display. A code example is given
in

http://www.stata.com/support/meeting/8uk/fortitude.pdf

Nick
n.j.cox@durham.ac.uk
```
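
The two checks described in the post (the Pearson correlation for y = x^3, and scatter plots of the ranked variables) can be sketched as follows. This is only a sketch in current Stata syntax, not code from the original post: the post dates from 2002, when the scatter plot matrix was requested with -graph, matrix-, and the number of observations and the names rank_x and rank_y are illustrative assumptions.

```stata
* Sketch only: 1001 observations is an arbitrary choice
clear
set obs 1001

* x evenly spaced on [-1, 1], y an exact cubic function of x
range x -1 1
generate y = x^3

* Pearson correlation: a measure of linearity
correlate x y

* Spearman correlation: Pearson correlation computed on the ranks
spearman x y

* See the data as Spearman sees them: rank each variable, then plot the ranks
* (rank_x and rank_y are hypothetical names chosen for this sketch)
foreach v of varlist x y {
    egen rank_`v' = rank(`v')
}
graph matrix rank_x rank_y
```

For x uniform on (-1, 1), E[x^4] = 1/5, var(x) = 1/3 and var(x^3) = 1/7, so the Pearson correlation is (1/5)/sqrt(1/21) = sqrt(21)/5, or about 0.917, which agrees with the value quoted from Jeffreys. The Spearman correlation is exactly 1 here because x^3 is strictly increasing in x. With real data, list more variables in the -foreach- loop and in -graph matrix-.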