st: RE: Spearman rank correlation

From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Spearman rank correlation
Date   Fri, 30 Aug 2002 14:50:28 +0100

Jens M. Lauritsen

> Does anyone have good references for INTERPRETATION in relation to the
> size of the Spearman correlation coefficient?

The nonparametric books I have looked at are disappointing 
in this respect. Once they have discussed the mechanics 
of calculation and the use of Spearman rank as a test 
statistic there is little or no attention to any use 
of Spearman rank as a measure. 

One answer is that as Spearman correlation is just 
Pearson correlation on the ranks, you interpret 
it just as you would Pearson correlation. However, 
it is then a good idea to see the data as Spearman
sees them, i.e. look at scatter plots of ranked variables, 
which seems to be very rarely done. In Stata a -foreach- 
loop over variables producing a ranked version followed 
by -graph, matrix- makes this possible 
in a few lines. 
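
A minimal sketch of such a loop follows. The auto data and the 
three variables are arbitrary choices for illustration, as are 
the rank_* names. The syntax is that of later Stata releases, 
in which the plot is produced by -graph matrix-. 

```stata
* Sketch only: dataset, variables and rank_* names are illustrative assumptions
sysuse auto, clear
local ranked
foreach v of varlist price mpg weight {
    * -egen, rank()- assigns mean ranks to ties by default
    egen rank_`v' = rank(`v')
    local ranked `ranked' rank_`v'
}
* scatter plot matrix of the ranked variables: the data as Spearman sees them
graph matrix `ranked', half
```

Running -corr- on any two of the rank_* variables should 
reproduce what -spearman- reports for the original pair. 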

The best discussion I know is in Harold Jeffreys' 
"Theory of probability" (1939, 3rd edition 1961). 
This reference is given in [R] spearman. 

Broadly, Spearman rank can be thought of as a measure 
of monotonicity, just as Pearson correlation is a 
measure of linearity. However, the comparison can 
challenge intuition, or at least what your intuition may be. 

Jeffreys shows that if -1 < x < 1 and y = x^3 then 
Pearson r = 0.917. This certainly holds if x is uniform 
on that interval. And you can readily verify this 
yourself by 

. set obs <whatever> 
. range x -1 1 
. gen y = x^3 
. corr x y 

This may be higher than many people using correlations 
would expect. They might argue that, as this relationship 
is clearly and strongly nonlinear, a measure of linearity 
should be way below 1; or that, as the description 
"linear relationship" is qualitatively quite wrong for 
a cubic, a quantitative measure should be very poor. 
However, Pearson correlation answers the question you ask, 
namely to give a measure of linearity, even when the 
question is ill-advised. 
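
The figure can be checked by hand for x uniform on (-1, 1): 
E[x^2] = 1/3, E[x^4] = 1/5 and E[x^6] = 1/7, and both means 
are zero, so r = (1/5) / sqrt((1/3)(1/7)) = sqrt(21)/5, or 
about 0.917. As a sketch of the contrast with Spearman, the 
same data can be passed to both commands. The number of 
observations is an arbitrary choice. As y = x^3 is strictly 
increasing, the rank correlation should come out as 1. 

```stata
* Sketch only: 1001 observations is an arbitrary choice
clear
set obs 1001
range x -1 1
gen y = x^3

* Pearson: a measure of linearity, noticeably below 1 for the cubic
corr x y
display "Pearson r    = " %5.3f r(rho)

* Spearman: Pearson on the ranks, so a measure of monotonicity
spearman x y
display "Spearman rho = " %5.3f r(rho)
```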

I.J. Good in Biometrics December 1972 independently 
made a similar point, with more results. 

Of course, these examples are for exact relationships 
and real data show scatter as well. But even 
experienced analysts of data have found them striking. 

A quite different comment is that in some ways one 
or other version of Kendall's tau is easier to 
interpret, because it is a difference of probabilities, 
even though that's not your question. 
Roger Newson's expository article in Stata Journal 2(1), 
2002 gives much more detail. My guess is that Roger 
will himself expand this point very fully. 
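
To make the "difference of probabilities" reading concrete: 
tau-a is the proportion of concordant pairs minus the 
proportion of discordant pairs. A small sketch, assuming the 
auto data as an arbitrary example and using the r(score), 
r(N) and r(tau_a) results left behind by -ktau-: 

```stata
* Sketch only: dataset and variables are illustrative assumptions
sysuse auto, clear
ktau price mpg

* r(score) is (number of concordant pairs) - (number of discordant pairs);
* dividing by the number of distinct pairs gives tau-a
display "score / #pairs = " r(score) / comb(r(N), 2)
display "tau-a          = " r(tau_a)
```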

> For a certain project we had to do many tables of these estimates and I
> produced a wrapper for the ci2 (by Paul Seed) and the spearman commands.
> One can get this output, which is easy to convert to a table in Word by
> copy and paste, plus select the output as a block and "convert to text".
> If you dislike the ";" (e.g. because you are not a Word user) then just
> omit that part.
> The same principle can be used for any procedure where you want only
> estimates (here r(N) etc) and not the remaining texts of that command.
> The reason for "version 6" is that ci2 works only in version 6.

[ ... ] 

The procedure -biv- published in the STB some years 
ago provides a wrapper for bivariate calculations 
such as these. 

It has been partly, but not completely, superseded 
by using -foreach- loops. It is not too difficult 
to initialise a matrix and then use two -foreach- 
loops over variables to fill that matrix with 
bivariate results. Then -matrix- can be used 
to tune the display. A code example is given 
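
A minimal sketch of that approach (not the original example), 
assuming the auto data, three arbitrary variables and a matrix 
named R purely for illustration: 

```stata
* Sketch only: dataset, variable list and matrix name are illustrative assumptions
sysuse auto, clear
local vars price mpg weight
local k : word count `vars'

* initialise a k x k matrix of missing values and label it
matrix R = J(`k', `k', .)
matrix rownames R = `vars'
matrix colnames R = `vars'

* two loops over the variables fill the matrix with Spearman correlations
local i = 1
foreach row of local vars {
    local j = 1
    foreach col of local vars {
        if "`row'" == "`col'" {
            matrix R[`i', `j'] = 1
        }
        else {
            quietly spearman `row' `col'
            matrix R[`i', `j'] = r(rho)
        }
        local j = `j' + 1
    }
    local i = `i' + 1
}

* tune the display
matrix list R, format(%6.3f)
```

The same skeleton should serve for any bivariate command 
that leaves its results in r(). 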

[email protected] 