# RE: st: difference between "Spearman" and "pwcorr / correlate"

 From "Nick Cox" To Subject RE: st: difference between "Spearman" and "pwcorr / correlate" Date Fri, 9 Oct 2009 11:41:37 +0100

```My point needs rephrasing. I draw a distinction between verbal
definitions or characterisations on the one hand and verbal analogies on
the other. The difference lies in whether you can take the verbal
statements and reconstruct the formula or method from them; with mere
analogies you can't do that. However, Pearson correlations are pretty
much defined by their square being the fraction of variance explained by
the corresponding regression, modulo sign of course. In contrast, if I
explain Spearman correlation in verbal terms as a measure of
monotonicity, that does not imply the particular formula used.

Nick
n.j.cox@durham.ac.uk

Stas Kolenikov

On Thu, Oct 8, 2009 at 11:33 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> There's a tacit criterion here, that techniques must have simple verbal
> interpretations. I am as much in favour of simple verbal interpretations
> as the next person -- nay, on average, more so -- but while they're a
> bonus when available insisting on them would deprive you of much that is
> indispensable.
>
> What's the simple verbal interpretation of (say) eigenvectors or an SVD?

The eigenproblems are very visual. The eigenvalues tell you by how
much a vector changes its length, and the eigenvectors give the
specific directions in which that holds exactly: the vector stretches
without any rotation. If we talk about an eigenproblem for a
covariance matrix, then the eigenvalues are the squared "radii" of the
rugby ball/American football formed by the points in multivariate
space, and the eigenvectors are again the directions that give the
orientation of that rugby ball relative to the "official" axes. SVDs
can be explained by the -biplot-s, although with greater effort.

I usually want to know what I am estimating. Then I can eyeball
something along the lines of "the difference between the unknown
population distribution function and the sample distribution is such
and such, and hence by an appropriate version of the influence
function expansions and/or the delta-method, the difference between
the unknown parameter and the estimate at hand must be of such and
such order." Thanks to Roger, I now have a better clue of what I am
estimating with Spearman correlation. And there are probably a dozen
other rank-type correlations that would make at least as much sense as
(linear) correlation of the cdfs.

One other comparison can be made regarding the computational
requirements. Spearman's rho is O( n log(n) ) due to sorting, while
Kendall's tau is O( n^2 ) when computed by direct pairwise
comparisons. Pearson's product-moment correlation is of course O( n ),
as it is just manipulation of sums. One would only see differences in
timing between Pearson and Spearman at sample sizes large enough that
-sort- takes a noticeable amount of time, while Kendall's tau is
already slow with more than 100 observations.

```
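The computational comparison in the message above (Pearson from sums, Spearman from a sort, Kendall from pairwise comparisons) can be sketched roughly as follows. This is only an illustration in plain Python, not how Stata's -correlate-, -spearman- or -ktau- are implemented; the function names `pearson`, `midranks`, `spearman` and `kendall_tau_a` are made up for the example, and the Kendall version shown is tau-a with no tie correction.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation from running sums: a single pass, O(n).

    The one-pass formula is used to show the cost, not for numerical
    robustness (it can lose precision on badly scaled data).
    """
    n = len(x)
    sx = sy = sxx = syy = sxy = 0.0
    for a, b in zip(x, y):
        sx += a
        sy += b
        sxx += a * a
        syy += b * b
        sxy += a * b
    cov = sxy - sx * sy / n
    vx = sxx - sx * sx / n
    vy = syy - sy * sy / n
    return cov / math.sqrt(vx * vy)


def midranks(v):
    """Ranks 1..n, ties given their average rank; the sort costs O(n log n)."""
    order = sorted(range(len(v)), key=lambda i: v[i])
    ranks = [0.0] * len(v)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with v[order[i]].
        while j + 1 < len(order) and v[order[j + 1]] == v[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of the (mid)ranks."""
    return pearson(midranks(x), midranks(y))


def kendall_tau_a(x, y):
    """Kendall's tau-a by comparing all n(n-1)/2 pairs directly: O(n^2)."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant, 0 tied
    return s / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotone but not linear in x
print(pearson(x, y), spearman(x, y), kendall_tau_a(x, y))
```

On this monotone but nonlinear example the two rank correlations equal 1 while the Pearson correlation falls below 1, which is the "measure of monotonicity" reading discussed in the thread. The pairwise loop is the direct approach the message has in mind; O(n log n) algorithms for Kendall's tau exist but are not shown here.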