Statalist



Re: st: difference between "Spearman" and "pwcorr / correlate"


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: difference between "Spearman" and "pwcorr / correlate"
Date   Thu, 8 Oct 2009 14:39:04 -0500

On Thu, Oct 8, 2009 at 11:33 AM, Nick Cox <n.j.cox@durham.ac.uk> wrote:
> There's a tacit criterion here, that techniques must have simple verbal
> interpretations. I am as much in favour of simple verbal interpretations
> as the next person -- nay, on average, more so -- but while they're a
> bonus when available insisting on them would deprive you of much that is
> indispensable.
>
> What's the simple verbal interpretation of (say) eigenvectors or an SVD?

Eigenproblems are very visual. The eigenvectors are the special
directions along which the matrix acts as a pure stretch: a vector
pointing that way changes its length without any rotation, and the
eigenvalue tells you by how much a unit vector along that direction
is stretched. If we talk about an eigenproblem for a covariance
matrix, then the square roots of the eigenvalues are the "radii" of
the rugby ball/American football formed by the cloud of points in
multivariate space, and the eigenvectors are again the directions that
give the orientation of that rugby ball relative to the "official"
axes. SVDs can be explained by -biplot-s, although with greater effort.

I usually want to know what I am estimating. Then I can eyeball
something along the lines of "the difference between the unknown
population distribution function and the sample distribution is such
and such, and hence by an appropriate version of the influence
function expansions and/or the delta-method, the difference between
the unknown parameter and the estimate at hand must be of such and
such order." Thanks to Roger, I now have a better clue of what I am
estimating with Spearman correlation. And there are probably a dozen
other rank-type correlations that would make at least as much sense as
(linear) correlation of the cdfs.
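The "correlation of the cdfs" reading can be checked numerically. The
sketch below, in plain Python with made-up data and no ties, computes
the ordinary Pearson correlation of the empirical cdf values and
compares it with the textbook rank-difference formula for Spearman's
rho. The function names are illustrative only and are not taken from
Stata.

```python
import math

def pearson(x, y):
    """Ordinary product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(v):
    """Ranks 1..n of the observations (assumes no ties)."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0] * len(v)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r

def ecdf_values(v):
    """Empirical cdf at each observation: F_n(v_i) = rank(v_i) / n."""
    n = len(v)
    return [r / n for r in ranks(v)]

def spearman_rank_formula(x, y):
    """Textbook formula 1 - 6*sum(d^2) / (n*(n^2 - 1)), valid without ties."""
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(ranks(x), ranks(y)))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))

# Made-up sample; y is a strongly nonlinear but nearly monotone function of x.
x = [2.1, 0.4, 3.3, 1.7, 5.0, 4.2]
y = [1.0, 0.2, 8.5, 2.9, 30.1, 9.7]

rho_cdf = pearson(ecdf_values(x), ecdf_values(y))
rho_rank = spearman_rank_formula(x, y)
print(rho_cdf, rho_rank, pearson(x, y))
```

The first two numbers agree up to rounding, since dividing ranks by n
is a linear rescaling that Pearson's correlation ignores. The third
(Pearson on the raw data) generally differs.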

One other comparison can be made regarding the computational
requirements. Spearman's rho is O( n log(n) ) due to sorting, while
Kendall's tau is O( n^2 ) if computed by pairwise comparisons. Of
course, Pearson's moment correlation is O( n ), as it is just
manipulation of sums. One would only see a difference in timing
between Pearson and Spearman at sample sizes where -sort- takes a
noticeable amount of time, while Kendall's tau already gets slow with
more than 100 or so observations.
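The three cost profiles can be read off a plain-Python sketch: one pass
of running sums for Pearson, a sort followed by the same sums for
Spearman, and a double loop over pairs for Kendall. This is only an
illustration with made-up data and no ties; it is not Stata's
implementation, and the function names are hypothetical.

```python
import math

def pearson_sums(x, y):
    """One pass accumulating five sums: O(n)."""
    n = len(x)
    sx = sy = sxx = syy = sxy = 0.0
    for a, b in zip(x, y):
        sx += a
        sy += b
        sxx += a * a
        syy += b * b
        sxy += a * b
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))

def ranks(v):
    """Ranks 1..n via one sort: O(n log n). Assumes no ties."""
    order = sorted(range(len(v)), key=v.__getitem__)
    r = [0] * len(v)
    for k, i in enumerate(order, start=1):
        r[i] = k
    return r

def spearman(x, y):
    """Sort both variables, then reuse the O(n) sums on the ranks."""
    return pearson_sums(ranks(x), ranks(y))

def kendall_tau_a(x, y):
    """Naive tau-a: visit all n*(n-1)/2 pairs, so O(n^2). Assumes no ties."""
    n = len(x)
    score = 0
    for i in range(n):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            score += (prod > 0) - (prod < 0)  # +1 concordant, -1 discordant
    return score / (n * (n - 1) / 2)

# Made-up sample without ties.
x = [2.1, 0.4, 3.3, 1.7, 5.0, 4.2]
y = [1.0, 0.2, 8.5, 2.9, 30.1, 9.7]

r = pearson_sums(x, y)
rho = spearman(x, y)
tau = kendall_tau_a(x, y)
print(r, rho, tau)
```

The double loop is what makes the pairwise version quadratic: doubling
n roughly quadruples the number of pairs visited. Sort-based
O( n log(n) ) algorithms for tau also exist but are not shown here.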

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.


