RE: st: Computing medcouple
"Nick Cox" <firstname.lastname@example.org>
Thu, 13 Aug 2009 18:16:50 +0100
This raises various questions, including
1. What is the Platonic skewness that you know you've measured?
We often write things like
skewness = <formula>
but the = sign is here profoundly ambiguous. Those who think that the
<formula> is the thing and the name (here "skewness") is just an
arbitrary label are not in agreement with those who think that there is
an essential underlying thing and that the formula is our best guess at how we
measure it. The = sign should be := or =: to indicate direction of
assignment (a notation that I think started in Algol).
In Britain circa 1960 bananas used to bear sticky labels explaining that
they were indeed bananas. (This in turn was presumably a hangover from
the post-WWII years, when bananas were not widely available and as such unfamiliar
to shoppers.) It was a recurrent amusement of my childhood to peel off
these labels, stick them on friends and siblings and declare "So-and-so
is a banana". (We lacked other toys and were easily amused.) This silly
pursuit has a useful personal legacy in underlining that labels may be
sticky but that has no bearing on whether they are appropriate.
2. What is so special about particular quantiles anyway? I think the
median has a special role as the middle of any symmetric distribution
but I find it hard to see that even quartiles have any special role.
They may be simple to understand, familiar from introductory courses
and/or convenient in practice, but that doesn't make them statistically
natural. Of course, this is precisely why quantiles further out in the
tail are also used, while stopping short of the extremes as insurance,
to get some robustness against outliers.
I think it's a major criticism of box plots that they sanctify the quartiles.
A few years ago I tried to develop a measure of skewness based on
"Skewness" = [(P75-P50)-(P50-P25)]/[P75-P25]
I also tried similar measures based on P90 and P10. I did fairly extensive
simulations and found that the P90/P10-based ones did a bit better at
expressing skewness. In addition, the distribution of these statistics
was not nicely behaved, but after log-transforming you would get a
statistic that looked very nicely normal.
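For concreteness, here is a minimal Python sketch of the quantile-based measure quoted above, generalised so that the same function gives the quartile (P25/P75) and the P10/P90 versions. The function names `percentile` and `quantile_skewness` are hypothetical, and the percentile rule (linear interpolation between order statistics) is an assumption: packages differ in how they define sample quantiles, so small-sample results need not match Stata's exactly. The measure always lies between -1 and 1, and is 0 when the two chosen quantiles sit symmetrically about the median.

```python
def percentile(data, p):
    """p-th percentile (0-100) by linear interpolation between order statistics.

    Assumed rule; other quantile definitions give slightly different values.
    """
    xs = sorted(data)
    pos = (len(xs) - 1) * p / 100
    lo = int(pos)                    # index of the order statistic below pos
    hi = min(lo + 1, len(xs) - 1)    # index of the one above (clamped at the max)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def quantile_skewness(data, lower=25, upper=75):
    """[(P_upper - P50) - (P50 - P_lower)] / (P_upper - P_lower).

    lower=25, upper=75 gives the quartile formula; lower=10, upper=90
    gives the P90/P10 variant.
    """
    pl = percentile(data, lower)
    p50 = percentile(data, 50)
    pu = percentile(data, upper)
    if pu == pl:
        raise ValueError("P_upper equals P_lower; measure is undefined")
    return ((pu - p50) - (p50 - pl)) / (pu - pl)
```

For example, `quantile_skewness([1, 2, 3, 10, 20])` uses P25 = 2, P50 = 3 and P75 = 10, giving (7 - 1)/8 = 0.75.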
In doing this I learned of the L-moments articles and was delighted that
Nick had already written a routine for this.
This sounds a little similar in spirit to using L-moments to calculate a
skewness measure. The latter approach arguably has two features: it is
systematic and it is already implemented in Stata through -lmoments-.
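To show what the L-moment route involves, here is a minimal Python sketch of sample L-skewness t3 = l3/l2, computed from the usual unbiased probability-weighted moments b0, b1, b2 with l2 = 2 b1 - b0 and l3 = 6 b2 - 6 b1 + b0. This is an illustration of the standard estimators, not the code behind -lmoments-; the function name `l_skewness` is hypothetical. Like the quantile measures, t3 lies between -1 and 1.

```python
def l_skewness(data):
    """Sample L-skewness t3 = l3 / l2 from unbiased probability-weighted moments."""
    xs = sorted(data)
    n = len(xs)
    if n < 3:
        raise ValueError("need at least 3 observations")
    # b_r = (1/n) * sum_i w_r(i) * x_(i), with 0-based index i over the sorted data:
    # w_1(i) = i / (n - 1),  w_2(i) = i (i - 1) / ((n - 1)(n - 2))
    b0 = sum(xs) / n
    b1 = sum(i / (n - 1) * x for i, x in enumerate(xs)) / n
    b2 = sum(i * (i - 1) / ((n - 1) * (n - 2)) * x for i, x in enumerate(xs)) / n
    l2 = 2 * b1 - b0               # second L-moment (an L-scale)
    l3 = 6 * b2 - 6 * b1 + b0      # third L-moment
    return l3 / l2                 # raises ZeroDivisionError if all values are equal
```

For the right-skewed sample [1, 2, 3, 10, 20] this gives l2 = 4.6 and l3 = 2.4, so t3 is about 0.52; for [1, 2, 3, 4, 5] it is 0 up to rounding.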
As far as medcouple is concerned, you could compute it exactly or by
sampling. I've no code to offer.
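As a rough illustration of the sampling route, here is a minimal Python sketch, assuming the usual kernel h(x_i, x_j) = ((x_j - m) - (m - x_i)) / (x_j - x_i) over pairs with x_i <= m <= x_j, where m is the sample median. It draws random pairs and returns the median of h over the draws. Pairs with both members tied at the median are scored -1, 0 or +1 by their positions within the tied block, which is the tie rule usually given for the sample medcouple. The function name and the `n_pairs` and `seed` parameters are hypothetical choices for illustration, not recommended settings.

```python
import random
import statistics


def medcouple_sampled(data, n_pairs=10_000, seed=0):
    """Monte Carlo approximation to the medcouple from randomly drawn pairs."""
    m = statistics.median(data)
    lower = sorted(x for x in data if x <= m)  # values tied at m come last
    upper = sorted(x for x in data if x >= m)  # values tied at m come first
    k = sum(1 for x in data if x == m)         # number of ties at the median
    rng = random.Random(seed)
    h = []
    for _ in range(n_pairs):
        a = rng.randrange(len(lower))
        b = rng.randrange(len(upper))
        xi, xj = lower[a], upper[b]
        if xj > xi:
            h.append(((xj - m) - (m - xi)) / (xj - xi))
        else:
            # Both values equal m: positions i, j run from 1 to k in the tied block
            i = a - (len(lower) - k) + 1
            j = b + 1
            s = i + j - 1 - k
            h.append(float((s > 0) - (s < 0)))  # -1, 0 or +1
    return statistics.median(h)
```

The approximation improves as `n_pairs` grows; for small samples an exact computation over all pairs is cheap anyway.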
My prejudice here is that for most problems you would be better off
either transforming the data or using a graph form that discarded less
of the information than a box plot does. Otherwise put, if the data are
very skew you usually need to see more detail about the tails than a box plot shows.
Ronnie Babigumira <email@example.com>
Vandervieren and Hubert (2004) present what they call a robust measure
of skewness using the medcouple.
Given a distribution F with median m_F, the medcouple MC(F) is defined as

    MC(F) = \operatorname{med}_{x_i \le m_F \le x_j,\; x_i \ne x_j} h(x_i, x_j)

where

    h(x_i, x_j) = \frac{(x_j - m_F) - (m_F - x_i)}{x_j - x_i}
I would like to compute MC but don't know where to start. Any pointers
will be much appreciated.
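One way to start is brute force: form the kernel for every qualifying pair and take the median. The Python sketch below assumes the kernel h(x_i, x_j) = ((x_j - m) - (m - x_i)) / (x_j - x_i), with m the sample median, and handles observations tied at the median with the usual tie rule (the k x k block of tied pairs contributes k(k-1)/2 values of -1, k zeros and k(k-1)/2 values of +1). The function name `medcouple_naive` is hypothetical. It needs O(n^2) time and memory, so it only suits modest sample sizes; faster O(n log n) algorithms exist for large n.

```python
import statistics


def medcouple_naive(data):
    """Exact medcouple by brute force over all pairs x_i <= m <= x_j."""
    m = statistics.median(data)
    below = [x for x in data if x < m]
    above = [x for x in data if x > m]
    k = sum(1 for x in data if x == m)  # observations tied at the median
    # Pairs with x_i < m < x_j: the ordinary kernel
    h = [((xj - m) - (m - xi)) / (xj - xi) for xi in below for xj in above]
    # Pairs with exactly one member at the median: the kernel is -1 or +1
    h += [-1.0] * (len(below) * k)  # x_i < m, x_j == m
    h += [1.0] * (k * len(above))   # x_i == m, x_j > m
    # k x k pairs with both members at the median: the tie rule gives
    # k(k-1)/2 values of -1, k zeros and k(k-1)/2 values of +1
    half = k * (k - 1) // 2
    h += [-1.0] * half + [0.0] * k + [1.0] * half
    return statistics.median(h)
```

For [1, 2, 3, 10, 20] the nine kernel values are -1, -1, 0, 5/9, 3/4, 15/19, 8/9, 1, 1, so the medcouple is 0.75; a symmetric sample such as [1, 2, 3, 4, 5] gives 0.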
Vandervieren, E., & Hubert, M. (2004). An adjusted boxplot for skewed
distributions. In J. Antoch (Ed.), COMPSTAT 2004 Symposium: Proceedings in
Computational Statistics (pp. 1933-1940). Heidelberg.
The paper can be downloaded here