RE: st: RE: RE: median equality test for non normal variables

From   "Feiveson, Alan H. (JSC-SK311)"
Subject   RE: st: RE: RE: median equality test for non normal variables
Date   Tue, 25 May 2010 11:04:47 -0500

Isn't it true that the Wilcoxon rank sum test is designed only for possibilities of one distribution being a translation of the other? So the null would be identical distributions; the alternatives would be that the distributions differ only by a translation.

So if two distributions have different shapes but the same median, one might naively assume the "null" is true, but as this example shows, -ranksum- will likely reject in that situation.

Here's another example with continuous data:

One distribution is gamma(1,1), while the other is a reflection of the first plus a translation so that both have the same median.

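drop _all
set seed 12345                            // arbitrary seed, added only for reproducibility
set obs 100
gen y = rgamma(1,1)                       // group 1: gamma(1,1) draws
summarize y, detail
local med = r(p50)                        // sample median of group 1
set obs 200
gen group = 1 in 1/100
replace group = 2 in 101/200
gen negy = -y[_n-100] if group==2         // reflect the group 1 values
replace y = 2*`med' + negy if group==2    // shift so group 2 has the same median
summarize y if group==1, detail
summarize y if group==2, detail
ranksum y, by(group)
qreg y group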

Note that -ranksum- rejects its null (that the two distributions are identical, not that the medians are the same), whereas -qreg- does not reject its null of equal medians.
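
To see directly that the two groups share a median but not a shape, something like the following should work. It is only a sketch: it regenerates the data so that it runs on its own, and the seed and the choice of -tabstat- statistics are my assumptions, not part of the example above.

```stata
clear
set seed 2010                              // arbitrary seed (assumption)
set obs 200
gen group = 1 + (_n > 100)                 // obs 1-100 -> group 1, obs 101-200 -> group 2
gen y = rgamma(1,1) if group == 1          // group 1: gamma(1,1) draws
summarize y if group == 1, detail
local med = r(p50)                         // sample median of group 1
replace y = 2*`med' - y[_n-100] if group == 2   // reflect about the median
tabstat y, by(group) statistics(n p50 mean skewness)
ksmirnov y, by(group)                      // compares the whole distributions
```

The p50 column should agree across groups while the skewness changes sign, which is the feature -ranksum- is picking up.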

Al Feiveson

There is an interesting question concerning the difference between  
what people think they are doing when applying a 'nonparametric' test  
and what is actually happening.

Consider the following data:

input var group
1 0
2 0
3 0
4 0
4 0
4 0
4 0
4 1
4 1
4 1
4 1
5 1
6 1
7 1
end

Note that the median coincides with the highest value in group zero  
and the lowest value in group 1.

What we get now depends critically on what we ask for:

Test for equality of medians using -qreg- : P=1.000 (the medians are  
the same)
Wilcoxon rank sum test : Prob > |z| =   0.0196
Median test (which does not test for equality of medians, NB) :  
Pearson chi2(1) =   3.8182   Pr = 0.051
Median test, continuity corrected : Pearson chi2(1) =   1.6970   Pr =  
Ordered logit regression with group as a predictor : P =  0.997
'Harrell's C' (as calculated by -somersd-) : .76, P < 0.001
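
The commands behind these figures are not shown. A sketch of calls that should give output of this kind follows. The options are assumptions; -somersd- is Roger Newson's user-written package (ssc install somersd), and listing var first in the -somersd- call is a guess that is consistent with the reported .76.

```stata
* Rebuild the 14 observations listed above without retyping them
clear
set obs 14
gen group = _n > 7                  // first 7 obs are group 0, last 7 are group 1
gen var = 4                         // the eight tied values at the median
replace var = _n in 1/3             // group 0 lower values: 1 2 3
replace var = _n - 7 in 12/14       // group 1 upper values: 5 6 7

qreg var group                      // median regression: test on the group coefficient
ranksum var, by(group)              // Wilcoxon rank-sum (Mann-Whitney) test
median var, by(group)               // median test; with two groups the output also
                                    // reports the continuity-corrected chi2
ologit var group                    // ordered logit with group as predictor
somersd var group, transf(c)        // Harrell's C (assumed call; needs -somersd- installed)
```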

I have put quotes around Harrell's C, as this quantity is simply a  
rescaling of Mann Whitney's U, dividing it by its maximum possible  
value, and was first proposed by Richard Herrnstein in 1976  
(Herrnstein, R. J., Loveland, D. H., & Cable, C. (1976). Natural  
concepts in pigeons. Journal of Experimental Psychology: Animal  
Behavior Processes, 2, 285-302), who termed it rho. Fans of  
terminological chaos will also recognise the entity as the area under  
the ROC curve. Harrell's C is identical with rho only when the data  
are uncensored (James A. Koziol, Zhenyu Jia. The Concordance Index C  
and the Mann-Whitney Parameter Pr(X>Y) with Randomly Censored Data.  
Biometrical Journal 2009; 51(3): 467-474).

I fancy that there is an amusing paper on this, clarifying the  
hypotheses being tested in each case, if anyone has time to write one...

I am looking again at the t-test, which, after a couple of Kolmogorov- 
Smirnovs, is beginning to look more and more attractive.

Ronan Conroy
Royal College of Surgeons in Ireland
Epidemiology Department,
Beaux Lane House, Dublin 2, Ireland
+353 (0)1 402 2431
+353 (0)87 799 97 95
+353 (0)1 402 2764 (Fax - remember them?)

