



st: Using Wilcoxon rank-sum (Mann-Whitney) test to compare an empirical and a uniform distribution


From   "Tsankova, Teodora" <TsankovT@ebrd.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Using Wilcoxon rank-sum (Mann-Whitney) test to compare an empirical and a uniform distribution
Date   Sat, 9 Mar 2013 13:49:57 -0000

Dear David,

Thank you for the suggestion. 

What I mean is that I create a uniform distribution between 0 and 1 with
15 observations. Given that every value should have the same probability
under a uniform distribution, I divide 1 by 14 and create 15 equally
spaced values from 0 to 1. Plotting the CDF of those values would result
in a straight diagonal line, which is ultimately what the ksmirnov test
would test against as well.
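
A minimal sketch of the construction described above, assuming an empty
dataset; the variable name u_grid is hypothetical and not part of the
original analysis:

```stata
* Sketch only: start from an empty dataset; the name u_grid is hypothetical
clear
set obs 15
* 15 equally spaced values 0, 1/14, 2/14, ..., 1
generate double u_grid = (_n - 1)/14
list u_grid
```

The empirical CDF of such a grid rises in steps of 1/15, so it follows
the diagonal only approximately rather than exactly.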

The output from the ksmirnov test is as follows:

ksmirnov mean_random_BTWGr_Fx=uniform()

One-sample Kolmogorov-Smirnov test against theoretical distribution
           uniform()

 Smaller group       D       P-value  Corrected
 ----------------------------------------------
 mean_ra~r_Fx:       0.8221    0.000
 Cumulative:        -0.8983    0.000
 Combined K-S:       0.8983    0.000      0.000

So, it seems that although I can reject the equality of the two
distributions, I cannot say anything about which one tends to have
larger values.
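
A caveat on the ksmirnov command above, offered tentatively: in Stata,
uniform() is the older name of the pseudo-random number function
runiform(), so the expression after the equals sign is a fresh random
draw for each observation rather than the uniform CDF evaluated at the
data. That may be why D is so large here. For a Uniform(0,1) null the
CDF is F(x) = x, so a sketch of the intended test, assuming only the 15
observed values are in memory, would be:

```stata
* One-sample K-S test against Uniform(0,1): the theoretical CDF at x is x itself
* (assumes the variable holds only the 15 observed values, all within [0,1])
ksmirnov mean_random_BTWGr_Fx = mean_random_BTWGr_Fx
```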

In Stata the -porder- option of the ranksum command gives the
probability that a random draw from the first sample is larger than a
random draw from the second sample. I like this as it seems very
intuitive. I use those constructed values to perform this test. My
results are as follows:

ranksum mean_random_BTWGr_Fx, by( ObservedORUniform) porder

Two-sample Wilcoxon rank-sum (Mann-Whitney) test

ObservedOR~m |      obs    rank sum    expected
-------------+---------------------------------
    Observed |       15         259       232.5
     Uniform |       15         206       232.5
-------------+---------------------------------
    combined |       30         465         465

unadjusted variance      581.25
adjustment for ties        0.00
                     ----------
adjusted variance        581.25

Ho: mea~r_Fx(Observ~m==Observed) = mea~r_Fx(Observ~m==Uniform)
             z =   1.099
    Prob > |z| =   0.2717

P{mea~r_Fx(Observ~m==Observed) > mea~r_Fx(Observ~m==Uniform)} = 0.618

Those results, although not very strong, seem much easier to interpret.
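
The summary numbers in the ranksum output can be reproduced by hand from
the rank sums in the table. The sketch below assumes no ties, as the
output reports; the scalar names are chosen here only for illustration:

```stata
* Reproduce the ranksum summary numbers from the table (n1 = n2 = 15, no ties)
scalar n1 = 15
scalar n2 = 15
* Rank sum for the Observed group, taken from the output
scalar W  = 259
* Mann-Whitney U for Observed: 259 - 120 = 139
scalar U  = W - n1*(n1 + 1)/2
* Expected rank sum under Ho: 232.5
scalar EW = n1*(n1 + n2 + 1)/2
* Variance of the rank sum without ties: 581.25
scalar V  = n1*n2*(n1 + n2 + 1)/12
scalar z  = (W - EW)/sqrt(V)
* porder estimate U/(n1*n2), displays 0.618
display %5.3f U/(n1*n2)
* z statistic, displays 1.099
display %5.3f z
* Two-sided p-value, to compare with Prob > |z| in the output
display %6.4f 2*normal(-abs(z))
```

So the 0.618 is simply the share of the 225 Observed-Uniform pairs in
which the Observed value is the larger one.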

Thank you again,

Teodora


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of David Hoaglin
Sent: 09 March 2013 13:34
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Using Wilcoxon rank-sum (Mann-Whitney) test to compare
an empirical and a uniform distribution

Teodora,

It seems odd to use a two-sample test when you actually have only one
sample.  What was the basis for the advice to use the
Wilcoxon-Mann-Whitney test?

A one-sided KS test would be all right.  Some people might be more
comfortable making the test two-sided, unless you would not have any
interest in a situation where the data departed from the null hypothesis
in the other direction.  I don't know what the literature says about
whether any other test has greater power for the type of alternative
that you are interested in.

With only 15 observations, the departure would have to be substantial to
reject the uniform null hypothesis.

I have an offbeat suggestion.  Transform the sample to normal deviates
by applying the inverse of the standard normal cumulative distribution
function to each observation, and test whether the transformed sample
departs from the standard normal distribution.  You can also make a
normal probability plot of the transformed sample.
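
A minimal sketch of this suggestion in Stata, assuming the observed
values sit in a variable called x (a hypothetical name) and lie strictly
between 0 and 1; z_x is likewise a name made up for illustration:

```stata
* Sketch only: x is a hypothetical variable holding the observed values in (0,1)
* invnormal() returns missing for 0 or 1, so such observations would drop out
generate double z_x = invnormal(x)
* One-sample K-S test of z_x against the standard normal CDF
ksmirnov z_x = normal(z_x)
* Normal quantile plot of the transformed values
qnorm z_x
```

Because the transformation is monotone, the K-S statistic itself is the
same as for a test of x against Uniform(0,1); the plot is where the
transformation is likely to add something.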

What do you mean by "a constant markup of 1/14"?

David Hoaglin

On Thu, Mar 7, 2013 at 3:11 PM, Tsankova, Teodora <TsankovT@ebrd.com>
wrote:
> Some time ago I posted on Statalist with a question regarding the use
> of a one-sided KS test and I was advised that for my purpose I can use
> the Wilcoxon-Mann-Whitney test (ranksum command in Stata).
>
> I basically have 15 observations that go from 0 to 1 and constitute my
> empirical distribution and I want to prove that those take higher values
> than a uniform distribution would suggest. I have three questions
> related to the test:
>
> 1) I generate 15 more observations myself, which take values from 0 to 1
> with a constant markup of 1/14 (I simulate a uniform distribution of 15
> values in the same interval). Has anyone else used this method for
> creating a uniform distribution, and do you see any problems with it?
>
> 2) I use the porder option to compute the p-value for the one-sided
> test and I get the following output:
>
> Two-sample Wilcoxon rank-sum (Mann-Whitney) test
>
> ObservedOr~m |      obs    rank sum    expected
> -------------+---------------------------------
>     Observed |       15         236       232.5
>      Uniform |       15         229       232.5
> -------------+---------------------------------
>     combined |       30         465         465
>
> unadjusted variance      581.25
> adjustment for ties        0.00
>                      ----------
> adjusted variance        581.25
>
> Ho: ktaub_~m(Observ~m==Observed) = ktaub_~m(Observ~m==Uniform)
>              z =   0.145
>     Prob > |z| =   0.8846
>
> P{ktaub_~m(Observ~m==Observed) > ktaub_~m(Observ~m==Uniform)} = 0.516
>
> I would interpret it in the following way: In 51.6% of the cases you
> would draw a random number from Observed that would be higher than a
> random draw from Uniform. Is this the correct interpretation?
>
> 3) My last question is related to the fact that the Wilcoxon-Mann-Whitney
> test is used to analyse ordinal data. My data has an ordinal meaning in
> the sense that higher values represent more homogeneous group lending
> villages in my case. However, the values the variable takes are not
> interval but continuous ones. Can I still use this test?
>
> Thank you,
>
> Teodora

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/





