st: Ksmirnov one-sided test interpretation


From   "Tsankova, Teodora" <TsankovT@ebrd.com>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: Ksmirnov one-sided test interpretation
Date   Fri, 1 Mar 2013 09:39:33 -0000

Thank you for all the suggestions, Nick! I had been using distplot only
until now but the rest seem very informative as well.

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 28 February 2013 19:40
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Ksmirnov one-sided test interpretation

One of the amusing things about statistical science is that every mildly
experienced person has a long personal list of statistical things that
they don't respect, if not as unsound in principle then as useless or
oversold in practice. But the lists don't overlap much!

A more polite list is always that of things that are in one's unhumble
opinion undersold and so deserve more exposure.

Like Joerg, I don't think I've used Kolmogorov-Smirnov for real in any
serious project. I'd rather start with a presumption that distributions
are different, but then the interesting thing is to see how and how much
they differ (and occasionally change my mind if the distributions turn
out to be practically identical).

I looked at the example for two groups that is bundled with -ksmirnov-,
which is a tiny fake dataset.

To make it more interesting, at least a bit, I suggest

sysuse auto, clear
ksmirnov mpg, by(foreign)

as a canonical example.

Here are five graphs you can draw instead. I leave aside more standard
graphs such as histograms (which here hide too much).

* 1 -stripplot- (SSC)
stripplot mpg, over(foreign) stack box(barw(0.1)) boffset(-0.1) height(.4)

* 2 -qplot- (SJ)
qplot mpg, over(foreign)

* 3 -cquantile- (SSC)
cquantile mpg, by(foreign) gen(mpg1 mpg2)
qqplot mpg?

* 4 -distplot- (SJ)
distplot mpg, over(foreign)

* 5 -devnplot- (SSC)
devnplot mpg foreign

All these graphs require some prior download as flagged in the comments.
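
A sketch of those downloads, assuming the packages are still hosted where the comments say. -ssc install- fetches directly from SSC; for the Stata Journal programs, -search- lists the available packages so that the latest version can be clicked on and installed.

```stata
* SSC packages, as flagged in the comments above
ssc install stripplot
ssc install cquantile
ssc install devnplot

* Stata Journal packages: list what is available, then click to install
search qplot, all
search distplot, all
```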

If you don't like even _one_ of these graphs as being more informative
or interesting than the output of -ksmirnov-, then I've failed.

Other specific suggestions of graphs are naturally most welcome.
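
One further possibility that needs no download, offered as a minimal sketch rather than as anything proposed in the thread: plot the two empirical distribution functions that -ksmirnov- compares, using official -cumul- and -twoway-. The variable names cdf_dom and cdf_for are made up for this illustration. The group whose curve lies higher at a given mpg has more of its values at or below that mpg, which is the sense in which one distribution is "lower" than another. -return list- shows the results that -ksmirnov- leaves behind for the one-sided and combined tests.

```stata
sysuse auto, clear

* two-sample test, then list the saved results (D statistics and P-values)
ksmirnov mpg, by(foreign)
return list

* empirical CDF within each group; equal gives tied values the same height
cumul mpg if foreign == 0, gen(cdf_dom) equal
cumul mpg if foreign == 1, gen(cdf_for) equal

* step plots of the two empirical CDFs on a common axis
twoway (line cdf_dom mpg if foreign == 0, sort connect(J)) ///
       (line cdf_for mpg if foreign == 1, sort connect(J)), ///
       ytitle("Cumulative probability") xtitle("Mileage (mpg)") ///
       legend(order(1 "Domestic" 2 "Foreign"))
```

The largest vertical gap between the two step functions is what the combined test reports as D, so the graph shows where along the mpg axis the test statistic comes from.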

Nick

On Thu, Feb 28, 2013 at 6:37 PM, Joerg Luedicke <joerg.luedicke@gmail.com> wrote:
> Yes, why not just look at your data?
>
> That aside, I am wondering what the point of such a test is. What does
> it even mean that one distribution is "lower" than another? Or to
> quote the Stata manual, version 11: "We wish to use the two-sample
> Kolmogorov-Smirnov test to determine if there are any differences in
> the distribution of x for these two groups..." "Any" differences seem
> to pick up a mix of differences with regard to the location and shape
> of distributions. What is the motivation behind this? If there are
> differences between two distributions, why not just look at what these
> differences are? But even if there were a good reason for using this
> test, I am wondering what it is telling us. I did not try hard to come
> up with the following example:
>
> Let's generate some data for two groups where the distribution in 
> group one is normal with mean 10 and SD 5, while the distribution in 
> the other group is a gamma with shape 5 and scale 2:
>
> *---------------
> clear
> set obs 200
> set seed 1234
>
> gen u = runiform()>.5
> gen x = rnormal(10,5) if u==0
> replace x=rgamma(5,2) if u==1
> *---------------
>
> and have a look at the empirical distribution for this data realization:
>
> *---------------
> tw kdensity x if u==0 || kdensity x if u==1
> *---------------
>
> As expected, these distributions surely look different to me. We can 
> also have a look at the true functions:
>
> *---------------
> tw      function y = gammaden(5,2,0,x) , range(0 25) || ///
>         function y = normalden(x,10,5) , range(-5 25) ///
>         legend(order(1 "Gamma" 2 "Gauss"))
> *---------------
>
> Yet, if we run the K-S test:
>
> *---------------
> ksmirnov x, by(u) exact
> *---------------
>
> we would conclude that we cannot reject the hypothesis that the
> distributions are the same? That does not sound right to me.
>
> So, my bottom line is: a) I wonder why one would use this test in
> the first place, and b) even if there were a good reason, I probably
> would not trust it. I may very well be missing something here as I
> have never used or studied this test before, so others, please correct
> me if I am wrong about something here.
>
> Joerg
>
>
>
> On Thu, Feb 28, 2013 at 1:06 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> Why not plot the data to show what is going on?
>>
>> Nick
>>
>> On Thu, Feb 28, 2013 at 5:23 PM, Tsankova, Teodora <TsankovT@ebrd.com> wrote:
>>
>>> I have a question related to a previous post:
>>>
>>> http://www.stata.com/statalist/archive/2009-01/msg00525.html
>>>
>>> The Stata output from this message is as follows:
>>>
>>> Two-sample Kolmogorov-Smirnov test for equality of distribution functions:
>>>
>>> Smaller group       D       P-value  Corrected
>>> ----------------------------------------------
>>> male:               0.2468    0.002
>>> female:             0.0000    1.000
>>> Combined K-S:       0.2468    0.005      0.003
>>>
>>>
>>> From the one-sided tests (first two lines) one can say which
>>> distribution tends to be lower, that for males or that for females.
>>> However, I am not sure how to interpret them.
>>>
>>> Given that the p-value in the first line is low and that D in the
>>> second line is 0, can we say that this is evidence that the distribution
>>> of male is lower than that of female? To rephrase: can we claim that
>>> the distribution of male stochastically dominates that of female,
>>> which would imply that the values of the underlying variable tend to be
>>> larger for male than for female? Or do we interpret it in exactly the
>>> opposite way, that the values for male tend to be lower than the values
>>> for female?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
