Statalist The Stata Listserver



Re: st: Ksmirnov discrete data (again)


From   "Ben Jann" <[email protected]>
To   [email protected]
Subject   Re: st: Ksmirnov discrete data (again)
Date   Fri, 15 Jun 2007 14:50:52 +0200

Yes, they only added the warning message. See

http://www.stata.com/statalist/archive/2007-05/msg00650.html

I have a hard time believing what Kirstin writes ("However, there
are adjustments made later in the code which ensure that correct test
statistic is produced."). My own experience is that in the presence of
ties, -ksmirnov- computes a wrong test statistic, at least in the one
sample case, as is illustrated in the following example:

=============
. input x w F H

            x          w          F          H
 1. 1   1  .1  .2
 2. 2   1  .2  .4
 3. 3   1  .3  .6
 4. 4   6  .9  .8
 5. 5   1  1   1
 6. end

. expand w
(5 observations created)

. ksmirnov x = H

One-sample Kolmogorov-Smirnov test against theoretical distribution
          H

Smaller group       D       P-value  Corrected
----------------------------------------------
x:                  0.1000    0.819
Cumulative:        -0.5000    0.007
Combined K-S:       0.5000    0.013      0.006

Note: ties exist in dataset;
     there are 5 unique values out of 10 observations.
=============

Clearly, the largest difference between the empirical and the
theoretical distribution is 0.3 (at x = 3, where F = .3 and H = .6),
but -ksmirnov- reports D = 0.5.
Possibly -ksmirnov- computes the right (but conservative) p-value; I
did not check that. Maybe Kirstin could clarify.
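To double-check the arithmetic outside Stata, here is a minimal Python sketch using the same data. The tie-aware statistic compares the two step functions only at the distinct support points, assuming (as in the example) that H jumps only at x = 1,...,5. The "naive" variant treats each of the 10 expanded observations as its own jump, i.e. the continuous-data formula that ignores ties. That this is how -ksmirnov- arrives at 0.5 is an assumption, not something read from its code, but it does reproduce the reported values 0.1 and -0.5 (Stata prints the D- component with a negative sign).

```python
from fractions import Fraction as Fr

# Data from the Stata example: support points x, frequency weights w,
# and the theoretical CDF H at each support point (assumed to jump only there).
support = [1, 2, 3, 4, 5]
weights = [1, 1, 1, 6, 1]
H = {1: Fr(2, 10), 2: Fr(4, 10), 3: Fr(6, 10), 4: Fr(8, 10), 5: Fr(1)}


def tie_aware_d(support, weights, H):
    """Return (D+, D-, D) comparing F_n and H at the distinct support points."""
    n = sum(weights)
    cum = 0
    d_plus = d_minus = Fr(0)
    for x, w in zip(support, weights):
        cum += w
        diff = Fr(cum, n) - H[x]  # empirical minus theoretical CDF at x
        d_plus = max(d_plus, diff)
        d_minus = max(d_minus, -diff)
    return d_plus, d_minus, max(d_plus, d_minus)


def naive_d(support, weights, H):
    """Return (D+, D-, D) using the continuous-data formula, ignoring ties."""
    xs = sorted(x for x, w in zip(support, weights) for _ in range(w))
    n = len(xs)
    d_plus = max(Fr(i, n) - H[x] for i, x in enumerate(xs, start=1))
    d_minus = max(H[x] - Fr(i - 1, n) for i, x in enumerate(xs, start=1))
    return d_plus, d_minus, max(d_plus, d_minus)


if __name__ == "__main__":
    # D+ = 1/10, D- = 3/10, D = 3/10
    print("tie-aware:", [float(v) for v in tie_aware_d(support, weights, H)])
    # D+ = 1/10, D- = 1/2, D = 1/2
    print("naive:    ", [float(v) for v in naive_d(support, weights, H)])
```

In the naive version the largest gap comes from the fourth sorted observation: H(4) - 3/10 = 0.8 - 0.3 = 0.5, a point that lies inside the block of tied values at x = 4 rather than at a genuine jump of the empirical distribution.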

Robert is interested in two-sample Kolmogorov-Smirnov tests. I do not
have any solutions to offer for this case. However, note that I will
soon release a package for multinomial goodness-of-fit tests that also
performs exact one-sample Kolmogorov-Smirnov tests based on
combinatorial approaches. I also have some preliminary Mata code
implementing procedures by Conover (1972) and Wood/Altavela (1978)
that I could share.

ben

Conover, W. J. (1972). A Kolmogorov Goodness-of-Fit Test for
Discontinuous Distributions. Journal of the American Statistical
Association 67: 591-596.

Wood, Constance L., and Michele M. Altavela (1978). Large-Sample
Results for Kolmogorov-Smirnov Statistics for Discrete Distributions.
Biometrika 65(1): 235-239.


On 6/15/07, Robert Östling <[email protected]> wrote:
I would like to do a K-S test with discrete data using ksmirnov. I have read previous posts on the list (the latest being http://www.stata.com/statalist/archive/2007-05/msg00489.html), but I can't figure out if the problem has been fixed. I understand that the easy solution implies conservative p values. The latest version is from 21 May 2007, but when I use it with discrete data it reports that there are ties in the data. Can I use it nevertheless (being aware that the results are conservative)?

The output currently looks as follows:

. ksmirnov guess, by(chosen)

Two-sample Kolmogorov-Smirnov test for equality of distribution functions:

 Smaller group       D       P-value  Corrected
 ----------------------------------------------
 0:                  0.0340    0.070
 1:                 -0.0261    0.209
 Combined K-S:       0.0340    0.140      0.132

Note: ties exist in combined dataset;
      there are 74 unique values out of 5586 observations.

Does this mean that the problem wasn't fixed and that only a warning message has been added in the new version? ;-) The result is exactly the same as with the old version of ksmirnov.

Robert Ostling (new on the list)

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




© Copyright 1996–2024 StataCorp LLC