Statalist The Stata Listserver



Re: st: Ksmirnov discrete data (again)


From   "Ben Jann" <[email protected]>
To   [email protected]
Subject   Re: st: Ksmirnov discrete data (again)
Date   Mon, 18 Jun 2007 10:11:32 +0200

Ah, now I see. Many thanks for the clarifications, Kristin.

In the case of truly discrete data, however, the formula for D does
not make much sense to me since F(x) is a step function (what would
x-e be?). Am I right that for discrete data one would use

D = max | S(x) - F(x) |

to compute the Kolmogorov-Smirnov statistic (as in, e.g., Horn 1977 or
Wood/Altavela 1978)?
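
As a rough illustration of the discrete-data statistic D = max | S(x) - F(x) |, here is a minimal Python sketch. The sample (1, 2, 3, six 4s, 5) and the discrete uniform F on 1..5 are read off the table quoted further down in this message; the function name discrete_ks_statistic is hypothetical, and the sketch assumes every observation lies on the support of F. It computes only the statistic, not a p-value.

```python
from collections import Counter
from fractions import Fraction


def discrete_ks_statistic(sample, cdf):
    """Return max |S(x) - F(x)| over the support points of a discrete F.

    cdf maps each support point x to F(x). Assumes every observation
    in sample is one of those support points.
    """
    n = len(sample)
    counts = Counter(sample)
    cumulative = 0
    d = Fraction(0)
    for x in sorted(cdf):
        cumulative += counts.get(x, 0)   # number of observations <= x
        s = Fraction(cumulative, n)      # empirical CDF S(x)
        d = max(d, abs(s - cdf[x]))
    return d


# Assumed example data: ten observations, discrete uniform F on 1..5.
sample = [1, 2, 3, 4, 4, 4, 4, 4, 4, 5]
uniform_cdf = {k: Fraction(k, 5) for k in range(1, 6)}

d_discrete = discrete_ks_statistic(sample, uniform_cdf)
print(float(d_discrete))  # 0.3
```

Exact fractions are used so that the comparison of S(x) and F(x) is not affected by floating-point rounding.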

ben

Horn, Susan Dadakis (1977). Goodness-of-Fit Tests for Discrete Data: A
Review and an Application to a Health Impairment Scale. Biometrics
33(1): 237-247.

Wood, Constance L., and Michele M. Altavela (1978). Large-Sample
Results for Kolmogorov-Smirnov Statistics for Discrete Distributions.
Biometrika 65(1): 235-239.


On 6/16/07, Kristin MacDonald, StataCorp <[email protected]> wrote:
Robert Ostling asked about using -ksmirnov- with discrete data when performing
a two-sample Kolmogorov-Smirnov test.  Ben Jann also
commented on performing the one-sample Kolmogorov-Smirnov test with discrete
data.

The methodologies used by -ksmirnov- for both the one and two-sample tests
were derived for data from continuous distributions.

Ben referenced two articles that discuss a way to perform a one-sample
Kolmogorov-Smirnov test when you are interested in comparing data to a
discrete theoretical distribution.  When making a comparison of this type, the
test statistic should be computed using the method Ben describes as opposed to
the method that -ksmirnov- uses.  Currently, there is not a command that
implements this test, although this is something we are looking into adding.

There has also been some discussion regarding the use of the -ksmirnov-
command when ties exist in the data.  Theoretically, no ties should exist when
data is sampled from a continuous distribution, but, in practice, this is not
necessarily true.  The test statistic that is produced by -ksmirnov- is still
correct when ties exist in a dataset that we wish to compare to a continuous
theoretical distribution.  However, if there are a large number of ties, the
approximate p-value that is reported may not be appropriate.  In the latest
update, a note was added to -ksmirnov- to inform the user of the number of
ties that exist in the dataset.

Gibbons and Chakraborti (2003, 121) give the following formula for the test
statistic D for the one-sample Kolmogorov-Smirnov test

        D = sup |S(x) - F(x)| = max[ |S(x) - F(x)|, |S(x-e) - F(x)| ]

where e is a small positive number.  They also mention that it applies even in
the case when ties are present.

Using the example that Ben gave, this would be as follows

        x       S(x)    F(x)    S(x)-F(x)       S(x-e)-F(x)
        1       .1      .2      -.1             -.2
        2       .2      .4      -.2             -.3
        3       .3      .6      -.3             -.4
        4       .9      .8      .1              -.5
        4       .9      .8      .1              -.5
        4       .9      .8      .1              -.5
        4       .9      .8      .1              -.5
        4       .9      .8      .1              -.5
        4       .9      .8      .1              -.5
        5       1       1       0               -.1

Therefore, D = .5.  This is equivalent to the result that is reported by
-ksmirnov-.  However, Ben's data was intended to be compared to a discrete
distribution, so a test for discrete data would be more suitable.
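
The table above can be reproduced with a short Python sketch. This is only an illustration, not how -ksmirnov- is implemented: the sample and F(x) = x/5 are taken from the table, S(x-e) is computed as the proportion of observations strictly less than x, and the names S, S_left, and F are chosen here for convenience.

```python
from fractions import Fraction

# Data as they appear in the table: ten observations.
sample = [1, 2, 3, 4, 4, 4, 4, 4, 4, 5]
n = len(sample)


def F(x):
    """Theoretical CDF values from the table: F(x) = x/5 for x = 1..5."""
    return Fraction(x, 5)


def S(x):
    """Empirical CDF: proportion of observations <= x."""
    return Fraction(sum(v <= x for v in sample), n)


def S_left(x):
    """Left limit S(x-e): proportion of observations strictly < x."""
    return Fraction(sum(v < x for v in sample), n)


# One row per distinct value: (x, S(x)-F(x), S(x-e)-F(x)).
rows = [(x, S(x) - F(x), S_left(x) - F(x)) for x in sorted(set(sample))]
for x, diff, diff_left in rows:
    print(x, float(diff), float(diff_left))

# Largest absolute deviation across both columns.
d_continuous = max(max(abs(a), abs(b)) for _, a, b in rows)
print(float(d_continuous))  # 0.5
```

The largest deviation comes from the left limit at x = 4, where S(x-e) = .3 and F(x) = .8.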

Gibbons, J. D., and S. Chakraborti.  2003.  Nonparametric Statistical
Inference.  4th ed.  New York: Marcel Dekker, Inc.

--Kristin
[email protected]