Statalist The Stata Listserver



Re: st: ksmirnov


From   kmacdonald@stata.com (Kristin MacDonald, StataCorp)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: ksmirnov
Date   Fri, 18 May 2007 13:52:07 -0500

On September 13, 2006, Stas Kolenikov <skolenik@gmail.com> brought up an issue
regarding the use of the -ksmirnov- command when applied to discrete data or
data that contain ties:

http://www.stata.com/statalist/archive/2006-09/msg00483.html 

Since then, there has been some further discussion by Ben Jann
<ben.jann@gmail.com> and David Airey <david.airey@Vanderbilt.Edu> on this
topic.

As we looked into the methods -ksmirnov- uses to deal with this type of data,
we found that the empirical distribution function does, at first glance, seem
to be computed incorrectly when there are ties in the data.  However,
adjustments made later in the code ensure that the correct test statistic is
produced.  In addition, the p-values are computed properly based on the
formulas given in [R] ksmirnov, page 26.
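
To make the role of ties concrete, here is a minimal sketch in Python of a
tie-aware one-sample statistic.  It is an illustration only, not the code
-ksmirnov- uses: the function name ks_one_sample, the sign convention for the
two one-sided statistics, and the example data are all assumptions made for
this sketch.  The idea is to evaluate the empirical distribution function once
per distinct value, just before and just after its jump, so that a block of
tied observations is treated as a single step.

```python
from collections import Counter


def ks_one_sample(data, cdf):
    """Return (d_plus, d_minus, d) for data against a continuous cdf.

    d_plus  = largest amount by which the empirical CDF exceeds cdf
    d_minus = largest amount by which cdf exceeds the empirical CDF
    d       = max(d_plus, d_minus)
    (These sign conventions are a choice made for this sketch.)
    """
    n = len(data)
    counts = Counter(data)          # tied observations collapse to one key
    d_plus = 0.0
    d_minus = 0.0
    below = 0                       # observations strictly below current value
    for x in sorted(counts):
        at_or_below = below + counts[x]
        f = cdf(x)
        # Empirical CDF just after the jump at x versus the hypothesized cdf.
        d_plus = max(d_plus, at_or_below / n - f)
        # Hypothesized cdf versus the empirical CDF just before the jump at x.
        d_minus = max(d_minus, f - below / n)
        below = at_or_below
    return d_plus, d_minus, max(d_plus, d_minus)


if __name__ == "__main__":
    # Hypothetical data with one tie, compared with a Uniform(0, 1) cdf.
    sample = [0.2, 0.2, 0.5, 0.9]
    print(ks_one_sample(sample, lambda x: x))
```

With the tied value 0.2 counted as a single jump of size 2/4, the largest gap
in this made-up example occurs right after that jump.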

Stas gave an example where the p-value produced by the one-sample version of
-ksmirnov- appeared to be incorrect when comparing an expanded dataset to a
variable containing the exact empirical distribution function.  The problem
that arises in this situation is that the theoretical distribution to which we
are comparing the data is a discrete distribution with very few unique values
relative to the total number of observations.  However, the p-value that
-ksmirnov- calculates is intended for the case where we wish to compare data
to a continuous distribution.  Therefore, it is not appropriate in this
situation.  As Ben mentioned, there is literature discussing an extension to
the Kolmogorov-Smirnov test that allows an exact p-value to be calculated
when the hypothesized distribution is discrete.  However, at this point, no
exact p-values are implemented in -ksmirnov- for the one-sample case,
including those for comparisons with a discrete distribution.
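
The gap between a continuous-distribution p-value and an exact one can be seen
in a small sketch, again in Python and again only an illustration.  The
setting is a hypothetical one chosen because the exact answer is easy to
enumerate: 0/1 data compared with a Bernoulli(0.5) distribution.  The function
names are invented for this sketch, and the asymptotic p-value uses the
standard Kolmogorov series; that it agrees term for term with the formula in
[R] ksmirnov is an assumption.

```python
import math


def kolmogorov_asymptotic_p(d, n, terms=100):
    """Two-sided asymptotic p-value for a continuous hypothesized cdf:
    2 * sum_{j>=1} (-1)^(j-1) * exp(-2 * j^2 * n * d^2), clipped to [0, 1].
    """
    if d <= 0:
        return 1.0                  # the series is not usable at d = 0
    z2 = n * d * d
    s = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * z2)
            for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def exact_p_fair_coin(k_obs, n):
    """Exact P(D >= d_obs) when the hypothesized cdf is Bernoulli(0.5).

    For 0/1 data with k ones, sup|F_n - F| = |(n - k)/n - 0.5| = |2k - n|/(2n),
    so the exact p-value is a sum of Binomial(n, 0.5) probabilities.
    """
    dev_obs = abs(2 * k_obs - n)    # integer scale avoids float comparisons
    favorable = sum(math.comb(n, k) for k in range(n + 1)
                    if abs(2 * k - n) >= dev_obs)
    return favorable / 2 ** n


if __name__ == "__main__":
    n, k = 100, 60                  # hypothetical sample: 60 ones out of 100
    d = abs(2 * k - n) / (2 * n)    # observed statistic, here 0.1
    print("asymptotic:", kolmogorov_asymptotic_p(d, n))
    print("exact:     ", exact_p_fair_coin(k, n))
```

In this made-up case the continuous-distribution p-value is roughly 0.27,
while the exact p-value is roughly 0.057, so the former is far too
conservative when the hypothesized distribution has only two support points.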

Having said that, in the next update we will adjust -ksmirnov- to issue a note
when the data contain ties.

-- Kristin
kmacdonald@stata.com