Statalist The Stata Listserver

st: implementation of -ksmirnov- ?

From   "Stas Kolenikov"
Subject   st: implementation of -ksmirnov- ?
Date   Wed, 13 Sep 2006 14:41:18 -0500

Consider the following small fragment:

set obs 20
g z = invnorm(uniform())
sort z
g cdf = _n/20
ksmirnov z = cdf
ksmirnov z = norm(z)
* those should give insignificant differences, as both are true
* distributions at this moment

expand 50
* now I made this a discrete distribution with 50 points at each of 20
* point masses
ksmirnov z = norm(z)
* this one is rejected: the distribution is not normal anymore
ksmirnov z = cdf
* but so is this one, with the difference between the empirical cdf
* and the theoretical one being evaluated as 0.05!

I think the issue is that in the code of -ksmirnov- (and it is not too
difficult to locate) there are lines that look essentially like

sort x
gen cdf = _n/_N

which is not quite appropriate for discrete data. Something like

bysort x (cdf): replace cdf = cdf[_N]

should be added, so that the cdfs do look like Prob[X <= x], and the
above problem would be solved.
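
To make the effect of ties concrete, here is a minimal sketch that builds
both versions of the empirical cdf on the expanded data and compares each
with the hypothesized cdf. It is only an illustration and not the actual
-ksmirnov- code: the names ecdf_naive, ecdf_tied, gap_naive and gap_tied
are made up for this example, the seed is arbitrary, and I use
invnormal(runiform()), the newer names for invnorm(uniform()).

```stata
* Illustration only: naive vs. tie-corrected empirical cdf.
* Variable names and the seed are assumptions made for this sketch.
clear
set seed 12345
set obs 20
generate z = invnormal(runiform())
sort z
* hypothesized cdf: the 20-point empirical cdf of the original sample
generate cdf = _n/20
* 50 copies of each of the 20 point masses
expand 50

* naive empirical cdf: tied observations receive different values
sort z
generate ecdf_naive = _n/_N

* tie-corrected empirical cdf: every tied observation gets Prob[X <= x]
generate ecdf_tied = ecdf_naive
bysort z (ecdf_tied): replace ecdf_tied = ecdf_tied[_N]

* absolute gap between each empirical cdf and the hypothesized cdf
generate gap_naive = abs(ecdf_naive - cdf)
generate gap_tied  = abs(ecdf_tied - cdf)
summarize gap_naive gap_tied
```

Within each block of 50 tied values the naive cdf climbs in steps of
1/1000, so its gap from the hypothesized cdf approaches 0.05 at the start
of every block, whereas the tie-corrected version takes the block's last
value and should coincide with the hypothesized cdf.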

Stas Kolenikov