Statalist The Stata Listserver



st: implementation of -ksmirnov- ?


From   "Stas Kolenikov" <[email protected]>
To   [email protected]
Subject   st: implementation of -ksmirnov- ?
Date   Wed, 13 Sep 2006 14:41:18 -0500

Consider the following small fragment:

set obs 20
g z = invnorm(uniform())
sort z
g cdf = _n/20
ksmirnov z = cdf
ksmirnov z = norm(z)
* those should give insignificant differences, as both are true
* distributions at this moment

expand 50
* now I made this a discrete distribution with 50 points at each of 20
* point masses
ksmirnov z = norm(z)
* this one is rejected: the distribution is not normal anymore
ksmirnov z = cdf
* but so is this one, with the difference between the empirical cdf
* and the theoretical one being evaluated as 0.05!

I think the issue is that in the code of -ksmirnov- (and it is not too
difficult to locate) there are lines that look essentially like

sort x
gen cdf = _n/_N

which is not quite appropriate for discrete data. Something like

bysort x (cdf): replace cdf = cdf[_N]

should be added, so that the empirical cdf at tied values equals
Prob[X <= x], and the above problem would be solved.
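
To make the size of the discrepancy concrete, here is a minimal sketch
that computes the empirical cdf both ways on the expanded data. It does
not call -ksmirnov- and is not its actual code; it only assumes, as
described above, that the empirical cdf is built as _n/_N. The variable
names ecdf_naive, ecdf_tied, d_naive and d_tied and the seed are made up
for the illustration. With _n/_N, the first of the 50 tied observations
in the k-th group gets (50(k-1)+1)/1000 while cdf is k/20 = 50k/1000, so
the gap is 49/1000, which is where a distance of roughly 0.05 comes from.

```stata
clear
set seed 12345                      // arbitrary seed, illustration only
set obs 20
generate z = invnorm(uniform())     // same legacy functions as in the fragment above
sort z
generate cdf = _n/20                // reference cdf at the 20 points
expand 50                           // 50 tied copies of each point

* naive empirical cdf: every tied observation gets a different value
sort z
generate ecdf_naive = _n/_N

* tie-aware empirical cdf: all ties take the largest value in their group,
* i.e. Prob[Z <= z]
generate ecdf_tied = ecdf_naive
bysort z (ecdf_tied): replace ecdf_tied = ecdf_tied[_N]

* compare each version with the reference cdf
generate d_naive = abs(ecdf_naive - cdf)
generate d_tied  = abs(ecdf_tied - cdf)
summarize d_naive d_tied
```

The maximum of d_naive should come out at about 0.049, and d_tied should
be zero up to float rounding.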

--
Stas Kolenikov
http://stas.kolenikov.name