Consider the following small fragment:
set obs 20
g z = invnorm(uniform())
sort z
g cdf = _n/20
ksmirnov z = cdf
ksmirnov z = norm(z)
* those should give insignificant differences, as both are true
* distributions at this moment
expand 50
* now I have made this a discrete distribution with 50 points at each
* of the 20 point masses
ksmirnov z = norm(z)
* this one is rejected: the distribution is not normal anymore
ksmirnov z = cdf
* but so is this one, with the difference between the empirical cdf
* and the theoretical one being evaluated as 0.05!
I think the issue is that in the code of -ksmirnov- (and it is not too
difficult to locate) there are lines that look essentially like
sort x
gen cdf = _n/_N
which is not quite appropriate for discrete data. Something like
bysort x (cdf): replace cdf = cdf[_N]
should be added, so that the cdfs do look like Prob[X <= x], and the
above problem would be solved.
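
To make the point concrete, here is a minimal sketch of the two ways of
building the empirical cdf on the expanded data. It is not the actual
-ksmirnov- source; the variable names ecdf_naive, ecdf_tied, gap_naive
and gap_tied are made up for illustration, and invnormal(runiform()) is
used as the current spelling of invnorm(uniform()).

```stata
clear
set obs 20
gen double z = invnormal(runiform())
sort z
gen double cdf = _n/20        // theoretical cdf at the 20 point masses
expand 50                     // 50 tied observations at each point mass
sort z

* naive empirical cdf: keeps climbing within a group of tied values
gen double ecdf_naive = _n/_N

* tie-corrected empirical cdf: every tied observation gets the value
* attained at the last observation of its group, i.e. Prob[X <= x]
gen double ecdf_tied = ecdf_naive
bysort z (ecdf_tied): replace ecdf_tied = ecdf_tied[_N]

* discrepancy of each version from the theoretical cdf
gen double gap_naive = abs(ecdf_naive - cdf)
gen double gap_tied  = abs(ecdf_tied - cdf)
summarize gap_naive gap_tied
```

With 50 ties per point mass and 1000 observations, the largest naive
gap occurs at the first observation of each tied group and equals
49/1000 = 0.049, while the tie-corrected gap should be zero up to
rounding. That is in line with the 0.05 reported above; the exact figure
presumably depends on how -ksmirnov- forms its one-sided differences,
which this sketch does not try to reproduce.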
--
Stas Kolenikov
http://stas.kolenikov.name