Re: st: sign test output

From   Dirk Enzmann
Subject   Re: st: sign test output
Date   Fri, 18 Jan 2013 22:48:56 +0100

In this context, the following working paper by Mantalos might be interesting:

It should be possible to implement his JBCV(k1,k2) procedure in Stata and it would be interesting to see the results of including this test in the simulation.


Date: Thu, 17 Jan 2013 13:14:39 +0100
From: Maarten Buis
Subject: Re: st: sign test output

On Thu, Jan 17, 2013 at 11:21 AM, Nahla Betelmal wrote:
>  from my readings in statistics, I know that in order to decide
>  whether to use parametric or non-parametric tests, the data normality
>  distribution should be checked first.
>    Shapiro-Wilk is used to test normality, when the number of
>  observations is less than 30. Otherwise, we should use
>  Kolmogorov-Smirnov for large sample (as in my sample).
Unfortunately that is incorrect. Normality tests need huge samples
before the p-value means what it is supposed to mean. An analogy I
have heard in a different context, but which applies very well to
this situation, is going out to sea in a rowboat to check whether the
sea is safe for the QE II. Using a normality test with only 346
observations is not a good idea.

Nick and I discussed the issue of the performance of tests for
Gaussianity recently on Statalist:

The bottom line was: you need somewhere between 10,000 and 100,000
observations before the tests we discussed (Jarque-Bera and
Doornik-Hansen) perform somewhat acceptably, but in datasets that
large you need to worry whether deviations from Gaussianity that
are statistically significant are also substantively significant.
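
To make the last point concrete, here is a minimal sketch that was not
part of the original discussion. It assumes a Student t distribution
with 20 degrees of freedom as a stand-in for data that are close to,
but not exactly, Gaussian. It uses -sktest- rather than the Jarque-Bera
or Doornik-Hansen tests, simply because -sktest- ships with Stata. The
seed and the degrees of freedom are arbitrary choices.

```stata
*--------------------- begin example ---------------------
clear
set seed 12345                 // arbitrary seed, for reproducibility only
set obs 100000

* assumption: t(20) stands in for "almost Gaussian" data;
* its excess kurtosis is 6/(20-4) = 0.375
gen double x = rt(20)

sktest x                       // skewness-kurtosis test for normality
sum x, detail                  // skewness and kurtosis, to judge the size of the deviation
*---------------------- end example ----------------------
```

With this many observations the test has a lot of power, so a small
p-value is to be expected here. Whether a deviation of this size
matters for the analysis at hand is a substantive question that the
p-value cannot answer.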

I have adapted the simulation from the discussion above for the
Kolmogorov-Smirnov test. It shows that the Kolmogorov-Smirnov test
does not perform acceptably for any of these sample sizes.

*------------------- begin simulation -------------------
clear all

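program define sim, rclass
     // draw 100,000 standard normal values once per replication
     drop _all
     set obs `=1e5'
     gen double x = rnormal()
     // test the first 10^i observations, i = 2,...,5
     forvalues i = 2/5 {
         // mean and sd are estimated from the same subsample
         sum x in 1/`=1e`i''
         ksmirnov x = normal((x-r(mean))/r(sd)) in 1/`=1e`i''
         return scalar p`i' = r(p)
         return scalar p_cor`i' = r(p_cor)
     }
end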

simulate p2p=r(p2) p2c=r(p_cor2) ///
          p3p=r(p3) p3c=r(p_cor3) ///
          p4p=r(p4) p4c=r(p_cor4) ///
          p5p=r(p5) p5c=r(p_cor5) ///
          , reps(2e4): sim

gen id = _n

reshape long p2 p3 p4 p5, i(id) j(dist) string

label var p2 "N=100"
label var p3 "N=1,000"
label var p4 "N=10,000"
label var p5 "N=100,000"

gen byte distr = cond(dist=="p",1,2)
label define distr 1 "p-value" ///
                    2 "corrected p-value", replace
label value distr distr

simpplot p?, by(distr) scheme(s2color) legend(cols(4))
*-------------------- end simulation --------------------
(For more on examples I sent to the Statalist see:  )

This simulation needs the -simpplot- package in order to run. This can
be downloaded by typing in Stata -ssc install simpplot-.
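
As a numerical complement to the plot, the share of replications with
a p-value below 0.05 can be tabulated. This is only a sketch: it
assumes the simulation above has just been run, so that the reshaped
data (p2-p5 and distr) are still in memory, and the rej_* variable
names are made up for this illustration. If a test performs as it
should, each share should be close to 0.05, because the data were
drawn from a Gaussian distribution.

```stata
*--------------------- begin example ---------------------
* assumption: p2-p5 and distr exist from the simulation above
foreach v of varlist p2 p3 p4 p5 {
    // 1 if the null hypothesis is rejected at the 5% level, 0 otherwise
    gen byte rej_`v' = `v' < .05 if !missing(`v')
}

* proportion of rejections by sample size and type of p-value
tabstat rej_p2 rej_p3 rej_p4 rej_p5, by(distr) statistics(mean) format(%5.3f)
*---------------------- end example ----------------------
```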

Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Rothenbaumchaussee 33
D-20148 Hamburg

phone: +49-(0)40-42838.7498 (office)
       +49-(0)40-42838.4591 (Mrs Billon)
fax:   +49-(0)40-42838.2344