


Re: st: sign test output

From   Maarten Buis <>
Subject   Re: st: sign test output
Date   Thu, 17 Jan 2013 13:14:39 +0100

On Thu, Jan 17, 2013 at 11:21 AM, Nahla Betelmal wrote:
> from my readings in statistics , I know that in order to decide
> whether to use parametric or non-parametric tests, the data normality
> distribution should be checked first.
>  Shapiro-Wilk is used to test normality, when the number of
> observations is less than 30. Otherwise, we should use
> Kolmogorov-Smirnov for large sample (as in my sample).

Unfortunately that is incorrect. Normality tests need huge samples
before the p-value means what it is supposed to mean. An analogy I
have heard in a different context, but which applies very well here,
is going out to sea in a rowing boat to check whether the sea is safe
for the QE II. Using a normality test with only 346 observations is
not a good idea.

Nick and I discussed the issue of the performance of tests for
Gaussianity recently on Statalist:

The bottom line was: you need at least somewhere between 10,000 and
100,000 observations before the tests we discussed (Jarque-Bera and
Doornik-Hansen) perform somewhat acceptably, but in such large
datasets you need to worry whether deviations from Gaussianity that
are statistically significant are also substantively significant.
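
The last point can be illustrated with a minimal sketch. It assumes a
Student's t distribution with 20 degrees of freedom as a stand-in for
"close to Gaussian, but not exactly Gaussian"; the distribution, the
seed and the use of -sktest- are illustrative choices, not part of the
discussion referred to above. With 100,000 observations the test will
typically reject, even though -summarize, detail- shows skewness and
kurtosis close to the Gaussian values of 0 and 3.

*-------------------- begin example --------------------
clear all
set seed 2013                 // arbitrary seed, for reproducibility only
set obs 100000

// assumption: t(20) is only mildly heavier-tailed than a Gaussian
gen double x = rt(20)

// compare skewness and kurtosis with the Gaussian values 0 and 3
summarize x, detail

// skewness-kurtosis test for Gaussianity
sktest x
*--------------------- end example ---------------------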

I have adapted the simulation from the discussion above for the
Kolmogorov-Smirnov test. It shows that the Kolmogorov-Smirnov test
does not perform acceptably for any of these sample sizes.

*------------------- begin simulation -------------------
clear all

program define sim, rclass
    drop _all
    set obs `=1e5'
    gen double x = rnormal()
    // test the first 100, 1,000, 10,000 and 100,000 observations
    forvalues i = 2/5 {
        sum x in 1/`=1e`i''
        ksmirnov x = normal((x-r(mean))/r(sd)) in 1/`=1e`i''
        return scalar p`i' = r(p)
        return scalar p_cor`i' = r(p_cor)
    }
end

simulate p2p=r(p2) p2c=r(p_cor2) ///
         p3p=r(p3) p3c=r(p_cor3) ///
         p4p=r(p4) p4c=r(p_cor4) ///
         p5p=r(p5) p5c=r(p_cor5) ///
         , reps(2e4): sim

gen id = _n

reshape long p2 p3 p4 p5, i(id) j(dist) string

label var p2 "N=100"
label var p3 "N=1,000"
label var p4 "N=10,000"
label var p5 "N=100,000"

gen byte distr = cond(dist=="p",1,2)
label define distr 1 "p-value" ///
                   2 "corrected p-value", replace
label value distr distr

simpplot p?, by(distr) scheme(s2color) legend(cols(4))
*-------------------- end simulation --------------------
(For more on examples I sent to the Statalist see: )

This simulation needs the -simpplot- package in order to run. This can
be downloaded by typing in Stata -ssc install simpplot-.
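
What the plot checks can also be summarized in a single number: if the
null hypothesis is true and the test works as advertised, about 5% of
the p-values should fall below 0.05. The following is a smaller,
quicker sketch of the same idea for one sample size only. The program
name ks_once, the sample size of 1,000, the 1,000 replications and the
seed are arbitrary choices made for illustration. The Kolmogorov-Smirnov
test with an estimated mean and standard deviation is usually found to
be conservative, so a mean of reject far from 0.05 would be in line
with the plot.

*-------------------- begin example --------------------
clear all
set seed 12345                // arbitrary seed

program define ks_once, rclass
    drop _all
    set obs 1000
    gen double x = rnormal()  // the null hypothesis is true
    quietly sum x
    quietly ksmirnov x = normal((x-r(mean))/r(sd))
    return scalar p = r(p)
end

simulate p = r(p), reps(1000): ks_once

// proportion of replications rejecting at the 5% level;
// this should be close to 0.05 for a well-behaved test
gen byte reject = p < 0.05
sum reject
*--------------------- end example ---------------------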

> So, when the test accepts the null (normality),

A statistical test never accepts a null hypothesis; it can only fail
to reject the null hypothesis. This may sound pedantic, but the
difference is important: In the case of non-significance you don't
have evidence that the null hypothesis is wrong, but an absence of
evidence is not the same thing as evidence of absence.
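
A small simulation can make this concrete. The sketch below draws
samples from a uniform distribution, which is clearly not Gaussian, and
counts how often the Shapiro-Wilk test fails to reject. The program
name sw_once, the sample size of 20, the choice of a uniform
distribution and the seed are assumptions made purely for illustration.
Every replication in which the test fails to reject is a case where
"not significant" would be wrongly read as "the data are Gaussian".

*-------------------- begin example --------------------
clear all
set seed 2013                 // arbitrary seed

program define sw_once, rclass
    drop _all
    set obs 20
    gen double x = runiform() // the null hypothesis is false
    quietly swilk x
    return scalar p = r(p)
end

simulate p = r(p), reps(1000): sw_once

// proportion of replications in which Gaussianity is NOT rejected
// at the 5% level, even though the data are uniform
gen byte not_rejected = p >= 0.05
sum not_rejected
*--------------------- end example ---------------------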

> Also,  for the comment about robust, I meant exactly what said (I used
> the robust term loosely)

It is probably best to avoid the term robust, since it has a very
specific meaning in statistics. Actually, to make it more confusing,
it has multiple specific meanings.

-- Maarten

Maarten L. Buis
Reichpietschufer 50
10785 Berlin
