
# Re: st: sign test output

| From | To | Subject | Date |
|---|---|---|---|
| Maarten Buis | statalist@hsphsun2.harvard.edu | Re: st: sign test output | Thu, 17 Jan 2013 13:14:39 +0100 |

```
On Thu, Jan 17, 2013 at 11:21 AM, Nahla Betelmal wrote:
> from my readings in statistics , I know that in order to decide
> whether to use parametric or non-parametric tests, the data normality
> distribution should be checked first.
>
>  Shapiro-Wilk is used to test normality, when the number of
> observations is less than 30. Otherwise, we should use
> Kolmogorov-Smirnov for large sample (as in my sample).

Unfortunately that is incorrect. Normality tests need huge samples
before the p-value means what it is supposed to mean. An analogy I
have heard in a different context, but which applies very well here,
is going out to sea in a rowing boat to check whether the sea is safe
for the QE II. Using a normality test with only 346 observations is
not a good idea.

Nick and I discussed the issue of the performance of tests for
Gaussianity recently on Statalist:
http://www.stata.com/statalist/archive/2012-09/msg01040.html
http://www.stata.com/statalist/archive/2012-09/msg01013.html

The bottom line was: you need at least somewhere between 10,000 and
100,000 observations before the tests we discussed (Jarque-Bera and
Doornik-Hansen) perform somewhat acceptably, but in such large
datasets you need to worry whether deviations from Gaussianity that
are statistically significant are also substantively significant.

I have adapted the simulation from the discussion above for the
Kolmogorov-Smirnov test. It shows that the Kolmogorov-Smirnov test
does not perform acceptably for any of the sample sizes tried (100 to
100,000 observations).

*------------------- begin simulation -------------------
clear all

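program define sim, rclass
    // draw one standard normal sample of 100,000 observations
    drop _all
    set obs `=1e5'
    gen double x = rnormal()
    // test the first 10^i observations, for i = 2, ..., 5
    forvalues i = 2/5 {
        sum x in 1/`=1e`i''
        // standardize with the subsample mean and sd, and restrict the
        // test to that same subsample
        ksmirnov x = normal((x-r(mean))/r(sd)) in 1/`=1e`i''
        return scalar p`i' = r(p)
        return scalar p_cor`i' = r(p_cor)
    }
end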

simulate p2p=r(p2) p2c=r(p_cor2) ///
p3p=r(p3) p3c=r(p_cor3) ///
p4p=r(p4) p4c=r(p_cor4) ///
p5p=r(p5) p5c=r(p_cor5) ///
, reps(2e4): sim

gen id = _n

reshape long p2 p3 p4 p5, i(id) j(dist) string

label var p2 "N=100"
label var p3 "N=1,000"
label var p4 "N=10,000"
label var p5 "N=100,000"

gen byte distr = cond(dist=="p",1,2)
label define distr 1 "p-value" ///
2 "corrected p-value", replace
label value distr distr

simpplot p?, by(distr) scheme(s2color) legend(cols(4))
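
* Addition, not part of the original simulation: a rough numerical
* companion to the plot. If a test performs as intended, about 5% of
* the p-values should fall below 0.05 when the null hypothesis is true.
* The 0.05 cut-off and the rej_* variable names are assumptions made
* for this sketch.
foreach v of varlist p2 p3 p4 p5 {
    gen byte rej_`v' = `v' < 0.05 if !missing(`v')
}
* share of replications rejecting at the 5% level, by type of p-value
tabstat rej_p2 rej_p3 rej_p4 rej_p5, by(distr) statistics(mean) format(%6.4f)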
*-------------------- end simulation --------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )

This simulation needs the -simpplot- package in order to run. This can

> So, when the test accepts the null (normality),

A statistical test never accepts a null hypothesis; it can only fail
to reject the null hypothesis. This may sound pedantic, but the
difference is important: In the case of non-significance you don't
have evidence that the null hypothesis is wrong, but an absence of
evidence is not the same thing as evidence of absence.

> Also,  for the comment about robust, I meant exactly what said (I used
> the robust term loosely)

It is probably best to avoid the term robust, since it has a very
specific meaning in statistics. Actually, to make it more confusing,
it has multiple specific meanings.

-- Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------
```