Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: sign test output

From	Dirk Enzmann <[email protected]>
To	[email protected]
Subject	Re: st: sign test output
Date	Fri, 18 Jan 2013 22:48:56 +0100

In this context, the following working paper by Mantalos might be interesting:


http://hj.se/download/18.3bf8114412e804c78638000150/1299244445855/WP2010-8.pdf

It should be possible to implement his JBCV(k1,k2) procedure in Stata, and it would be interesting to see the results of including this test in the simulation.


Dirk

Date: Thu, 17 Jan 2013 13:14:39 +0100
From: Maarten Buis <[email protected]>
Subject: Re: st: sign test output

On Thu, Jan 17, 2013 at 11:21 AM, Nahla Betelmal wrote:

>  From my readings in statistics, I know that in order to decide
>  whether to use parametric or non-parametric tests, the normality of
>  the data distribution should be checked first.
>
>    Shapiro-Wilk is used to test normality when the number of
>  observations is less than 30. Otherwise, we should use
>  Kolmogorov-Smirnov for large samples (as in my sample).

Unfortunately, that is incorrect. Normality tests need huge samples
before their p-values mean what they are supposed to mean, i.e. before
a p-value below .05 occurs in only about 5% of samples drawn from a
truly Gaussian population. An analogy I have heard in a different
context, but which fits this situation very well: it is like going out
to sea in a rowing boat to check whether the sea is safe for the QE II.
Using a normality test with only 346 observations is not a good idea.

Nick and I recently discussed the performance of tests for Gaussianity
on Statalist:
http://www.stata.com/statalist/archive/2012-09/msg01040.html
http://www.stata.com/statalist/archive/2012-09/msg01013.html

The bottom line was: you need at least somewhere between 10,000 and
100,000 observations before the tests we discussed (Jarque-Bera and
Doornik-Hansen) perform somewhat acceptably, but in such large
datasets you need to worry about whether deviations from Gaussianity
that are statistically significant are also substantively significant.
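To make concrete what the two tests named above compute, here is a minimal sketch that is not part of the linked discussion. It computes the Jarque-Bera statistic by hand from the skewness and kurtosis stored by -summarize, detail- and obtains the Doornik-Hansen test from Stata's built-in -mvtest normality-. The variable name x, the sample size of 1,000 and the seed are arbitrary choices for illustration; the sketch only shows the computation, not how well the tests perform, which is what the simulations address.

```stata
*------------------- begin example -------------------
* Sketch: Jarque-Bera by hand, Doornik-Hansen via -mvtest normality-
clear all
set seed 12345
set obs 1000
gen double x = rnormal()

* -summarize, detail- stores N, skewness and kurtosis
* (the kurtosis itself, not excess kurtosis, so a normal has 3)
quietly summarize x, detail
scalar n_obs = r(N)
scalar skew  = r(skewness)
scalar kurt  = r(kurtosis)

* JB = N/6 * (S^2 + (K-3)^2/4), asymptotically chi2(2) under normality
scalar JB   = n_obs/6 * (skew^2 + (kurt - 3)^2/4)
scalar p_JB = chi2tail(2, JB)
display "Jarque-Bera = " %8.4f scalar(JB) "   p = " %6.4f scalar(p_JB)

* Doornik-Hansen omnibus test
mvtest normality x, stats(dhansen)
*-------------------- end example --------------------
```

Because the chi-squared reference distribution is only asymptotic, the p-value from this calculation is exactly the kind of p-value whose small-sample behaviour the simulations are meant to check.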

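I have adapted the simulation from the discussion above to the
Kolmogorov-Smirnov test. The data are drawn from a normal distribution,
so the null hypothesis is true and the p-values should be uniformly
distributed. The simulation shows that the Kolmogorov-Smirnov test does
not perform acceptably at any of these sample sizes.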

*------------------- begin simulation -------------------
clear all

program define sim, rclass
     drop _all
     set obs `=1e5'
     gen double x = rnormal()
     forvalues i = 2/5 {
         * use the first 10^i observations both for estimating the
         * mean and sd and for the test itself
         sum x in 1/`=1e`i''
         ksmirnov x = normal((x-r(mean))/r(sd)) in 1/`=1e`i''
         return scalar p`i' = r(p)
         return scalar p_cor`i' = r(p_cor)
     }
end

* 20,000 replications; p#p is the p-value and p#c the corrected
* p-value for a sample of N = 10^# observations
simulate p2p=r(p2) p2c=r(p_cor2) ///
          p3p=r(p3) p3c=r(p_cor3) ///
          p4p=r(p4) p4c=r(p_cor4) ///
          p5p=r(p5) p5c=r(p_cor5) ///
          , reps(2e4): sim

gen id = _n

* one row per replication and type of p-value
reshape long p2 p3 p4 p5, i(id) j(dist) string

label var p2 "N=100"
label var p3 "N=1,000"
label var p4 "N=10,000"
label var p5 "N=100,000"

gen byte distr = cond(dist=="p",1,2)
label define distr 1 "p-value" ///
                    2 "corrected p-value", replace
label values distr distr

* plot the p-values against the uniform distribution they
* should follow, since the null hypothesis is true
simpplot p?, by(distr) scheme(s2color) legend(cols(4))
*-------------------- end simulation --------------------
(For more on the examples I send to Statalist, see:
http://www.maartenbuis.nl/example_faq  )

This simulation needs the -simpplot- package in order to run. It can
be installed by typing -ssc install simpplot- in Stata.
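If -simpplot- is not available, a cruder numerical check of the same idea is to count how often the test rejects at the 5% level when the data really are Gaussian; a correctly sized test should reject in about 5% of replications. The sketch below is an addition, not part of the original post: it uses a single sample size (N = 100) and far fewer replications than the simulation above, and, as an extra comparison, it also runs the test with the true mean and standard deviation plugged in instead of estimates from the same data. The program name ksrej, the scalar names mu_hat and sd_hat, the seed and the number of replications are hypothetical choices for illustration.

```stata
*------------------- begin example -------------------
* Sketch: how often -ksmirnov- rejects at the 5% level when the
* data really are Gaussian (N = 100 per replication)
clear all
set seed 2013

program define ksrej, rclass
    drop _all
    set obs 100
    gen double x = rnormal()
    * (a) mean and sd estimated from the same data, as in the simulation
    quietly summarize x
    scalar mu_hat = r(mean)
    scalar sd_hat = r(sd)
    quietly ksmirnov x = normal((x - mu_hat)/sd_hat)
    return scalar rej_est = (r(p) < .05)
    * (b) the true mean (0) and sd (1) plugged in instead
    quietly ksmirnov x = normal(x)
    return scalar rej_true = (r(p) < .05)
end

simulate rej_est=r(rej_est) rej_true=r(rej_true), reps(1000) nodots: ksrej

* The means are the proportions of replications with p < .05;
* for a correctly sized test they should be close to .05
summarize rej_est rej_true
*-------------------- end example --------------------
```

Comparing the two rejection rates gives a rough indication of whether a problem stems from estimating the mean and standard deviation from the same data that are being tested.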


--
========================================
Dr. Dirk Enzmann
Institute of Criminal Sciences
Dept. of Criminology
Rothenbaumchaussee 33
D-20148 Hamburg
Germany

phone: +49-(0)40-42838.7498 (office)
       +49-(0)40-42838.4591 (Mrs Billon)
fax:   +49-(0)40-42838.2344
email: [email protected]
http://www2.jura.uni-hamburg.de/instkrim/kriminologie/Mitarbeiter/Enzmann/Enzmann.html
========================================
