Re: st: Jarque-Bera test

From   Maarten Buis <>
Subject   Re: st: Jarque-Bera test
Date   Thu, 27 Sep 2012 15:00:57 +0200

On Thu, Sep 27, 2012 at 3:44 AM, Nick Cox  wrote:
> The essence of the matter is that Jarque-Bera uses asymptotic results
> regardless of sample size for a problem in which convergence to those
> results is very slow. This approach is decades out of date and I am
> surprised that StataCorp support the test without a warning. The
> Doornik-Hansen test, for example, looks much more satisfactory.

I took up this challenge and ran a simulation comparing the
performance of the Jarque-Bera test with that of the Doornik-Hansen
test. In particular, I focused on whether the p-values follow a
uniform distribution when the null hypothesis is true, i.e. whether
the nominal rejection rates correspond to the proportion of
simulations in which the null hypothesis was rejected at those nominal
rates. In essence, both tests perform badly at sample sizes of 100 and
1,000. As Nick suggested, the Jarque-Bera test performs worse than the
Doornik-Hansen test, but for both tests my conclusion would be that
1,000 observations is just not enough. At 10,000 and 100,000
observations both tests seem to perform acceptably. However, at such
large sample sizes you need to worry about whether a rejection of the
null hypothesis actually represents a substantively meaningful
deviation from the normal/Gaussian distribution.
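
To make concrete what I mean by comparing nominal and observed
rejection rates, here is a minimal sketch for the Jarque-Bera test at
N=100 only. The program name jbonce, the seed, and the 2,000
replications are arbitrary choices for this illustration and are not
part of the full simulation.

*------------------- begin example -------------------
clear all
set seed 12345

* hypothetical helper: draw 100 standard normal values and
* return the asymptotic Jarque-Bera p-value
program define jbonce, rclass
	drop _all
	set obs 100
	gen x = rnormal()
	sum x, detail
	tempname jb
	scalar `jb' = (r(N)/6) * ///
	       (r(skewness)^2 + 1/4*(r(kurtosis) - 3)^2)
	return scalar p = chi2tail(2, `jb')
end

simulate p = r(p), reps(2000) nodots: jbonce

* share of replications rejected at each nominal level
foreach a in 0.01 0.05 0.10 {
	count if p < `a'
	display "nominal " %4.2f `a' "  observed " %6.4f r(N)/_N
}
*-------------------- end example --------------------

If the test behaved well, the observed shares would be close to the
nominal levels 0.01, 0.05 and 0.10.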

So the bottom line is: at small sample sizes graphs are the only
reliable way of judging whether a variable comes from a
normal/Gaussian distribution because the tests just don't perform well
enough. At large sample sizes graphs are still the only reliable way
of judging whether a variable comes from a normal/Gaussian
distribution because with large samples the tests will pick up
substantively meaningless deviations from the null hypothesis.
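
As a minimal sketch of such a graphical check (only one of several
options; the variable x is simulated purely for illustration), a
normal quantile plot and a histogram with an overlaid normal density
can be drawn with the official -qnorm- and -histogram- commands:

*------------------- begin example -------------------
clear
set seed 12345
set obs 100
gen x = rnormal()

* quantiles of x against quantiles of a normal distribution;
* points close to the reference line are consistent with normality
qnorm x

* histogram with a normal density that has the same mean and
* standard deviation as x
histogram x, normal
*-------------------- end example --------------------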

*------------------- begin simulation -------------------
clear all

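program define sim, rclass
	// draw 100,000 standard normal values, so the null hypothesis is true
	drop _all
	set obs `=1e5'
	gen x = rnormal()
	tempname jb jbp
	// use the first 10^i observations: N = 100, 1,000, 10,000, 100,000
	forvalues i = 2/5 {
		// Jarque-Bera statistic and its asymptotic chi-squared(2) p-value
		sum x in 1/`=1e`i'', detail
		scalar `jb' = (r(N)/6) * ///
		       (r(skewness)^2 + 1/4*(r(kurtosis) - 3)^2)
		scalar `jbp' = chi2tail(2,`jb')
		return scalar jb`i' = `jb'
		return scalar jbp`i' = `jbp'
		// Doornik-Hansen statistic and p-value as returned by -mvtest norm-
		mvtest norm x in 1/`=1e`i''
		return scalar dh`i' = r(chi2_dh)
		return scalar dhp`i' = r(p_dh)
	}
end
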
simulate jb2=r(jb2) jbp2=r(jbp2) ///
         jb3=r(jb3) jbp3=r(jbp3) ///
         jb4=r(jb4) jbp4=r(jbp4) ///
         jb5=r(jb5) jbp5=r(jbp5) ///
         dh2=r(dh2) dhp2=r(dhp2) ///
         dh3=r(dh3) dhp3=r(dhp3) ///
         dh4=r(dh4) dhp4=r(dhp4) ///
         dh5=r(dh5) dhp5=r(dhp5) ///
         , reps(2e4): sim

rename jbp2 p2jb
rename jbp3 p3jb
rename jbp4 p4jb
rename jbp5 p5jb
rename dhp2 p2dh
rename dhp3 p3dh
rename dhp4 p4dh
rename dhp5 p5dh

gen id = _n

reshape long p2 p3 p4 p5, i(id) j(dist) string

label var p2 "N=100"
label var p3 "N=1,000"
label var p4 "N=10,000"
label var p5 "N=100,000"

encode dist, gen(distr)
label define distr 2 "Jarque-Bera" ///
                   1 "Doornik-Hansen", replace
label value distr distr

simpplot p?, by(distr) scheme(s2color) legend(cols(4))
*-------------------- end simulation --------------------
(For more on examples I sent to the Statalist see: )

This simulation requires the -simpplot- package available at SSC and
described here: <>

-- Maarten

Maarten L. Buis
Reichpietschufer 50
10785 Berlin