 Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Jarque-Bera test

- From: Maarten Buis
- To: statalist@hsphsun2.harvard.edu
- Subject: Re: st: Jarque-Bera test
- Date: Thu, 27 Sep 2012 15:00:57 +0200

```
On Thu, Sep 27, 2012 at 3:44 AM, Nick Cox wrote:
> The essence of the matter is that Jarque-Bera uses asymptotic results
> regardless of sample size for a problem in which convergence to those
> results is very slow. This approach is decades out of date and I am
> surprised that StataCorp support the test without a warning. The
> Doornik-Hansen test, for example, looks much more satisfactory.

I took up this challenge and ran a simulation comparing the
performance of the Jarque-Bera test with that of the Doornik-Hansen
test. In particular, I focused on whether the p-values follow a
uniform distribution when the null hypothesis is true, i.e. whether
the nominal rejection rates correspond to the proportion of
simulations in which the null hypothesis was rejected at those
nominal levels. In essence, both tests perform badly at sample sizes
of 100 and 1,000. As Nick suggested, the Jarque-Bera test performs
even worse than the Doornik-Hansen test, but for both tests my
conclusion would be that 1,000 observations are just not enough. At
10,000 and 100,000 observations both tests seem to perform
acceptably. However, at such large sample sizes you need to worry
about whether a rejection of the null hypothesis actually represents
a substantively meaningful deviation from the normal/Gaussian
distribution.

So the bottom line is: at small sample sizes, graphs are the only
reliable way of judging whether a variable comes from a
normal/Gaussian distribution, because the tests just don't perform
well enough. At large sample sizes, graphs are still the only
reliable way of judging this, because in large samples the tests
will pick up substantively meaningless deviations from the null
hypothesis.

*------------------- begin simulation -------------------
clear all

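* one replication: draw 100,000 standard normal values and test
* normality on the first 100, 1,000, 10,000, and 100,000 observations
program define sim, rclass
    drop _all
    set obs `=1e5'
    gen x = rnormal()
    tempname jb jbp
    forvalues i = 2/5 {
        * Jarque-Bera statistic from the sample skewness and kurtosis
        sum x in 1/`=1e`i'', detail
        scalar `jb' = (r(N)/6) * ///
            (r(skewness)^2 + 1/4*(r(kurtosis) - 3)^2)
        * asymptotic p-value: chi-squared with 2 degrees of freedom
        scalar `jbp' = chi2tail(2,`jb')
        return scalar jb`i' = `jb'
        return scalar jbp`i' = `jbp'

        * Doornik-Hansen test as computed by -mvtest normality-
        mvtest norm x in 1/`=1e`i''
        return scalar dh`i' = r(chi2_dh)
        return scalar dhp`i' = r(p_dh)
    }
end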

* 20,000 replications; keep the statistics and p-values of both tests
simulate jb2=r(jb2) jbp2=r(jbp2) ///
jb3=r(jb3) jbp3=r(jbp3) ///
jb4=r(jb4) jbp4=r(jbp4) ///
jb5=r(jb5) jbp5=r(jbp5) ///
dh2=r(dh2) dhp2=r(dhp2) ///
dh3=r(dh3) dhp3=r(dhp3) ///
dh4=r(dh4) dhp4=r(dhp4) ///
dh5=r(dh5) dhp5=r(dhp5) ///
, reps(2e4): sim

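* name the p-values p2-p5 with the test as suffix, so that -reshape-
* can stack the two tests
rename jbp2 p2jb
rename jbp3 p3jb
rename jbp4 p4jb
rename jbp5 p5jb
rename dhp2 p2dh
rename dhp3 p3dh
rename dhp4 p4dh
rename dhp5 p5dh

gen id = _n

* one row per replication and test; dist is "dh" or "jb"
reshape long p2 p3 p4 p5, i(id) j(dist) string

label var p2 "N=100"
label var p3 "N=1,000"
label var p4 "N=10,000"
label var p5 "N=100,000"

* numeric test indicator (encode sorts alphabetically: dh = 1, jb = 2)
encode dist, gen(distr)
label define distr 2 "Jarque-Bera" ///
1 "Doornik-Hansen", replace
label value distr distr

* compare the distribution of the p-values with the uniform distribution
simpplot p?, by(distr) scheme(s2color) legend(cols(4))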
*-------------------- end simulation --------------------
(For more on examples I sent to the Statalist see:
http://www.maartenbuis.nl/example_faq )

This simulation requires the -simpplot- package available at SSC and
described here: <http://www.maartenbuis.nl/software/simpplot.html>

-- Maarten

---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany

http://www.maartenbuis.nl
---------------------------------
```
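For reference, the quantities behind the simulation can be written out as follows. This is a sketch of what the code above computes, not part of the original post: $n$ is the number of observations used, $S$ and $K$ are the sample skewness and kurtosis returned by `summarize, detail` (so $K = 3$ for a normal distribution), $p_r$ is the p-value in replication $r$, and $R = 20{,}000$ is the number of replications set by `reps(2e4)`.

$$
\mathrm{JB} = \frac{n}{6}\left(S^2 + \frac{(K-3)^2}{4}\right),
\qquad
p = \Pr\left(\chi^2_2 > \mathrm{JB}\right)
$$

$$
\Pr(p \le \alpha) = \alpha \ \text{ for all } \alpha \in (0,1),
\qquad
\hat{\alpha} = \frac{1}{R}\sum_{r=1}^{R} \mathbf{1}\left(p_r \le \alpha\right)
$$

Because the data are drawn with `rnormal()`, the null hypothesis holds in every replication, so a test whose reference distribution is accurate should give $\hat{\alpha} \approx \alpha$ at every nominal level $\alpha$. This is the comparison that `simpplot` draws. The $\chi^2_2$ distribution is only the asymptotic null distribution of $\mathrm{JB}$, which is why the agreement can be poor at $n = 100$ or $n = 1{,}000$.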