
st: -sktest-

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: -sktest-
Date	Thu, 23 Sep 2004 19:14:00 +0100

-sktest- was mentioned on the list. 

This raised my interest once again in the
question of the utility of these tests 
for normality (Gaussianity). 

The main original idea seems to have 
been that, when sampling from a Gaussian, 

	skewness - 0 
	------------
	    its se

and 

	kurtosis - 3 
	------------
	    its se 

are themselves approximately unit Gaussian 
and, in large samples, independent. Hence 
the sum of the squares of these statistics
is chi-square with 2 df. However, these
are large sample results and the 
convergence on Gaussianity of the 
individual statistics can be very slow. 
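
A rough sketch of that large-sample calculation, for one variable 
from the auto data, might look like the following. It assumes the 
textbook asymptotic standard errors sqrt(6/n) for skewness and 
sqrt(24/n) for kurtosis; these are not the adjusted quantities 
that -sktest- uses, and the scalar names are arbitrary. 

```stata
* Large-sample (unadjusted) test for one variable: a sketch only
sysuse auto, clear
quietly summarize trunk if rep78 < ., detail

scalar n  = r(N)
* assumed asymptotic standard errors under Gaussian sampling
scalar zs = r(skewness) / sqrt(6 / n)
scalar zk = (r(kurtosis) - 3) / sqrt(24 / n)

* sum of squares, referred to chi-square with 2 df
scalar jb = zs^2 + zk^2
display as txt "chi2(2) = " as res %6.3f jb ///
        as txt "   P = " as res %5.3f chi2tail(2, jb)
```

Comparing the P-value with r(P_chi2) from -sktest- on the same 
sample gives some idea of how much the sample size adjustments matter. 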

-sktest- builds in various adjustments 
for sample size. In contrast, the 
user-written -jb6- uses the large 
sample results regardless. As said earlier, 
it is difficult to justify the use of -jb6-. I guess
it continues to be downloaded from SSC 
because the buzzwords "Jarque-Bera test"
are familiar to some groups and it is 
not fully realised that the official -sktest- 
is a much better program. 

However, my main concern is how 
these tests behave with real data and how 
they might help with their analysis.
Reaching for the auto data, we can 
get a condensed display of -sktest- 
results (P-values for skewness, for kurtosis 
and for the two jointly) by 

* rep78 < . restricts each test to observations with rep78 not missing
foreach v of var price-foreign { 
	qui sktest `v' if rep78 < . 
	di as txt "`v'{col 20}"  ///
	   as res %5.3f r(P_skew) "  "  ///
	          %5.3f r(P_kurt) "  "  ///
	          %5.3f r(P_chi2) 
}

price              0.000  0.009  0.000
mpg                0.001  0.081  0.004
rep78              0.833  0.747  0.929
headroom           0.471  0.033  0.082
trunk              0.872  0.039  0.112
weight             0.664  0.013  0.050
length             0.780  0.004  0.023
turn               0.794  0.079  0.192
displacement       0.043  0.201  0.063
gear_ratio         0.309  0.021  0.051
foreign            0.005  0.000  0.000

It is instructive now to cycle through 
a series of -qnorm- and/or -histogram-s 
and also to calculate skewness and kurtosis
themselves. 
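
One way to do that cycling, as a sketch only, is a loop that 
leaves a named -qnorm- and a named -histogram- in memory for 
each variable. The graph names q_* and h_* are arbitrary choices, 
and the restriction to rep78 < . is carried over so that the graphs 
match the sample used for the tests. 

```stata
* Normal quantile plots and histograms, one pair per variable
sysuse auto, clear
foreach v of varlist price-foreign {
	* quantile plot against the normal
	qnorm `v' if rep78 < ., name(q_`v', replace)
	* histogram with a normal density overlaid
	histogram `v' if rep78 < ., normal name(h_`v', replace)
}
```

Any one graph can then be brought back with, for example, 
-graph display q_trunk-. 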

One interesting variable is -trunk-, 
which is discussed in [R] sktest, 
where it is stated that the tails 
are too thick (the kurtosis is too 
high). As my old economics teacher 
used to say, "Even Homer sometimes nods". 
Looking at graphs, and also at the moments, 
using a private domain program, shows that 
this interpretation is backwards: 

. moments price-for

-------------------------------------------------------------
      n = 69 |       mean          SD    skewness    kurtosis
-------------+-----------------------------------------------
       price |   6146.043    2912.440       1.688       5.032
         mpg |     21.290       5.866       0.995       3.997
       rep78 |      3.406       0.990      -0.057       2.678
    headroom |      3.000       0.853       0.197       2.144
       trunk |     13.928       4.343      -0.044       2.159
      weight |   3032.029     792.852       0.118       2.073
      length |    188.290      22.747      -0.076       2.000
        turn |     39.797       4.441       0.071       2.228
displacement |    198.000      93.148       0.581       2.354
  gear_ratio |      2.999       0.463       0.279       2.109
     foreign |      0.304       0.464       0.850       1.723
-------------------------------------------------------------

-trunk- in fact has _low_ kurtosis -- it is short- or light-tailed -- 
and the P-value reflects the fact that the test statistic,  
a sum of squares, is constructed to measure 
non-normality, and in particular kurtosis differing from 3 in either 
direction. 
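
Since the P-value alone cannot say which direction applies, 
one possibility is to show the kurtosis itself alongside it. 
The following is a sketch, assuming the kurtosis from 
-summarize, detail- (for which a Gaussian gives 3); the labels 
"heavy" and "light" are just shorthand for above and below 3. 

```stata
* P-value for kurtosis together with the direction of the departure
sysuse auto, clear
foreach v of varlist price-foreign {
	quietly sktest `v' if rep78 < .
	local p = r(P_kurt)
	quietly summarize `v' if rep78 < ., detail
	* above 3: heavier tails than Gaussian; below 3: lighter
	local tails = cond(r(kurtosis) > 3, "heavy", "light")
	display as txt "`v'{col 20}" ///
	        as res %5.3f `p' "  " %6.3f r(kurtosis) ///
	        as txt "  `tails'"
}
```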

-sktest- is doing what it is designed to do, but in practice one 
kind of deviation from normality (skewness and/or heavy 
tails) is much more likely to be problematic than the 
other (short/light tails). 

The moral is very simple and perhaps too elementary: 
these tests can easily be misinterpreted if you don't also 
look at the data. 

Nick 
[email protected] 
