
From: "Nick Cox" <n.j.cox@durham.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: -sktest-
Date: Thu, 23 Sep 2004 19:14:00 +0100

-sktest- was mentioned on the list. This raised my interest once again in the question of the utility of these tests for normality (Gaussianity).

The main original idea seems to have been that, when sampling from a Gaussian,

    skewness - 0
    ------------
       its se

and

    kurtosis - 3
    ------------
       its se

are themselves unit Gaussian. Hence the sum of the squares of these statistics is chi-square with 2 df.

However, these are large sample results and the convergence on Gaussianity of the individual statistics can be very slow. -sktest- builds in various adjustments for sample size. In contrast, the user-written -jb6- uses the large sample results regardless.

As said earlier, it is difficult to justify the use of -jb6-. I guess it continues to be downloaded from SSC because the buzzwords "Jarque-Bera test" are familiar to some groups and it is not fully realised that the official -sktest- is a much better program.

However, my main concern is how these tests behave with real data and how they might help with their analysis. Reaching for the auto data, we can get a condensed display of -sktest- results by

    foreach v of var price-for {
        qui sktest `v' if rep78 < .
        di as txt "`v'{col 20}" ///
           as res %5.3f r(P_skew) " " ///
           %5.3f r(P_kurt) " " ///
           %5.3f r(P_chi2)
    }

    price              0.000 0.009 0.000
    mpg                0.001 0.081 0.004
    rep78              0.833 0.747 0.929
    headroom           0.471 0.033 0.082
    trunk              0.872 0.039 0.112
    weight             0.664 0.013 0.050
    length             0.780 0.004 0.023
    turn               0.794 0.079 0.192
    displacement       0.043 0.201 0.063
    gear_ratio         0.309 0.021 0.051
    foreign            0.005 0.000 0.000

It is instructive now to cycle through a series of -qnorm- and/or -histogram-s and also to calculate skewness and kurtosis themselves.

One interesting variable is -trunk-, which is discussed in [R] sktest, where it is stated that the tails are too thick (the kurtosis is too high). As my old economics teacher used to say, "Even Homer sometimes nods".
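The direction of a kurtosis departure can be checked directly next to the P-values. What follows is only a minimal sketch: it assumes the auto data are loaded with -sysuse-, it uses official -summarize, detail- for the moments, and the choice of three variables is purely illustrative.

```stata
* Sketch: show skewness, kurtosis and the -sktest- kurtosis P-value together.
* Assumption: the auto dataset shipped with Stata; variables chosen for illustration.
sysuse auto, clear

foreach v of varlist trunk weight length {
    * -summarize, detail- leaves the moments in r(skewness) and r(kurtosis)
    quietly summarize `v' if rep78 < ., detail
    local skew = r(skewness)
    local kurt = r(kurtosis)

    * -sktest- overwrites r(), so the moments were saved in locals first
    quietly sktest `v' if rep78 < .
    display as text "`v'" _col(20) ///
        as result %7.3f `skew' " " %7.3f `kurt' " " %5.3f r(P_kurt)
}

* Graphical check of one variable against the normal
qnorm trunk if rep78 < .
```

A small kurtosis P-value together with a kurtosis below 3 points to short tails, not thick ones; the P-value alone cannot distinguish the two cases.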
Looking at graphs, and also at the moments, using a private domain program, shows that this interpretation is backwards:

    . moments price-for

    -------------------------------------------------------------
          n = 69 |       mean         SD   skewness   kurtosis
    -------------+-----------------------------------------------
           price |   6146.043   2912.440      1.688      5.032
             mpg |     21.290      5.866      0.995      3.997
           rep78 |      3.406      0.990     -0.057      2.678
        headroom |      3.000      0.853      0.197      2.144
           trunk |     13.928      4.343     -0.044      2.159
          weight |   3032.029    792.852      0.118      2.073
          length |    188.290     22.747     -0.076      2.000
            turn |     39.797      4.441      0.071      2.228
    displacement |    198.000     93.148      0.581      2.354
      gear_ratio |      2.999      0.463      0.279      2.109
         foreign |      0.304      0.464      0.850      1.723
    -------------------------------------------------------------

-trunk- in fact has _low_ kurtosis -- it is short- or light-tailed -- and the P-value reflects the fact that the test statistic, a sum of squares, is constructed to measure non-normality, and in particular kurtosis differing from 3 in either direction. -sktest- is doing what it is designed to do, but in practice one kind of deviation from normality (skewness and/or heavy tails) is much more likely to be problematic than the other (short/light tails).

The moral is very simple and perhaps too elementary: these tests can easily be misinterpreted if you don't also look at the data.

Nick
n.j.cox@durham.ac.uk

