This is a fairly common question on Statalist.
Missings are irrelevant to -sktest-, and
are just ignored, so that is no problem. However,
the fact that you got missings may or may not
indicate some much deeper problem, but that's
for you to consider.
-sktest- is here rejecting a null hypothesis
of normality. With your sample sizes, this is
totally unsurprising. You are being told that
your sample is large enough to distinguish
between "genuine" non-normality and "apparent"
non-normality that is just the sampling
fluctuation that would occur if the underlying distribution
really were normal. However, with your
sample sizes, the kind of non-normality at
which -sktest- squawks would not necessarily
trouble any data analyst with experience.
It is salutary to cycle through the numeric
variables in Stata's auto data and look at -sktest-
results. Here n = 74 is much smaller than your sample size,
but -sktest- often reports rejection on what
graphical analysis will reveal as an unproblematic
distribution. For example, -sktest- may reject if a
variable is shorter-tailed than normal.
It may reject if a variable is somewhat
irregular in distribution, but otherwise
not problematic. In a word, it is typically
over-sensitive for the practical problem.
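
A minimal sketch of that exercise, using the auto data shipped with Stata. Picking out the numeric variables with -ds- is just one way to do it, and the local macro name numvars is made up for the illustration.

```stata
* load the auto data shipped with Stata
sysuse auto, clear

* collect the numeric variables (make is a string and is left out)
ds, has(type numeric)
local numvars `r(varlist)'

* run -sktest- on each numeric variable in turn
foreach v of local numvars {
    sktest `v'
}
```

Setting each result beside a -qnorm- plot of the same variable shows how far a rejection corresponds to anything visibly problematic.
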
Any test in this area still leaves the question
of measuring, or more generally assessing,
the kind of non-normality you have and of
deciding whether non-normality is really a
problem for what you are doing. A direct
calculation of moments (or alternative
measures such as L-moments) is sometimes
helpful here.
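
As a sketch of what a direct calculation can look like: -summarize- with the detail option leaves the moment-based skewness and kurtosis behind in r(). The variable price from the auto data is an arbitrary choice. For L-moments a user-written command is needed; -lmoments- from SSC is assumed below and is left commented out because it must be installed first.

```stata
* moment-based skewness and kurtosis from official Stata
sysuse auto, clear
summarize price, detail
display "skewness = " %6.3f r(skewness) "   kurtosis = " %6.3f r(kurtosis)

* L-moments: assumes the user-written -lmoments- from SSC
*     ssc install lmoments
*     lmoments price
```
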
The issue of -sktest- versus a Jarque-Bera
test is also secondary. Jarque-Bera typically
seems to mean using asymptotic sampling distributions
for skewness and kurtosis for a problem
in which they are often a poor approximation.
(Also, Jarque and Bera just reinvented a very old
test. Why they got credit for that is mysterious,
except on the hypothesis that people have no
time for proper reading.) -sktest- is, more or less,
Jarque-Bera done better with adjustments for sample size.
My guess would be that it would make no difference
in your case.
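
For comparison, the statistic usually labelled Jarque-Bera, n/6 * (skewness^2 + (kurtosis - 3)^2/4), referred to chi-square with 2 degrees of freedom, can be computed by hand and set beside -sktest-. This is only a sketch on the auto data, with price chosen arbitrarily; the scalar names jb and pjb are made up for the illustration.

```stata
sysuse auto, clear
quietly summarize price, detail

* Jarque-Bera statistic from moment-based skewness and kurtosis
scalar jb = r(N)/6 * (r(skewness)^2 + (r(kurtosis) - 3)^2/4)

* asymptotic P-value from chi-square with 2 degrees of freedom
scalar pjb = chi2tail(2, jb)
display "JB = " %9.3f jb "   P-value = " %6.4f pjb

* -sktest- with and without its adjustment, for comparison
sktest price
sktest price, noadjust
```
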
Graphical examination of your residuals
with -qnorm- will teach you far more about
their (non-)normality than a -sktest-. The
only practical reason for using -sktest-
is that you are obliged to use it
by instruction from someone in power over you,
namely an advisor, boss, reviewer or journal editor.
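
A minimal sketch of the graphical route, again on the auto data. The regression is an arbitrary stand-in for your model and the variable name ehat is made up for the illustration.

```stata
sysuse auto, clear

* an arbitrary regression standing in for the real model
regress price weight mpg

* residuals; missing wherever the outcome or a predictor is missing
predict double ehat, residuals

* quantile-normal plot of the residuals
qnorm ehat

* the test, if someone insists on it
sktest ehat
```

Points lying close to the reference line in the -qnorm- plot indicate approximate normality; systematic curvature or stray points in the tails show what kind of departure there is.
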
Another detail is that -sktest- does not know
that your variable is a residual and makes no
adjustment for that fact. A wild guess is that
this is just a purist issue in your case.
Nick
M. Haider Hussain wrote:
> Sorry for such a novice-level question.
>
> I ran an ols regression with 15 estimators and 14831 observations. In
> this process, 437 missing values were generated. Then I tested
> normality of the residual using sktest and it returned following
> output.
>
> Variable | Pr(Skewness)  Pr(Kurtosis)  adj chi2(2)  Prob>chi2
> ---------------------------------------------------------------
>     ewhe |    0.000         0.000           .           .
>
> whereas, sktest with noadjust option returned the following output
>
> Variable | Pr(Skewness)  Pr(Kurtosis)  adj chi2(2)  Prob>chi2
> ---------------------------------------------------------------
>     ewhe |    0.000         0.000        3693.33      0.0000
>
>
> Where're the statistics of chi2 in the first instance? Does it mean
> that sktest (without no adjust) is sensitive to the missing values?
> Can I use jb test with 14000+ observations? If not then what other
> "quantitative" tests are available?
> (Or am I misinterpreting something?)