Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: Re: question concerning normality

| From | To | Subject | Date |
|---|---|---|---|
| Maarten Buis <[email protected]> | [email protected] | st: Re: question concerning normality | Wed, 10 Aug 2011 10:43:26 +0200 |

```
On Fri, Aug 5, 2011 at 2:36 PM, Caspar Bijleveld wrote:
> I have seen a few of your responses on the internet concerning several
> Stata tips on Statalist. I am currently working on a paper and I have come
> across an important problem (which is probably quite easy to tackle) and
> hopefully you are able to help me. My dependent variable is not normally
> distributed (it concerns financial data). I have been advised to take the
> LN of the variable, which should make it more normally distributed, but
> the Jarque-Bera test is still just significant (I have to reject the null
> hypothesis of normality). I think the non-normality is due to a few
> outliers which cannot be explained by any events in the past.

These questions should not be sent privately but directly to the
Statalist. See <http://www.stata.com/support/faqs/res/statalist.html#private>
for several reasons why that is the case.

It is a common misunderstanding that the dependent variable should
look like the bell-shaped normal distribution. The model assumes that
each observation of the dependent variable is normally distributed,
but one of the parameters of that distribution, the conditional mean,
changes from observation to observation. The result is that the
distribution of the dependent variable as a whole can take almost any
form, and tests for normality of the dependent variable are completely
meaningless, as they assume a common set of parameters for all
observations.

What you can look at is the distribution of the residuals, which
should have one mean (0) and one standard deviation (the root mean
squared error) for all observations. However, I would not rely on
statistical tests. Tests cannot directly test the hypothesis that a
variable is normally (or otherwise) distributed. They need to
translate that to a testable null hypothesis, which means they derive
a limited number of consequences from the assumed normality and test
those. As a consequence these tests can only detect some very specific
deviations, e.g. the Jarque-Bera test only looks at the skewness and
the kurtosis. The way to check for deviations from normality is to
look at graphs. For several useful graphs, type in Stata -help pnorm-,
-help qnorm-, and -ssc desc hangroot-.

As to outliers, it is helpful to see them not as a problem but as an
opportunity to strengthen your argument. Consider the description of
this classic analysis:
<http://www.significancemagazine.org/details/magazine/1076383/London-cholera-and-the-blindspot-of-an-epidemiology-theory-.html>

In short, the controversy was whether cholera came from (drinking)
water or from the air. The main pattern was that the occurrence of the
disease was highly clustered in certain areas, which is consistent
with both competing theories. The fact that there was a water pump
near the center of the high-risk area helped support the water theory,
but was hardly conclusive. However, by carefully examining the
outliers, people living in the high-risk area who did not contract
cholera, John Snow could explain them: they either had their own
private well or they worked in a brewery and drank beer. In this case
it was the outliers that provided the most convincing evidence, not
the main pattern.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
```
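
The point about the conditional mean, and the advice to inspect the residuals graphically, can be illustrated with simulated data. The following is only a minimal sketch and not part of the original message: the variable names (x, y, res), the seed, the sample size, and the coefficients are all made up for illustration. It assumes a single binary explanatory variable and errors that are standard normal by construction.

```stata
* Simulated data: normal errors, but a conditional mean that depends on x
* (all names and numbers here are hypothetical, chosen for illustration)
clear
set seed 20110810
set obs 1000
generate byte x = runiform() < 0.5
generate double y = 1 + 5*x + rnormal()

* The distribution of y as a whole is a mixture of two normals
* with different means, so it need not look bell-shaped
histogram y, normal name(hist_y, replace)

* The normality assumption concerns the residuals, not y itself
regress y x
predict double res, residuals

* Graphical checks of the residuals, as suggested in the message
qnorm res, name(qnorm_res, replace)
pnorm res, name(pnorm_res, replace)
```

Under these assumptions the histogram of y should show two humps even though the errors are normal, while the points in the -qnorm- and -pnorm- plots of the residuals should lie close to the reference line. The user-written -hangroot- mentioned in the message would first need to be installed with -ssc install hangroot-.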