Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Re: question concerning normality

From	Nick Cox <[email protected]>
To	[email protected]
Subject	Re: st: Re: question concerning normality
Date	Wed, 10 Aug 2011 10:00:06 +0100

I agree with Maarten on his stance both on private emails and on the
normality assumption.

An easy counterexample is to suppose that x is uniformly distributed
and y a perfect linear function of x. Then y is also uniformly
distributed. So it would presumably fail any test for normality. But
to argue that regression of y on x is invalid because the response is
not normally distributed would be absurd.
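
The counterexample can be checked numerically. What follows is a minimal sketch in Python rather than Stata, using only the standard library; the sample size, seed, intercept 1 and slope 2 are arbitrary assumptions, and jarque_bera() and ols() are hypothetical helper names written for this illustration. The p-value relies on the usual asymptotic chi-squared approximation with 2 degrees of freedom.

```python
import math
import random


def jarque_bera(values):
    """Return (JB statistic, asymptotic p-value) for a sample."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    jb = n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)
    # Upper-tail probability of a chi-squared(2) variable is exp(-x / 2).
    return jb, math.exp(-jb / 2.0)


def ols(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


random.seed(2011)                            # arbitrary seed, for reproducibility
x = [random.random() for _ in range(1000)]   # x is uniform on [0, 1)
y = [1.0 + 2.0 * xi for xi in x]             # y is a perfect linear function of x

jb_y, p_y = jarque_bera(y)
intercept, slope = ols(x, y)
max_abs_resid = max(abs(yi - (intercept + slope * xi)) for xi, yi in zip(x, y))

print("Jarque-Bera on y:", jb_y, "p-value:", p_y)
print("intercept:", intercept, "slope:", slope)
print("largest absolute residual:", max_abs_resid)
```

The test rejects normality of y decisively because a uniform variable has kurtosis near 1.8, yet the regression recovers the line with residuals that are zero up to rounding.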

That said, it remains quite likely that working with logarithms of
financial data is a good idea, but the information in the posting is
insufficient to say much more.

On Wed, Aug 10, 2011 at 9:43 AM, Maarten Buis <[email protected]> wrote:
> On Fri, Aug 5, 2011 at 2:36 PM, Caspar Bijleveld wrote:
>> I have seen a few of your responses on the internet concerning several
>> Stata tips in statalist. I am currently working on a paper and I have come across
>> an important problem (which is probably quite easy to tackle) and hopefully
>> you are able to help me. My dependent variable is not normally distributed
>> (it concerns financial data). I have been advised to take the LN of the
>> variable, which should make it more normally distributed, but still the
>> Jarque-Bera test is just significant (I have to reject the null hypothesis
>> of normality). I think the non-normality is due to a few outliers which
>> cannot be explained by any events in the past.
>
> These questions should not be sent privately but directly to the
> Statalist. See <http://www.stata.com/support/faqs/res/statalist.html#private>
> for several reasons why that is the case.
>
> It is a common misunderstanding that the dependent variable should
> look like the bell-shaped normal distribution. The assumption is that
> the dependent variable is normally distributed conditional on the
> explanatory variables, so one of the parameters of that distribution,
> the conditional mean, changes from observation to observation. The
> result is that the form of the marginal distribution of the dependent
> variable can be almost anything, and tests for normality of the
> dependent variable are completely meaningless as they assume a common
> set of parameters for all observations.
>
> What you can look at is the distribution of the residuals, which should
> have one mean (0) and one standard deviation (the root mean squared
> error) for all observations. However, I would not rely on statistical
> tests. Tests cannot directly test the hypothesis that a variable is
> normally (or otherwise) distributed. They need to translate that to a
> testable null hypothesis, which means they derive a limited number of
> consequences from the assumed normality and test those. As a
> consequence these tests can only detect some very specific deviations,
> e.g. the Jarque-Bera test only looks at the skewness and the
> kurtosis. The way to check for deviations from normality is to look at
> graphs; for several useful graphs, type in Stata -help pnorm-, -help
> qnorm-, and -ssc desc hangroot-.
>
> As to outliers, it is helpful to see them as not a problem but as an
> opportunity to strengthen your argument. Consider the description of
> this classic analysis:
> <http://www.significancemagazine.org/details/magazine/1076383/London-cholera-and-the-blindspot-of-an-epidemiology-theory-.html>
>
> The short version of it is this. The controversy was whether cholera
> came from (drinking) water or from the air. The main pattern was that
> the occurrence of the disease was highly clustered in certain areas.
> This would be consistent with both competing theories. The fact that
> there was a water pump near the center of the high-risk area helped
> support the water theory, but was hardly conclusive. However, by
> examining them carefully, John Snow could explain the main outliers:
> people living in the high-risk area who did not contract cholera.
> They either had their own private well or they worked in a brewery and
> drank beer. In this case it was the outliers that provided the most
> convincing evidence, not the main pattern.
>
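
Maarten's distinction between the dependent variable and the residuals can be illustrated with simulated data. This is a rough sketch in Python (standard library only), not Stata. It assumes a single binary explanatory variable, a group difference of 10 and normal errors with standard deviation 1. normal_plot_correlation() is a hypothetical helper that summarizes a normal quantile plot, of the kind -qnorm- draws, by the correlation between the sorted data and the matching normal quantiles. That number is only a crude stand-in for looking at the graph itself.

```python
import random
from statistics import NormalDist


def normal_plot_correlation(values):
    """Correlation between sorted values and matching normal quantiles."""
    n = len(values)
    ordered = sorted(values)
    std_normal = NormalDist()
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    mean_o = sum(ordered) / n
    mean_t = sum(theoretical) / n
    cov = sum((o - mean_o) * (t - mean_t) for o, t in zip(ordered, theoretical))
    var_o = sum((o - mean_o) ** 2 for o in ordered)
    var_t = sum((t - mean_t) ** 2 for t in theoretical)
    return cov / (var_o * var_t) ** 0.5


random.seed(2011)                 # arbitrary seed, for reproducibility
n = 2000
x = [i % 2 for i in range(n)]     # binary explanatory variable, half 0 and half 1
y = [10.0 * xi + random.gauss(0.0, 1.0) for xi in x]  # normal errors, shifting mean

# With one binary regressor and a constant, the least-squares fitted value
# is simply the mean of y within each group.
group_mean = {}
for g in (0, 1):
    members = [yi for xi, yi in zip(x, y) if xi == g]
    group_mean[g] = sum(members) / len(members)
residuals = [yi - group_mean[xi] for xi, yi in zip(x, y)]

r_y = normal_plot_correlation(y)
r_resid = normal_plot_correlation(residuals)
print("normal-plot correlation, y:", r_y)
print("normal-plot correlation, residuals:", r_resid)
```

Here y is strongly bimodal, so its points fall well off a straight line on a normal quantile plot, while the residuals lie close to one even though the errors were normal all along.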
