Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Maarten Buis <maartenlbuis@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: question concerning normality
Date: Wed, 10 Aug 2011 10:43:26 +0200

On Fri, Aug 5, 2011 at 2:36 PM, Caspar Bijleveld wrote:
> I have seen a few of your responses on the internet concerning several
> Stata tips in statalist. I am currently working on a paper and I come across
> an important problem (which is probably quite easy to tackle) and hopefully
> you are able to help me. My dependent variable is not normally distributed
> (it concerns financial data). I have been advised to take the LN of the
> variable, which should make it more normally distributed, but still the
> Jarque Bera test is just significant (I have to reject the null hypothesis
> of normality). I think the non normality is due to a few outliers which can
> not be explained by any events in the past.

These questions should not be sent privately but directly to the Statalist. See <http://www.stata.com/support/faqs/res/statalist.html#private> for several reasons why that is the case.

It is a common misunderstanding that the dependent variable should look like the bell-shaped normal distribution. The assumption is that the dependent variable is normally distributed conditional on the explanatory variables, so one of the parameters of that distribution, the conditional mean, changes from observation to observation. The result is that the distribution of the dependent variable can take almost any form, and tests for normality of the dependent variable are completely meaningless, as they assume a common set of parameters for all observations.

What you can look at is the distribution of the residuals, which should have one mean (0) and one standard deviation (the root mean squared error) for all observations. However, I would not rely on statistical tests. Tests cannot directly test the hypothesis that a variable is normally (or otherwise) distributed. They need to translate that into a testable null hypothesis, which means they derive a limited number of consequences from the assumed normality and test those. As a consequence, these tests can only detect some very specific deviations, e.g. the Jarque-Bera test only looks at the skewness and the kurtosis. The way to check for deviations from normality is to look at graphs. For several useful graphs, type in Stata -help pnorm-, -help qnorm-, and -ssc desc hangroot-.

As to outliers, it is helpful to see them not as a problem but as an opportunity to strengthen your argument. Consider the description of this classic analysis: <http://www.significancemagazine.org/details/magazine/1076383/London-cholera-and-the-blindspot-of-an-epidemiology-theory-.html>

The short version is this. The controversy was: is cholera coming from (drinking) water or from the air? The main pattern was that the occurrence of the disease was highly clustered in certain areas. This would be consistent with both competing theories. The fact that there was a water pump near the center of the high-risk area helped support the water theory, but was hardly conclusive. However, John Snow carefully examined the main outliers, people living in the high-risk area who did not contract cholera, and could explain them: they either had their own private well or they worked in a brewery and drank beer. In this case it was the outliers that provided the most convincing evidence, not the main pattern.

Hope this helps,
Maarten

--------------------------
Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen
Germany

http://www.maartenbuis.nl
--------------------------
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
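
Editorial illustration (not part of the original message): the point about the dependent variable versus the residuals can be checked with a small simulation. The sketch below is written in plain Python rather than Stata so it can be run anywhere; the model (one binary regressor, intercept 1, slope 5, standard normal errors), the sample size, the seed, and the function names `jarque_bera` and `simulate` are all assumptions made for illustration. It computes the Jarque-Bera statistic, n/6 * (S^2 + (K - 3)^2 / 4), once for the raw dependent variable and once for the OLS residuals.

```python
import random
import statistics


def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4).

    S is the sample skewness and K the sample kurtosis, both based on
    central moments with divisor n. Only these two moments enter the test.
    """
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)


def simulate(n=2000, seed=12345):
    """Simulate y = 1 + 5*x + e with normal errors and fit OLS by hand.

    The binary regressor and the coefficients are hypothetical; they are
    chosen so that the conditional mean differs clearly between groups.
    """
    rng = random.Random(seed)
    x = [rng.choice([0.0, 1.0]) for _ in range(n)]
    y = [1.0 + 5.0 * xi + rng.gauss(0.0, 1.0) for xi in x]

    # Closed-form OLS for a single regressor plus intercept.
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    residuals = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    return y, residuals


y, residuals = simulate()
jb_y = jarque_bera(y)
jb_resid = jarque_bera(residuals)

# The raw dependent variable is a mixture of two normals (bimodal), so its
# statistic is large even though the errors are exactly normal by design.
print(f"Jarque-Bera on y:         {jb_y:.2f}")
print(f"Jarque-Bera on residuals: {jb_resid:.2f}")
```

Under these assumptions the statistic for `y` should come out far above the 5% critical value of a chi-squared distribution with 2 degrees of freedom (about 5.99), although every error term was drawn from a normal distribution, while the statistic for the residuals behaves like a draw from that chi-squared distribution. In Stata the analogous steps would be -regress-, then -predict- with the residuals option, then the graphs mentioned in the message.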

**Follow-Ups**: **Re: st: Re: question concerning normality**, *From:* Nick Cox <njcoxstata@gmail.com>
