Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Re: question concerning normality |
Date | Wed, 10 Aug 2011 10:00:06 +0100 |
I agree with Maarten on his stance both on private emails and on the
normality assumption.

An easy counterexample is to suppose that x is uniformly distributed
and y a perfect linear function of x. Then y is also uniformly
distributed. So it would presumably fail any test for normality. But
to argue that regression of y on x is invalid because the response is
not normally distributed would be absurd.

That said, it remains quite likely that working with logarithms of
financial data is a good idea, but the information in the posting is
insufficient to say much more.

On Wed, Aug 10, 2011 at 9:43 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
> On Fri, Aug 5, 2011 at 2:36 PM, Caspar Bijleveld wrote:
>> I have seen a few of your responses on the internet concerning several
>> Stata tips in statalist. I am currently working on a paper and I have come
>> across an important problem (which is probably quite easy to tackle) and
>> hopefully you are able to help me. My dependent variable is not normally
>> distributed (it concerns financial data). I have been advised to take the LN
>> of the variable, which should make it more normally distributed, but still
>> the Jarque-Bera test is just significant (I have to reject the null
>> hypothesis of normality). I think the non-normality is due to a few outliers
>> which cannot be explained by any events in the past.
>
> These questions should not be sent privately but directly to the
> Statalist. See <http://www.stata.com/support/faqs/res/statalist.html#private>
> for several reasons why that is the case.
>
> It is a common misunderstanding that the dependent variable should
> look like the bell shaped normal distribution. The dependent variable
> is assumed to be normally distributed, but one of the parameters of
> that distribution, the conditional mean, changes from observation to
> observation.
> The result is that the form of the distribution of the dependent
> variable can be just about anything, and tests for normality of the
> dependent variable are completely meaningless as they assume a common
> set of parameters for all observations.
>
> What you can look at is the distribution of the residuals, which should
> have one mean (0) and one standard deviation (the root mean squared
> error) for all observations. However, I would not rely on statistical
> tests. Tests cannot directly test the hypothesis that a variable is
> normally (or otherwise) distributed. They need to translate that to a
> testable null hypothesis, which means they derive a limited number of
> consequences from the assumed normality and test those. As a
> consequence these tests can only detect some very specific deviations,
> e.g. the Jarque-Bera test only looks at the skewness and the
> kurtosis. The way to check for deviations from normality is to look at
> graphs; for several useful graphs type in Stata -help pnorm-, -help
> qnorm-, and -ssc desc hangroot-.
>
> As to outliers, it is helpful to see them not as a problem but as an
> opportunity to strengthen your argument. Consider the description of
> this classic analysis:
> <http://www.significancemagazine.org/details/magazine/1076383/London-cholera-and-the-blindspot-of-an-epidemiology-theory-.html>
>
> The short version of it is this. The controversy was: is cholera coming
> from (drinking) water or from the air? The main pattern was that the
> occurrence of the disease was highly clustered in certain areas. This
> would be consistent with both competing theories. The fact that there
> was a water pump near the center of the high risk area helped support
> the water theory, but was hardly conclusive. However, by examining
> them carefully, John Snow could explain the main outliers: people
> living in the high risk area who did not contract cholera.
> They either had their own private well or they worked in a brewery and
> drank beer. In this case it was the outliers that provided the most
> convincing evidence, not the main pattern.
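
The uniform-x counterexample and the point about residuals can be checked
with a small simulation. What follows is only a minimal sketch in Python
(standard library only), not code from either poster. The function names
(jarque_bera, ols_residuals, simulate), the coefficients 1 and 2, the
noise standard deviation 0.1, the sample size and the seed are all
illustrative assumptions. The counterexample above uses a perfect linear
function; a little normal noise is added here, as an assumption, so that
the residuals are not all zero.

```python
import random


def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4).

    S is the sample skewness and K the sample kurtosis, both computed
    from central moments with divisor n.
    """
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def ols_residuals(x, y):
    """Residuals from a least-squares fit of y on x with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def simulate(n=5000, seed=12345):
    """Return (JB of y, JB of residuals) for uniform x and normal errors."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]                     # x ~ Uniform(0, 1)
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.1) for xi in x]   # linear in x plus normal noise
    return jarque_bera(y), jarque_bera(ols_residuals(x, y))


jb_y, jb_resid = simulate()
print("Jarque-Bera, dependent variable:", round(jb_y, 2))
print("Jarque-Bera, residuals:         ", round(jb_resid, 2))
```

The statistic is compared with a chi-squared distribution with 2 degrees
of freedom, whose 5% critical value is about 5.99. In this setup the
marginal distribution of y is close to uniform, so its statistic should
be far above that value even though the errors are normal by
construction. The statistic is used here only because the original
question relied on it; as the quoted message argues, graphs of the
residuals remain the more informative check.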