

From: Austin Nichols <austinnichols@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Re: question concerning normality
Date: Wed, 10 Aug 2011 11:11:19 -0400

Nick, Maarten, and Caspar Bijleveld--

Even better than transforming your depvar is to use -glm- with an
appropriate link. Log may have its appeal, but various powers can
often be preferable for extreme skewness. Consider the cube root, or
fifth or seventh roots, for asset values, for example.

On Wed, Aug 10, 2011 at 5:00 AM, Nick Cox <njcoxstata@gmail.com> wrote:
> I agree with Maarten on his stance both on private emails and on the
> normality assumption.
>
> An easy counterexample is to suppose that x is uniformly distributed
> and y a perfect linear function of x. Then y is also uniformly
> distributed. So it would presumably fail any test for normality. But
> to argue that regression of y on x is invalid because the response is
> not normally distributed would be absurd.
>
> That said, it remains quite likely that working with logarithms of
> financial data is a good idea, but the information in the posting is
> insufficient to say much more.
>
> On Wed, Aug 10, 2011 at 9:43 AM, Maarten Buis <maartenlbuis@gmail.com> wrote:
>> On Fri, Aug 5, 2011 at 2:36 PM, Caspar Bijleveld wrote:
>>> I have seen a few of your responses on the internet concerning several
>>> Stata tips in statalist. I am currently working on a paper and I come across
>>> an important problem (which is probably quite easy to tackle) and hopefully
>>> you are able to help me. My dependent variable is not normally distributed
>>> (it concerns financial data). I have been advised to take the LN of the
>>> variable, which should make it more normally distributed, but still the
>>> Jarque-Bera test is just significant (I have to reject the null hypothesis
>>> of normality). I think the non-normality is due to a few outliers which
>>> cannot be explained by any events in the past.
>>
>> These questions should not be sent privately but directly to the
>> Statalist. See <http://www.stata.com/support/faqs/res/statalist.html#private>
>> for several reasons why that is the case.
>>
>> It is a common misunderstanding that the dependent variable should
>> look like the bell-shaped normal distribution. The dependent variable
>> is assumed to be normally distributed, but one of the parameters of
>> that distribution, the conditional mean, changes from observation to
>> observation. The result is that the form of the distribution of the
>> dependent variable can be about anything, and tests for normality of
>> the dependent variable are completely meaningless as they assume a
>> common set of parameters for all observations.
>>
>> What you can look at is the distribution of the residuals, which should
>> have one mean (0) and one standard deviation (the root mean squared
>> error) for all observations. However, I would not rely on statistical
>> tests. Tests cannot directly test the hypothesis that a variable is
>> normally (or otherwise) distributed. They need to translate that to a
>> testable null hypothesis, which means they derive a limited number of
>> consequences from the assumed normality and test those. As a
>> consequence these tests can only detect some very specific deviations,
>> e.g. the Jarque-Bera test only looks at the skewness and the
>> kurtosis. The way to check for deviations from normality is to look at
>> graphs; for several useful graphs, type in Stata -help pnorm-, -help
>> qnorm-, and -ssc desc hangroot-.
>>
>> As to outliers, it is helpful to see them as not a problem but as an
>> opportunity to strengthen your argument. Consider the description of
>> this classic analysis:
>> <http://www.significancemagazine.org/details/magazine/1076383/London-cholera-and-the-blindspot-of-an-epidemiology-theory-.html>
>>
>> The short version of it is: The controversy was: is cholera coming
>> from (drinking) water or from the air? The main pattern was that the
>> occurrence of the disease was highly clustered in certain areas. This
>> would be consistent with both competing theories. The fact that there
>> was a water pump near the center of the high-risk area helped support
>> the water theory, but was hardly conclusive. However, by carefully
>> examining the outliers John Snow could explain the main outliers,
>> people living in the high-risk area who did not contract cholera.
>> They either had their own private well or they worked in a brewery and
>> drank beer. In this case it was the outliers that provided the most
>> convincing evidence, not the main pattern.
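
Nick's counterexample and Maarten's point about residuals can be checked with a small simulation. What follows is only a minimal sketch, written in Python rather than Stata so that it is self-contained. The sample size, seed, coefficients (1 and 2), and noise standard deviation (0.1) are arbitrary assumptions, and the helper names `jarque_bera` and `ols` are hypothetical. Unlike Nick's exactly linear example, a small normal error is added so that there are residuals to inspect.

```python
import random


def jarque_bera(values):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)


def ols(x, y):
    """Simple least-squares fit of y on x; returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(12345)  # fixed seed (assumption) for reproducibility
n = 2000                    # sample size (assumption)

# x is uniform; y is a linear function of x plus a small normal error.
x = [rng.uniform(0.0, 1.0) for _ in range(n)]
y = [1.0 + 2.0 * xi + rng.gauss(0.0, 0.1) for xi in x]

intercept, slope = ols(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

jb_y = jarque_bera(y)            # marginal distribution of the response
jb_res = jarque_bera(residuals)  # distribution of the residuals

print("slope estimate:", round(slope, 3))
print("Jarque-Bera on y:", round(jb_y, 1))
print("Jarque-Bera on residuals:", round(jb_res, 1))
```

Under these assumptions the statistic for y should lie far above 5.99, the 5% critical value of a chi-squared distribution with 2 degrees of freedom, while the regression still recovers the slope. The residuals are the quantity the normality assumption refers to, and, as Maarten suggests, a quantile plot of them is a more informative check than any single test.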
