 Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: Normally distributed error term & testing normality of residuals

From: "Seed, Paul"
To: "statalist@hsphsun2.harvard.edu"
Subject: st: Normally distributed error term & testing normality of residuals
Date: Fri, 19 Oct 2012 16:37:05 +0000

```
Dear Statalist & Maarten Buis,

Very impressive graph drawing by Maarten.  I particularly like the CI round the qnorm plot.
I would agree that there are potential problems in
working on the raw data rather than the residuals,
as you demonstrate.  In your example, two Normal distributions are combined
into a bimodal outcome variable, which is revealed by the characteristic s-shape of
the -qnorm- plot.  Fitting the main predictor would give Normally
distributed residuals.

However, in practice, I would not be greatly bothered by this plot as evidence of deviation from Normality,
particularly given the sample size.  And in fact it is not such evidence, as we can see from how the data were derived
(although I might well check the residual plot).

I would be much more bothered if there was a systematic curve, that could
perhaps be straightened by taking logs or some other transformation.
I demonstrate this with some extensions of Maarten's code below.
To show the effect of censoring, I have created -y_cens-, replacing all values of -y-
below -1 with -1.

I would be very happy with the qnorm plot of -y_cens- as confirmation of approximate Normality,
sufficient for regression analysis.

The qnorm plots for -e_y- and -e_y_cens- in the example below are virtually identical,
and both point strongly to the need for a transformation.

***************** Example code ********************
* confirmation that there is no pr

* Initial code as in Maarten's example.

gen e_y = exp(y)
qnorm e_y , name(e_y, replace)
* Clearly curved.  A transformation would likely help

gen y_cens = max(y, -1)
qnorm y_cens, name(y_cens, replace)
* Roughly straight (with flat section due to censoring).  A transformation is not needed

gen e_y_cens = exp(y_cens)
qnorm e_y_cens , name(e_y_cens, replace)
* Clearly curved.  A transformation would likely help.
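
* Possible follow-up, not part of the original post: since e_y = exp(y),
* taking logs simply recovers -y-, so the systematic curve seen in the
* qnorm plot of -e_y- should disappear (ln_e_y is a hypothetical name).
gen ln_e_y = ln(e_y)
qnorm ln_e_y, name(ln_e_y, replace)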

************* End example code ********************

I have rarely found predictors that are as strong and simple in their
effect as in your example.  More usually, there are either a number of predictors
each with relatively small effects, or one or two continuous measures,
often with bell-shaped distributions.

In my experience, under these circumstances, when you have found a transformation that makes the
outcome close to Normal, the residuals tend to follow. Others may disagree.

This isn't a problem with a perfect solution; rather it is an area where
statistics becomes an art as much as a science.

>> Date: Tue, 16 Oct 2012 09:52:24 +0200
>> From: Maarten Buis <maartenlbuis@gmail.com>
>> Subject: Re: st: Normally distributed error term & testing normality of residuals

On Mon, Oct 15, 2012 at 6:56 PM, Seed, Paul wrote:
> I might add that I generally work on the raw data, not the residuals, as it is easier to
> understand the qnorm plot and the transformation needed; and I'm not interested in testing the
> residuals formally.

The problem with that is that Ebru is working in a regression-like
context, and we would not expect the raw data to be
normally/Poisson/Gamma/... distributed when there are explanatory
variables involved. The marginal distribution of the
dependent/explained/left-hand-side/y-variable can deviate considerably
from the distribution that gives your regression model its name. This
is what I wrote the -margdistfit- package for. To borrow an example
from my talk at the 2012 German Stata Users' meeting
(<http://www.maartenbuis.nl/presentations/berlin12.html>):

```
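
The central point of the exchange, that a raw outcome can look clearly non-Normal while the regression residuals are fine, can be illustrated with a minimal simulated sketch. This is not Maarten's original example, which is not reproduced above; the seed, sample size, coefficients, and variable names (`res_y`, `res_e_y`) are all assumptions chosen for illustration.

```stata
* Sketch only: seed, sample size and coefficients are assumptions,
* not values taken from the thread.
clear
set seed 2012
set obs 1000

* Binary predictor with a strong effect, plus a standard Normal error
gen byte x = runiform() < 0.5
gen y = -2 + 4*x + rnormal()

* Raw outcome: a mixture of two Normals, so the qnorm plot
* should show the s-shape discussed in the thread
qnorm y, name(raw_y, replace)

* Residuals after fitting the predictor: should be close to Normal
regress y x
predict double res_y, residuals
qnorm res_y, name(res_y, replace)

* Skewed outcome: after exponentiating, the residuals themselves
* should give a curved qnorm plot, pointing to a transformation
gen e_y = exp(y)
regress e_y x
predict double res_e_y, residuals
qnorm res_e_y, name(res_e_y, replace)
```

Comparing the first two plots shows why checking the raw outcome alone can mislead when a predictor is strong, while the third shows the kind of systematic curve that a log transformation is meant to address.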