
st: Normally distributed error term & testing normality of residuals

From   "Seed, Paul"
Subject   st: Normally distributed error term & testing normality of residuals
Date   Fri, 19 Oct 2012 16:37:05 +0000

Dear Statalist & Maarten Buis, 

Very impressive graph drawing by Maarten. I particularly like the CI band around the -qnorm- plot.
I agree that working on the raw data rather than the residuals can cause problems, 
as you demonstrate. In your example, two Normal distributions are combined 
into a bimodal outcome variable, which is revealed by the characteristic S-shape of 
the -qnorm- plot. Fitting the main predictor would give Normally 
distributed residuals.

However, in practice, I would not be greatly bothered by this plot as evidence of deviation from Normality, 
particularly given the sample size. And in fact the errors do not deviate from Normality, as we can see from the derivation 
(although I might well check the residual plot).
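A minimal sketch of the situation described above, assuming a hypothetical binary predictor -x- with a strong effect. This is not Maarten's code; the seed, sample size and effect size are arbitrary. The outcome is an equal mixture of N(-1.5, 1) and N(1.5, 1), so its -qnorm- plot should show the S-shape, while the residuals from -regress y x- should lie close to the line. The block starts with -clear-, so it discards any data in memory.

```stata
* Hypothetical illustration: bimodal outcome, Normal errors
clear
set seed 2012
set obs 500

* Hypothetical binary predictor with a strong effect
generate byte x = runiform() < 0.5

* Outcome is N(-1.5,1) when x==0 and N(1.5,1) when x==1,
* so the marginal distribution of y is bimodal
generate y = -1.5 + 3*x + rnormal()

* Raw outcome: expect the S-shaped qnorm plot
qnorm y, name(y_raw, replace)

* Residuals after fitting the main predictor: expect a roughly straight line
regress y x
predict res, residuals
qnorm res, name(y_res, replace)
```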

I would be much more bothered if there were a systematic curve that could 
perhaps be straightened by taking logs or some other transformation. 
I demonstrate this with some extensions of Maarten's code below.
To show the effect of censoring, I have created -y_cens-, replacing all values of -y-
below -1 with -1.

I would be very happy with the qnorm plot of -y_cens- as confirmation of approximate Normality, 
sufficient for regression analysis. 

The qnorm plots for -e_y- and -e_y_cens- in the example below are virtually identical, 
and both point strongly to the need for a transformation. 

***************** Example code ********************
* confirmation that there is no pr

* Initial code as in Maarten's example (not reproduced here); it creates the outcome -y-.

gen e_y = exp(y)
qnorm e_y, name(e_y, replace)
* Clearly curved. A transformation would likely help.

gen y_cens = max(y, -1)
qnorm y_cens, name(y_cens, replace)
* Roughly straight (with a flat section due to censoring). A transformation is not needed.

gen e_y_cens = exp(y_cens)
qnorm e_y_cens, name(e_y_cens, replace)
* Clearly curved. A transformation would likely help.

************* End example code ********************
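When the curve is systematic, Stata's official -ladder-, -qladder- and -gladder- commands offer one way to search the ladder of powers for a straightening transformation. A minimal sketch, assuming a hypothetical right-skewed variable -e_z- built as exp() of a standard Normal, so the log should be the power that works. The variable names and seed are illustrative only, and -clear- discards any data in memory.

```stata
* Hypothetical right-skewed variable: exp() of a standard Normal
clear
set seed 2012
set obs 500
generate e_z = exp(rnormal())
qnorm e_z, name(e_z, replace)

* Normality tests for each power in the ladder (cubic down to inverse cubic)
ladder e_z

* Quantile-normal plots for the same set of transformations
qladder e_z

* Apply the log and re-check
generate ln_e_z = ln(e_z)
qnorm ln_e_z, name(ln_e_z, replace)
```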

I have rarely found predictors that are as strong and simple in their 
effects as the one in your example. More usually, there are either a number of predictors, 
each with a relatively small effect, or one or two continuous measures, 
often with bell-shaped distributions. 

In my experience, under these circumstances, once you have found a transformation that makes the 
outcome close to Normal, the residuals tend to be close to Normal as well. Others may disagree.
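A minimal sketch of this more usual situation, under assumed conditions: five hypothetical predictors -x1- to -x5-, each a standard Normal with a small effect, and an outcome that is Normal on the log scale. After the log transformation, both the outcome and the regression residuals should give roughly straight -qnorm- plots. It illustrates the claim under these assumptions only, not in general; -clear- discards any data in memory.

```stata
* Hypothetical data: several predictors with small effects
clear
set seed 1019
set obs 1000
forvalues j = 1/5 {
    generate x`j' = rnormal()
}

* Outcome that is Normal on the log scale, right-skewed on the raw scale
generate double ln_y = 0.2*x1 + 0.2*x2 + 0.1*x3 + 0.1*x4 + 0.1*x5 + rnormal()
generate double y = exp(ln_y)
qnorm y, name(y_raw, replace)

* After transformation: marginal distribution of the outcome
qnorm ln_y, name(ln_y, replace)

* and the residuals from the regression on the log scale
regress ln_y x1-x5
predict double res, residuals
qnorm res, name(ln_y_res, replace)
```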

This isn't a problem with a perfect solution; rather, it is an area where 
statistics becomes an art as much as a science.

>> Date: Tue, 16 Oct 2012 09:52:24 +0200
>> From: Maarten Buis <>
>> Subject: Re: st: Normally distributed error term & testing normality of residuals

On Mon, Oct 15, 2012 at 6:56 PM, Seed, Paul wrote:
> I might add that I generally work on the raw data, not the residuals, as it is easier to
> understand the qnorm plot and the transformation needed; and I'm not interested in testing the
> residuals formally.

The problem with that is that Ebru is working in a regression-like
context, and we would not expect the raw data to be
normally/Poisson/Gamma/... distributed when there are explanatory
variables involved. The marginal distribution of the
dependent/explained/left-hand-side/y-variable can deviate considerably
from the distribution that gives your regression model its name. This
is what I wrote the -margdistfit- package for. To borrow an example
from my talk at the 2012 German Stata Users' meeting
