

From: "Seed, Paul" <paul.seed@kcl.ac.uk>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: Normally distributed error term & testing normality of residuals
Date: Fri, 19 Oct 2012 16:37:05 +0000

Dear Statalist & Maarten Buis,

Very impressive graph drawing by Maarten. I particularly like the CI round the qnorm plot.

I would agree that there are potential problems in working on the raw data rather than the residuals, as you demonstrate. In your example, two Normal distributions are combined into a bimodal outcome variable, which is revealed by the characteristic s-shape of the -qnorm- plot. Fitting the main predictor would give Normally distributed residuals.

However, in practice, I would not be greatly bothered by this plot as evidence of deviation from Normality, particularly given the sample size. And in fact it is not, as we can see from the derivation. (Although I might well check the residual plot.) I would be much more bothered if there were a systematic curve that could perhaps be straightened by taking logs or some other transformation.

I demonstrate this with some extensions of Maarten's code below. To show the effect of censoring, I have created -y_cens-, replacing all values of -y- below -1 by -1. I would be very happy with the qnorm plot of -y_cens- as confirmation of approximate Normality, sufficient for regression analysis. The qnorm plots for -e_y- and -e_y_cens- in the example below are virtually identical, and both point strongly to the need for a transformation.

***************** Example code ********************
* confirmation that there is no pr
* Initial code as in Maarten's example.

gen e_y = exp(y)
qnorm e_y , name(e_y, replace)
* Clearly curved. A transformation would likely help

gen y_cens = max(y, -1)
qnorm y_cens, name(y_cens, replace)
* Roughly straight (with flat section due to censoring). A transformation is not needed

gen e_y_cens = exp(y_cens)
qnorm e_y_cens , name(e_y_cens, replace)
* Clearly curved. A transformation would likely help.
************* End example code ********************

I have rarely found predictors that are as strong and simple in their effect as in your example.
More usually, there are either a number of predictors each with relatively small effects, or one or two continuous measures, often with bell-shaped distributions. In my experience, under these circumstances, when you have found a transformation that makes the outcome close to Normal, the residuals tend to follow.

Others may disagree. This isn't a problem with a perfect solution; rather, it is an area where statistics becomes an art as much as a science.

>> Date: Tue, 16 Oct 2012 09:52:24 +0200
>> From: Maarten Buis <maartenlbuis@gmail.com>
>> Subject: Re: st: Normally distributed error term & testing normality of residuals

On Mon, Oct 15, 2012 at 6:56 PM, Seed, Paul wrote:
> I might add that I generally work on the raw data, not the residuals, as it is easier to
> understand the qnorm plot and the transformation needed; and I'm not interested in testing the
> residuals formally.

The problem with that is that Ebru is working in a regression-like context, and we would not expect the raw data to be normally/Poisson/Gamma/... distributed when there are explanatory variables involved. The marginal distribution of the dependent/explained/left-hand-side/y-variable can deviate considerably from the distribution that gives your regression model its name. This is what I wrote the -margdistfit- package for. To borrow an example from my talk at the 2012 German Stata Users' meeting (<http://www.maartenbuis.nl/presentations/berlin12.html>):
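
The example Maarten refers to is not reproduced in this message. As a rough stand-in, here is a minimal sketch on simulated data of how the raw outcome can look clearly non-Normal while the residuals are fine. Everything in it is an assumption made for illustration (the seed, the sample size of 500, the binary predictor -x-, the effect size of 4, and the variable names); it is not Maarten's code and it does not use -margdistfit-.

```stata
* Illustrative sketch only: simulated data, not the example from the thread
clear                       // drops any data in memory
set seed 12345              // arbitrary seed, for reproducibility
set obs 500

* binary predictor with a strong effect, plus a standard Normal error
gen byte x = runiform() < 0.5
gen y = 4*x + rnormal()

* the raw outcome mixes two Normal distributions, so it is bimodal
qnorm y, name(raw, replace)

* residuals after fitting the predictor
regress y x
predict double res, residuals
qnorm res, name(resid, replace)
```

With data built this way, the -qnorm- plot of -y- should show the s-shape discussed above, while the plot of -res- should lie close to the reference line. With many weak predictors instead of one strong one, the two plots would probably differ far less.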
