
st: RE: Testing normality of a continuous predictor variable in a logistic model

From   "Nick Cox" <>
To   <>
Subject   st: RE: Testing normality of a continuous predictor variable in a logistic model
Date   Tue, 27 Nov 2007 12:03:52 -0000

Maarten has already made what I think is by far the most important
point, that marginal normality (Gaussianity) of predictors is not
an issue. 

I want to comment on a detail. Whether a histogram or a cumulative
frequency curve "looks normal" is in my view very difficult to judge
reliably. In the case of a histogram there are decisions over bin 
width and bin origin that are necessarily arbitrary. Even if 
a Gaussian density or distribution function is superimposed, 
as the case may be, comparison is still problematic. More
positively, a normal plot [quantile-quantile plot, presumably]
is customised for this problem and far more useful. 
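
As a rough sketch of that comparison, the commands below draw both kinds
of graph for a simulated variable. The variable name x, the seed and the
simulated data are assumptions made purely for illustration; they are not
from the original question.

```stata
* Hypothetical stand-in for a continuous predictor (simulated, not real data)
clear
set seed 2007
set obs 30000
generate x = rnormal()

* Histogram with a normal density superimposed:
* its appearance depends on the choice of bin width and bin origin
histogram x, normal

* Normal quantile plot: no binning decisions are needed,
* and departures from normality show up as departures from the reference line
qnorm x
```

Trying -histogram- with different bin() or width() choices on the same
variable is a quick way to see how much the impression can change, while
the -qnorm- graph stays the same.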

An alternative test for normality is given by -omninorm- on SSC. 
I don't use it much myself, but it was fun to program. 
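
A minimal sketch of installing and calling it, assuming an internet
connection and a variable named x (a hypothetical name); see the help
file after installation for the exact syntax and options:

```stata
* Install the user-written package from SSC (needed once only)
ssc install omninorm

* Read the documentation, then apply the test to one variable
help omninorm
omninorm x
```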


I am working with a dataset containing 30000 observations. Some of the
explanatory variables are continuous. If I perform the usual tests for
normality, the sample is too large for swilk or for sfrancia, and if I
use sktest it gives "absurdly" large values and rejects the
hypothesis of a normal distribution. The frequency histogram, cumulative
frequency plot and normal plot all look normal with no outliers. I
presume that with such large numbers even very small deviations from
normal will lead to a significant result. The Box-Tidwell test
indicates that the model relationship is linear for all these continuous
variables. Is it safe to ignore the sktest results?
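
The presumption about large samples can be checked with a small
simulation. This is only a sketch under stated assumptions: the variable
x is simulated from a t distribution with 30 degrees of freedom, chosen
here as an example of a distribution that is close to, but not exactly,
normal. Neither the name nor the distribution comes from the dataset
described above.

```stata
* Hypothetical data: nearly normal, with slightly heavier tails
clear
set seed 12345
set obs 30000
generate x = rt(30)

* With this many observations even a small amount of excess kurtosis
* would be expected to produce a very small p-value
sktest x

* The normal quantile plot shows how large the departure is in practice
qnorm x
```

Rerunning the same commands after changing -set obs- to a few hundred
gives a sense of how strongly the test result depends on sample size for
the same underlying distribution.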
