Statalist



st: RE: RE: RE: Testing normality of a continuous predictor variable in a logistic model


From   "Soremekun, Seyi" <S.Soremekun@warwick.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: RE: Testing normality of a continuous predictor variable in a logistic model
Date   Fri, 30 Nov 2007 15:54:13 -0000

Sorry Brendan, it seems I wasn't clear at the start of my response. No,
your input variables don't need to be normally distributed in either
probit or logistic regression. The difference is that probit regression
works out the probabilities using the cumulative normal distribution
function (i.e. the probability of 'yes' rather than 'no' is the
cumulative normal evaluated at beta*x). The linktest, as I mentioned, is
useful in other models that assume an underlying Gaussian distribution
for your data. Hope this helps!
S
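
To make the distinction between the two links concrete, here is a minimal Python sketch (not Stata, and not taken from the original posts). The function names logit_prob and probit_prob are hypothetical, and xb stands for the linear predictor beta*x. Both functions map xb to a probability; neither places any distributional requirement on the predictors themselves.

```python
import math

def logit_prob(xb):
    """Logistic link: P(y = 1) = 1 / (1 + exp(-xb))."""
    return 1.0 / (1.0 + math.exp(-xb))

def probit_prob(xb):
    """Probit link: P(y = 1) = Phi(xb), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))

# Compare the two links over a few illustrative linear-predictor values.
for xb in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"xb={xb:+.1f}  logit={logit_prob(xb):.3f}  probit={probit_prob(xb):.3f}")
```

Both curves are S-shaped and symmetric about xb = 0; they differ mainly in scale and in tail thickness, which is consistent with Nick's point that switching to probit changes nothing about the predictors.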

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 27 November 2007 17:41
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: Testing normality of a continuous predictor
variable in a logistic model

I can't see that using a probit would make any difference at all.
Despite the appeal to latent normality, the response is still
binary, and in any case the focus here is on the predictors.

Nick 
n.j.cox@durham.ac.uk 

Soremekun, Seyi

Because it is a logistic and not a probit regression you are attempting,
it actually does not matter whether your variables are normal or not. The
main assumptions that you might want to test are that the relationship
between the logit of your outcome and your predictor variables is linear
and that all the relevant predictors are included. linktest is a basic
way of testing this: the predicted variable (_hat) should be significant
while its square (_hatsq) should not be, if you have specified the right
link and variables. But if the Box-Tidwell test tells you the same thing,
I wouldn't worry about the normality issue.
Cheers,

Brendan

I am working with a dataset containing 30000 observations. Some of the
explanatory variables are continuous. If I perform the usual tests for
normality, the number of observations is too great for swilk or for
sfrancia, and if I use sktest it returns "absurdly" large values and
rejects the hypothesis of a normal distribution. The frequency histogram,
cumulative frequency plot and normal plot all look normal with no
outliers. I presume that with such large numbers even very small
deviations from normal will lead to a significant result. The
Box-Tidwell test indicates that the model relationship is linear for all
these continuous variables. Is it safe to ignore the sktest results?
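
The presumption about sample size can be checked with a rough calculation. The Python sketch below is an illustration only: it uses the textbook large-sample approximation that sample skewness has standard error of about sqrt(6/n) under normality, which is not the adjusted statistic that sktest actually computes. The skewness value of 0.05 is hypothetical, as are the function names.

```python
import math

def skewness_z(skew, n):
    """Approximate z statistic for sample skewness under normality.

    Assumes the large-sample standard error sqrt(6 / n).
    """
    return skew / math.sqrt(6.0 / n)

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2.0))

skew = 0.05  # hypothetical, visually negligible skewness
for n in (300, 3000, 30000):
    z = skewness_z(skew, n)
    print(f"n={n:6d}  z={z:.2f}  p={two_sided_p(z):.4f}")
```

Under this approximation the same tiny skewness is far from significant at n = 300 but gives z of about 3.5 at n = 30000, so a rejection at that sample size says little about whether the departure from normality matters in practice.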

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



