
From: "Soremekun, Seyi" <S.Soremekun@warwick.ac.uk>

To: <statalist@hsphsun2.harvard.edu>

Subject: st: RE: RE: RE: Testing normality of a continuous predictor variable in a logistic model

Date: Fri, 30 Nov 2007 15:54:13 -0000

Sorry Brendan, it seems I wasn't clear at the start of my response. No, your input variables don't need to be normally distributed in either probit or logistic regression. Probit regression uses the cumulative normal distribution function to work out the probabilities (i.e. the probability of 'yes' is the standard normal CDF evaluated at beta*x), but that says nothing about the distribution of the predictors themselves. The linktest, as I mentioned, is useful in other models that assume an underlying Gaussian distribution for your data.

Hope this helps!

S

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 27 November 2007 17:41
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: Testing normality of a continuous predictor variable in a logistic model

I can't see that using a probit would make any difference at all. Despite the appeal to latent normality, the response is still binary, and in any case the focus here is on the predictors.

Nick
n.j.cox@durham.ac.uk

Soremekun, Seyi

Because it is a logistic and not a probit regression you are attempting, it actually does not matter whether your variables are normal or not. The main assumptions that you might want to test are that the relationship between the logit of your outcome and your predictor variables is linear and that all the relevant predictors are included. linktest is a basic way of testing this: if you have specified the right link and variables, the prediction (_hat) should be significant while its square (_hatsq) should not be. But if the Box-Tidwell test tells you the same thing, I wouldn't worry about the normality issue.

Cheers,

Brendan

I am working with a dataset containing 30000 observations. Some of the explanatory variables are continuous. If I perform the usual tests for normality, the number of observations is too great for swilk or sfrancia, and if I use sktest the result is "absurdly" large values, and it rejects the hypothesis of a normal distribution.

The frequency histogram, cumulative frequency plot and normal plot all look normal, with no outliers. I presume that with such large numbers even very small deviations from normality will lead to a significant result. The Box-Tidwell test indicates that the model relationship is linear for all these continuous variables. Is it safe to ignore the sktest results?

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
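For readers who want to try the checks discussed in this thread, here is a minimal Stata sketch. It is only an illustration: the auto dataset that ships with Stata, and the variables foreign, mpg and weight, are stand-ins chosen for this example and are not Brendan's data or model.

```stata
* Illustrative only: the shipped auto dataset stands in for the real data
sysuse auto, clear

* Logistic model with two continuous predictors
logit foreign mpg weight

* Specification check: refits the model on _hat and _hatsq
linktest

* Informal looks at one predictor's distribution
* (normality of predictors is not an assumption of the logit model)
histogram mpg, normal
qnorm mpg
sktest mpg
```

As described in the thread, in the linktest output _hat would be expected to be significant and _hatsq not, if the link and the predictors are adequately specified. The last three commands are descriptive only; with very many observations sktest may reject for deviations too small to see in the plots.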

**References**:

- **st: Testing normality of a continuous predictor variable in a logistic model** *From:* Brendan <hsct@icon.co.za>
- **st: RE: Testing normality of a continuous predictor variable in a logistic model** *From:* "Soremekun, Seyi" <S.Soremekun@warwick.ac.uk>
- **st: RE: RE: Testing normality of a continuous predictor variable in a logistic model** *From:* "Nick Cox" <n.j.cox@durham.ac.uk>
