Statalist The Stata Listserver



Re: st: logistic regression with orthogonal predictors


From   Richard Williams <Richard.A.Williams.5@ND.edu>
To   statalist@hsphsun2.harvard.edu, <statalist@hsphsun2.harvard.edu>
Subject   Re: st: logistic regression with orthogonal predictors
Date   Tue, 16 May 2006 08:23:50 -0500

At 02:35 PM 5/15/2006, frone@ria.buffalo.edu wrote:
> A colleague asked me about some results with logistic regression.  He had
> two predictors of a binary outcome, call them A and B.  When used alone,
> predictor A was significantly related to the outcome and predictor B was
> not.  Moreover, the correlation between A and B was zero.  When the
> outcome was regressed on the two predictors simultaneously using logistic
> regression both were significantly related to the outcome.  In effect, the
> coefficient for predictor B became larger.  However, when OLS regression
> was used instead, the coefficients for each predictor were the same as
> when entered alone, which is what one would expect.

To elaborate a bit on my last answer: in OLS, the variance of y is fixed by the data. It doesn't matter whether y is regressed on X1, on X1 and X2, or on X1, X2 and X3; the variance of y is the same in every case.
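
A minimal sketch of the OLS side of this, in Python with NumPy rather than Stata. The data, the coefficient values (1.0 and 2.0) and the helper name ols are made up for illustration; x2 is residualised so that it is exactly uncorrelated with x1 in the sample, as in the original question.

```python
import numpy as np

rng = np.random.default_rng(2006)
n = 50_000
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)

# Force x2 to be exactly uncorrelated with x1 in this sample:
# centre it, then remove its projection on centred x1.
x1c = x1 - x1.mean()
x2 = x2 - x2.mean()
x2 = x2 - x1c * (x1c @ x2) / (x1c @ x1c)

# Hypothetical data-generating process for an observed, continuous y.
y = 0.5 + 1.0 * x1 + 2.0 * x2 + rng.standard_normal(n)

def ols(X, y):
    """Least-squares coefficients; X must already contain a constant column."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

const = np.ones(n)
b_alone = ols(np.column_stack([const, x1]), y)
b_joint = ols(np.column_stack([const, x1, x2]), y)

# The variance of y is a property of the data, not of the model,
# and the x1 coefficient is the same whether or not x2 is included.
print("Var(y):", y.var())
print("x1 coefficient, alone vs joint:", b_alone[1], b_joint[1])
```

With exactly orthogonal predictors the two x1 coefficients agree up to floating-point error; with predictors that are only approximately uncorrelated they would agree approximately.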

BUT, in logistic regression (also probit and others) the variance of the underlying latent variable y* changes as you go from one model to the next, i.e. the variance of y* will be different when y is regressed on X1 than when it is regressed on X1 and X2. This is because, in a logistic regression, the latent variable is normalized by fixing its residual variance at pi^2/3, or about 3.29 (in probit it is fixed at 1). Since the residual variance is fixed, as more vars are added the explained variance increases, and so the total variance of y* increases. The coefficients are expressed in the units of y*, so they are rescaled along with it. In short, with logit and probit, your dv is a moving target, i.e. its variance changes from one model to the next. Hence, even when the Xs are uncorrelated, you see behavior such as was described in the original message.
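
The moving-target behavior can be reproduced in a small simulation. This is only a sketch under stated assumptions: the latent model y* = 1.0*x1 + 2.0*x2 + e with standard logistic e is invented for illustration, and fit_logit is a hypothetical bare-bones Newton-Raphson fitter written here in Python with NumPy, not Stata's logit command.

```python
import numpy as np

def fit_logit(X, y, iters=25):
    """Maximum-likelihood logit by Newton-Raphson; X must include a constant column."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))   # fitted probabilities
        w = p * (1.0 - p)                        # logit weights
        grad = X.T @ (y - p)                     # score vector
        hess = X.T @ (X * w[:, None])            # information matrix
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(12345)
n = 100_000
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)        # drawn independently of x1
e = rng.logistic(size=n)           # standard logistic, variance pi^2/3

# Assumed latent variable; only the 0/1 indicator is "observed".
ystar = 1.0 * x1 + 2.0 * x2 + e
y = (ystar > 0).astype(float)

const = np.ones(n)
b_x1_only = fit_logit(np.column_stack([const, x1]), y)
b_both = fit_logit(np.column_stack([const, x1, x2]), y)

# Implied variance of y* in each model: explained variance plus the
# residual variance that logit fixes at pi^2/3.
resid_var = np.pi ** 2 / 3
var_x1_only = np.var(b_x1_only[1] * x1) + resid_var
var_both = np.var(b_both[1] * x1 + b_both[2] * x2) + resid_var

print("x1 coefficient, x1 only:   ", b_x1_only[1])
print("x1 coefficient, x1 and x2: ", b_both[1])
print("implied Var(y*), x1 only:  ", var_x1_only)
print("implied Var(y*), x1 and x2:", var_both)
```

In runs of this kind the x1 coefficient is noticeably smaller when x2 is left out, even though x1 and x2 are uncorrelated, and the implied variance of y* is larger in the two-predictor model.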

The handouts I cited earlier also show that, if you use RWLS (Rich Williams's Least Squares - a little-known method and deservedly so), you can get the same sort of behavior in OLS. That is, if you fix the residual variance at a specific value (e.g. 3.29), then the coefficient estimates behave in the same odd ways.
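
A rough sketch of that idea, and only a guess at the spirit of it: this is not the procedure from the handouts. It fits ordinary least squares and then rescales the coefficients so that each model's residual variance equals pi^2/3. The function name rescaled_ols, the data and the coefficient values are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(516)
n = 50_000
x1 = rng.standard_normal(n)
x2 = rng.standard_normal(n)        # drawn independently of x1
y = 1.0 * x1 + 2.0 * x2 + rng.standard_normal(n)

def rescaled_ols(X, y, target_var=np.pi ** 2 / 3):
    """Return (ordinary OLS coefficients, coefficients after rescaling y
    so that this model's residual variance equals target_var)."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    scale = np.sqrt(target_var / resid.var())
    return b, b * scale

const = np.ones(n)
raw_1, fixed_1 = rescaled_ols(np.column_stack([const, x1]), y)
raw_2, fixed_2 = rescaled_ols(np.column_stack([const, x1, x2]), y)

print("ordinary OLS x1 coefficient, alone vs joint:  ", raw_1[1], raw_2[1])
print("fixed-residual x1 coefficient, alone vs joint:", fixed_1[1], fixed_2[1])
```

The ordinary coefficients barely move when x2 is added, while the rescaled ones change a good deal, because the scale factor depends on how much residual variance each model leaves.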

In short, you have to realize that a lot of the things we are used to in OLS do not work the same way in logit and probit. In OLS, our DV is an observed variable; in logit and probit, our DV is actually a latent unobserved variable (all we see is the 0-1 dichotomy that is caused by the underlying latent variable).


-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX: (574)288-4373
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW (personal): http://www.nd.edu/~rwilliam
WWW (department): http://www.nd.edu/~soc