Richard,
Apologies for not responding sooner. I was out of the office. Thanks for
sharing your handouts. They were very clear. I especially liked the RWLS
example.
I have never been a fan of standardized coefficients in OLS, for a number
of reasons, and typically argue against using them, at least for the
reasons usually given to justify them. That said, I see the advantage
you bring up for nested models in logistic regression and, I assume, in other
GLMs. However, in addition to the issue of coefficients increasing in
size as one adds predictors, one runs into a situation that cannot be
attributed to suppression: predictors that were not statistically
significant become statistically significant as the variance in Y*
increases.
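One way to get the nested-model comparability mentioned above is y*-standardization: divide each logit coefficient by the model-implied standard deviation of y*, where Var(y*) = b'Cov(X)b + pi^2/3. The Python sketch below assumes that definition; the function name ystar_standardized and the coefficient values are hypothetical illustrations, not estimates from any real data.

```python
import math

import numpy as np

# Residual variance of the standard logistic distribution: pi^2/3, about 3.29.
LOGIT_RESID_VAR = math.pi ** 2 / 3


def ystar_standardized(b, cov_x, resid_var=LOGIT_RESID_VAR):
    """Divide slope coefficients by the model-implied SD of y*.

    b:     logit slope coefficients (intercept excluded)
    cov_x: covariance matrix of the predictors
    Var(y*) = b' Cov(X) b + resid_var (explained part + fixed residual part)
    """
    b = np.asarray(b, dtype=float)
    cov_x = np.atleast_2d(np.asarray(cov_x, dtype=float))
    var_ystar = float(b @ cov_x @ b) + resid_var
    return b / math.sqrt(var_ystar), var_ystar


# Hypothetical estimates: A alone, then A and B (uncorrelated, unit variances).
std_alone, var_alone = ystar_standardized([0.67], [[1.0]])
std_joint, var_joint = ystar_standardized([1.0, 2.0], np.eye(2))
print(var_alone, var_joint)        # Var(y*) grows when B is added
print(std_alone[0], std_joint[0])  # A's standardized effect barely moves
```

In this sketch sd(y*) is treated as a fixed constant, so it rescales the point estimates only; the z statistics are unchanged, and it does not by itself address the significance issue raised above.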
Thanks again.
Mike Frone
From: Richard Williams <Richard.A.Williams.5@ND.edu>
Sent by: owner-statalist@hsphsun2.harvard.edu
Date: 05/16/2006 09:23 AM
Please respond to: statalist@hsphsun2.harvard.edu
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: logistic regression with orthogonal predictors
At 02:35 PM 5/15/2006, frone@ria.buffalo.edu wrote:
>A colleague asked me about some results with logistic regression. He had
>two predictors of a binary outcome, call them A and B. When used alone,
>predictor A was significantly related to the outcome and predictor B was
>not. Moreover, the correlation between A and B was zero. When the
>outcome was regressed on the two predictors simultaneously using logistic
>regression both were significantly related to the outcome. In effect, the
>coefficient for predictor B became larger. However, when OLS regression
>was used instead, the coefficients for each predictor were the same as
>when entered alone, which is what one would expect.
To elaborate a bit on my last answer - in OLS, the variance of y is
the variance of y, i.e. it doesn't matter whether y is regressed on
X1, or X1 and X2, or X1 and X2 and X3 - the variance of y will be the
same in every case.
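For contrast, the OLS case can be checked numerically. This Python sketch uses simulated data, and ols is a hypothetical helper built on numpy.linalg.lstsq. It makes two predictors exactly uncorrelated in the sample, and the coefficient on x1 comes out the same whether or not x2 is in the model, because y itself does not change between the two fits.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1_000

# Two predictors made exactly uncorrelated in the sample:
# center both, then remove from x2 its projection on x1.
x1 = rng.normal(size=n)
x1 -= x1.mean()
x2 = rng.normal(size=n)
x2 -= x2.mean()
x2 -= (x2 @ x1) / (x1 @ x1) * x1

# Observed outcome: its variance is a property of the data,
# not of which predictors we put in the model.
y = 1.0 + 0.5 * x1 + 0.8 * x2 + rng.normal(size=n)


def ols(y, *cols):
    """Least-squares coefficients, intercept first."""
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


b_alone = ols(y, x1)[1]
b_joint = ols(y, x1, x2)[1]
print(b_alone, b_joint)  # identical up to floating-point rounding
```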
BUT, in logistic regression (also probit and others) the variance of
the underlying latent variable y* changes as you go from one model to
the next, i.e. the variance of y* will be different when y is
regressed on X1 than when it is regressed on X1 and X2. This is
because, in a logistic regression, the latent variable is normalized
by fixing its residual variance at pi^2/3, about 3.29 (in probit it is fixed
at 1). Since the residual variance is fixed, as more vars are added,
the explained variance increases, and so the total variance of y*
increases. In short, with logit and probit, your dv is a moving
target, i.e. its variance changes from one model to the next. Hence,
even when the Xs are uncorrelated, you see behavior such as was
described in the original message.
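The moving-target effect is easy to reproduce by simulation. The Python sketch below assumes a latent y* = A + 2B + e with a standard logistic error and independent A and B, and observes only whether y* > 0; logit_fit is a hypothetical helper, a plain Newton-Raphson maximum-likelihood fit rather than any particular package's routine. With B in the model the coefficient on A comes back near its true value of 1; with B omitted, B's contribution is absorbed into the residual, the residual variance is still pinned at pi^2/3, and A's coefficient shrinks.

```python
import numpy as np

rng = np.random.default_rng(2006)
n = 50_000

# Predictors A and B drawn independently (uncorrelated in the population).
a = rng.normal(size=n)
b = rng.normal(size=n)

# Latent y* with a standard logistic error (variance pi^2/3, about 3.29);
# all we observe is whether y* crosses zero.
ystar = 1.0 * a + 2.0 * b + rng.logistic(size=n)
y = (ystar > 0).astype(float)


def logit_fit(y, *cols, iters=25):
    """Maximum-likelihood logit via Newton-Raphson, intercept first."""
    X = np.column_stack([np.ones(len(y)), *cols])
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        w = p * (1.0 - p)
        # Newton step: (X'WX)^{-1} X'(y - p)
        step = np.linalg.solve(X.T @ (X * w[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


coef_a_alone = logit_fit(y, a)[1]
coef_a_joint = logit_fit(y, a, b)[1]
print(coef_a_alone, coef_a_joint)  # A's coefficient grows once B is added
```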
The handouts I cited earlier also show that, if you use RWLS (Rich
Williams's Least Squares - a little known method and deservedly so)
you can get the same sort of behavior in OLS, i.e. if you fix the
residual variance at a specific value (e.g. 3.29) then the
coefficient estimates behave in the same odd ways.
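A minimal sketch of the RWLS idea in Python, assuming it amounts to fitting OLS and then rescaling y so that the residual variance equals pi^2/3 (about 3.29); rwls is a hypothetical name, and the handouts may implement it differently. The raw OLS coefficient on x1 is the same in both fits, but once the residual variance is pinned, adding x2 shrinks the residual, forces a smaller rescaling factor, and makes the x1 coefficient grow, just as in logit.

```python
import math

import numpy as np

rng = np.random.default_rng(7)
n = 1_000
FIXED_RESID_VAR = math.pi ** 2 / 3  # about 3.29, the value logit imposes

# Exactly uncorrelated, centered predictors, so plain OLS keeps b1 stable.
x1 = rng.normal(size=n)
x1 -= x1.mean()
x2 = rng.normal(size=n)
x2 -= x2.mean()
x2 -= (x2 @ x1) / (x1 @ x1) * x1
y = 0.5 * x1 + 0.8 * x2 + rng.normal(size=n)


def rwls(y, *cols, resid_var=FIXED_RESID_VAR):
    """Hypothetical RWLS: OLS, then rescale so residual variance = resid_var.

    Dividing y by s / sqrt(resid_var), where s is the residual SD,
    divides every OLS coefficient by that same factor.
    """
    X = np.column_stack([np.ones(len(y)), *cols])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    scale = math.sqrt(resid.var() / resid_var)
    return coef / scale


b1_alone = rwls(y, x1)[1]
b1_joint = rwls(y, x1, x2)[1]
print(b1_alone, b1_joint)  # b1 grows once x2 soaks up residual variance
```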
In short, you have to realize that a lot of the things we are used to
in OLS do not work the same way in logit and probit. In OLS, our DV
is an observed variable; in logit and probit, our DV is actually a
latent, unobserved variable (all we see is the 0-1 dichotomy that is
caused by the underlying latent variable).
-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
FAX: (574)288-4373
HOME: (574)289-5227
EMAIL: Richard.A.Williams.5@ND.Edu
WWW (personal): http://www.nd.edu/~rwilliam
WWW (department): http://www.nd.edu/~soc
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*