A colleague asked me about some results with logistic regression. He had
two predictors of a binary outcome, call them A and B. When used alone,
predictor A was significantly related to the outcome and predictor B was
not. Moreover, the correlation between A and B was zero. When the
outcome was regressed on the two predictors simultaneously using logistic
regression, both were significantly related to the outcome. In effect, the
coefficient for predictor B became larger. However, when OLS regression
was used instead, the coefficients for each predictor were the same as
when entered alone, which is what one would expect.
So I tried a little experiment. I selected a binary outcome and two
predictors that were moderately correlated, age and job tenure, r = .44.
I regressed the binary outcome on each variable separately and together
using OLS and logistic regression. I obtained the same pattern of
results across logistic and OLS regression. By themselves, both age and job
tenure were significant predictors of the outcome. But when entered
together, only age was significant.
Then I created versions of age and job tenure that were orthogonal using
-orthog-, basically taking out the variance in job tenure attributable to
age (the more important predictor of the outcome).
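For readers without Stata, the residualizing step can be sketched in plain Python. This is only an illustration of the idea: the age and tenure values below are made up, the helper names are hypothetical, and the sketch does not try to reproduce any rescaling that -orthog- may apply to its output.

```python
import math

def center(v):
    """Subtract the mean from every element of v."""
    m = sum(v) / len(v)
    return [x - m for x in v]

def residualize(y, x):
    """Return the part of y left over after a linear fit of y on x (with constant)."""
    yc, xc = center(y), center(x)
    slope = sum(a * b for a, b in zip(xc, yc)) / sum(a * a for a in xc)
    return [b - slope * a for a, b in zip(xc, yc)]

def corr(u, v):
    """Pearson correlation of two equal-length lists."""
    uc, vc = center(u), center(v)
    num = sum(a * b for a, b in zip(uc, vc))
    return num / math.sqrt(sum(a * a for a in uc) * sum(b * b for b in vc))

# Made-up illustrative values, not the data used in this post.
age = [25, 31, 38, 42, 47, 53, 58, 61]
tenure = [1, 4, 3, 10, 6, 15, 9, 20]

# Take out of tenure the variance attributable to age.
tenure_orth = residualize(tenure, age)

print("corr(age, tenure)      =", round(corr(age, tenure), 3))
print("corr(age, tenure_orth) =", round(corr(age, tenure_orth), 12))
```

The second correlation is zero up to floating-point error, which is the property the orthogonalized variables in the experiment rely on.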
I again regressed the binary outcome on each orthogonal variable
separately and together. As expected, when entered alone, age was significant
and job tenure was not, in both OLS and logistic regression. But here is
the crux of the issue:
When I regressed the binary outcome on the two orthogonal predictors using
OLS regression, their regression coefficients, reported to 8 decimal
places, were identical to the coefficients I obtained when they were
entered separately.
In contrast, when I regressed the binary outcome on the two orthogonal
predictors using logistic regression, their regression coefficients were
not the same as obtained when treated separately. The coefficients for
the highly significant predictor, age, were nearly identical:
a) when entered by itself: b = -.7376565, p = 0.000
b) when entered with job tenure: b = -.7450136, p = 0.000
However, this is what I obtained for job tenure:
c) when entered by itself: b = -.0704451, p = 0.227
d) when entered with age: b = -.1363843, p = 0.097
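The same pattern can be reproduced on synthetic data, which suggests it is not peculiar to these variables. The sketch below is an assumption-laden illustration, not the analysis reported here: it uses a made-up balanced 2x2 design with two uncorrelated predictors A and B coded -1/+1, hypothetical true logit coefficients of 1.5 and 0.5, and exact cell probabilities in place of sampled 0/1 outcomes. The small fitting helpers are written from scratch so that the block needs only the standard library.

```python
import math

def solve(A, b):
    """Solve the small linear system A x = b by Gauss-Jordan elimination."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # partial pivoting
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def wls(X, z, w):
    """Weighted least squares of z on the columns of X with weights w."""
    k = len(X[0])
    xtwx = [[sum(wi * r[i] * r[j] for r, wi in zip(X, w)) for j in range(k)]
            for i in range(k)]
    xtwz = [sum(wi * r[i] * zi for r, zi, wi in zip(X, z, w)) for i in range(k)]
    return solve(xtwx, xtwz)

def ols(X, y):
    """Ordinary least squares (linear probability model when y is a proportion)."""
    return wls(X, y, [1.0] * len(y))

def logit_fit(X, y, iters=50):
    """Logistic regression on cell proportions y via iteratively reweighted LS."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        eta = [sum(bj * xj for bj, xj in zip(beta, row)) for row in X]
        mu = [1.0 / (1.0 + math.exp(-e)) for e in eta]
        w = [m * (1.0 - m) for m in mu]
        z = [e + (yi - m) / wi for e, yi, m, wi in zip(eta, y, mu, w)]
        beta = wls(X, z, w)
    return beta

# Hypothetical balanced design: A and B are uncorrelated by construction.
cells = [(a, b) for a in (-1, 1) for b in (-1, 1)]
# Assumed true model: logit P(y = 1) = 1.5*A + 0.5*B (made-up coefficients).
p = [1.0 / (1.0 + math.exp(-(1.5 * a + 0.5 * b))) for a, b in cells]

def design(cols):
    """Design matrix with a constant plus the chosen predictors (0 = A, 1 = B)."""
    return [[1.0] + [float(cell[i]) for i in cols] for cell in cells]

ols_a_alone, ols_b_alone = ols(design([0]), p)[1], ols(design([1]), p)[1]
ols_joint = ols(design([0, 1]), p)
logit_a_alone = logit_fit(design([0]), p)[1]
logit_b_alone = logit_fit(design([1]), p)[1]
logit_joint = logit_fit(design([0, 1]), p)

print("OLS    A alone / joint:", ols_a_alone, ols_joint[1])
print("OLS    B alone / joint:", ols_b_alone, ols_joint[2])
print("logit  A alone / joint:", logit_a_alone, logit_joint[1])
print("logit  B alone / joint:", logit_b_alone, logit_joint[2])
```

In this sketch the OLS coefficients agree whether each predictor is entered alone or with the other, while the logistic coefficient for B is roughly 0.30 when entered alone and 0.50 when entered with A, even though A and B are exactly uncorrelated. The coefficient for A also shifts, but by a much smaller proportion.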
It's not clear to us why this happens. In both our cases, the variable
affected the most is uncorrelated with the other predictor and has either no
relation or a weak, nonsignificant relation to the outcome on its own. Yet
that nonsignificant variable can become statistically significant, as it did
in the original case and nearly did in this one. There is no such issue with
linear regression. Is this just a trivial issue with a marginal
predictor or is there some more general issue?
Mike Frone