


Re: st: indicator variable and interaction term different signs but both significant

From   David Hoaglin
Subject   Re: st: indicator variable and interaction term different signs but both significant
Date   Sun, 7 Apr 2013 21:51:08 -0400


Thanks for the thoughtful discussion.  I'm glad to elaborate.

The short answer, oversimplifying somewhat (but not a lot), is that
the "common phrasing" is incorrect, because it does not reflect the
way multiple regression works.  For reference, not tied to the present
example, one version of the common interpretation (which appears in
far too many books) is that a coefficient in a multiple regression
tells us about the change in y corresponding to an increase of 1 unit
in that predictor when the other predictors are held constant.  In
less categorical language, I usually say that, as a general
interpretation, it is oversimplified and often incorrect.  Thus, my
"preferred interpretation" is superior simply because it accurately
reflects the way multiple regression works (more below).

When you say, "According to the model ...," the phrase "when ... the
values of other variables are the same for both" is not actually
"according to the model."  The distinction may be clearer if you
consider the partial-regression plot (also called the "added-variable
plot") for a chosen predictor.  The vertical coordinate is the
residual from the regression of y on the other predictors, and the
horizontal coordinate is the residual from the regression of the
chosen predictor on the other predictors.  The slope of the regression
line through the origin of the partial-regression plot equals the
coefficient of the chosen predictor in the multiple regression (in
which the predictors are the chosen predictor and the other
predictors).  This result is straightforward mathematics, and it
motivates the interpretation that the coefficient of the chosen
predictor tells how the dependent variable changes per unit change in
that predictor after adjusting for simultaneous linear change in the
other predictors in the data at hand.  The adjustment consists of
freeing y (and the chosen predictor) of regression on the other
predictors.  The process of fitting a multiple regression model does
not hold those other predictors constant.  Cook and Weisberg (1982,
Section 2.3.2) give a proof.  I haven't tried to locate the earliest
proof, but Yule (1907, Section 9) has an elegant proof.  Mosteller and
Tukey (1977) have a chapter entitled "Woes of Regression Coefficients"
and a proof (in Section 14K).  The development of regression in the
introductory textbook by De Veaux et al. (2012) includes the correct
general interpretation.
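
A minimal numerical sketch of the partial-regression result, assuming a small made-up data set with one chosen predictor (x1) and one other predictor (x2). The data and all names are hypothetical and are not taken from the example under discussion. The sketch computes the coefficient of x1 in the multiple regression, then the slope of the line through the origin of the partial-regression plot, and, for comparison, the slope from regressing y on x1 alone.

```python
# Hypothetical data: y is the response, x1 the chosen predictor,
# x2 the (single) other predictor.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [3.1, 2.7, 6.2, 5.8, 9.5, 8.1, 12.4, 11.0]


def mean(v):
    return sum(v) / len(v)


def cross(u, v):
    """Sum of cross-products of u and v about their means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def residuals(u, v):
    """Residuals from the least-squares regression of u on v (with intercept)."""
    slope = cross(u, v) / cross(v, v)
    intercept = mean(u) - slope * mean(v)
    return [a - (intercept + slope * b) for a, b in zip(u, v)]


# Coefficient of x1 in the multiple regression of y on x1 and x2,
# from the normal equations on mean-centered data.
s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
s1y, s2y = cross(x1, y), cross(x2, y)
b1 = (s22 * s1y - s12 * s2y) / (s11 * s22 - s12 ** 2)

# Coordinates of the partial-regression (added-variable) plot.
e_y = residuals(y, x2)   # vertical: y freed of regression on x2
e_x = residuals(x1, x2)  # horizontal: x1 freed of regression on x2

# Slope of the least-squares line through the origin of that plot.
av_slope = sum(a * b for a, b in zip(e_x, e_y)) / sum(a * a for a in e_x)

# Slope from regressing y on x1 alone, for comparison.
simple_slope = s1y / s11

print("multiple-regression coefficient of x1:", round(b1, 6))
print("slope in partial-regression plot:     ", round(av_slope, 6))
print("simple-regression slope of y on x1:   ", round(simple_slope, 6))
```

The first two numbers agree up to floating-point rounding.  Because x1 and x2 are correlated in these data, the third will generally differ from them.  Nothing in the calculation holds x2 constant; the adjustment is the removal of the regression on x2 from both y and x1.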

My point about not extrapolating beyond the data is not moot, because
I was focusing mainly on size, leverage, litigation, private_D, and

Multiple regression is often more complex than it appears.  To gain a
proper understanding, however, one has to come to grips with the
complexity.  The "held constant" interpretation of regression
coefficients introduces avoidable confusion and impedes proper

I hope this discussion helps.

David Hoaglin

Cook RD, Weisberg S (1982).  Residuals and Influence in Regression.
Chapman and Hall.

De Veaux RD, Velleman PF, Bock DE (2012).  Stats: Data and Models, 3rd
ed.  Addison-Wesley.

Mosteller F, Tukey JW (1977).  Data Analysis and Regression.  Addison-Wesley.

Yule GU (1907).  On the theory of correlation for any number of
variables, treated by a new system of notation.  Proceedings of the
Royal Society of London. Series A, Containing Papers of a Mathematical
and Physical Character.  79:182-193.

On Sun, Apr 7, 2013 at 4:58 PM, Richard Williams wrote:

> Thanks David, but I admit I am still confused. According to the model, it is
> the case that "The coefficient for OC_D is the predicted difference between
> an overconfident manager and a regular manager when MV = 0 and the values of
> other variables are the same for both." If MV = 0 is an uninteresting or
> impossible value, that is pretty much a worthless thing to know, but it is
> still a correct statement.
> Part of what I like about my phrasing (which appears to be a more or less
> common phrasing) is that I believe it helps make clear (perhaps along with
> some graphs) why you generally shouldn't make a big deal of the coefficient
> for the dummy variable, in this case OC_D. It is simply the predicted
> difference between the two groups at a specific point, MV = 0, a point that
> may not even be possible in practice. Lines go off to infinity in both
> directions, and if the lines are non-parallel (as when there are
> interactions) there will be an infinite number of possible differences
> between the two lines, most of which will be totally uninteresting. I used
> to have students making statements like "once you control for female *
> income, the effect of female switches from positive to negative" and they
> tried to come up with profound theoretical explanations for that.
> I agree with you about being careful about extrapolating beyond the range of
> the data, but if MV = 0 isn't even theoretically possible it is kind of a
> moot point. Testing the statistical significance of any predicted values you
> compute should also give you some protection.
> The main thing, though, is that I am confused by your preferred wording:
> "The appropriate general interpretation of an estimated coefficient is that
> it tells how the dependent variable changes per unit change in that
> predictor after adjusting for simultaneous linear change in the other
> predictors in the data at hand." Why exactly is that a superior wording? I'm
> not even totally sure what that means. Are you just trying to warn against
> extrapolating beyond the observed range of the data? If so I think there is
> probably a more straightforward way of phrasing it. And, I don't think it is
> clear what "simultaneous linear change in the other predictors" is supposed
> to mean. Nor do I think the wording makes it clear what substantive
> interpretation you should give to the coefficient for OC_D.
> I think we are in agreement on most points, i.e. we both think there is
> little point in making a big deal of MV = 0 when that may not be
> interesting or even possible -- but I don't understand why you think your
> preferred wording is better and other wordings are incorrect. But I'd be
> interested in hearing you elaborate.
