Re: st: indicator variable and interaction term different signs but both significant
Sun, 07 Apr 2013 15:58:40 -0500
At 07:34 AM 4/7/2013, David Hoaglin wrote:
That statement may be all right in Nahla's analysis. The difficulty
lies in the phrase "and the values of other variables are the same for
both." OC_MV = 0 because MV = 0; that is a special case. We don't
know that the data contain overconfident managers and rational
managers for whom the values of size, leverage, litigation, private_D,
and same_D are the same (or nearly enough the same). If so, no
problem. If not, the statement is an extrapolation, not supported by
the data. It is up to Nahla (and to analysts generally) to avoid
extrapolating (too far) beyond the data. Many people (and textbooks)
give that sort of interpretation without any evidence of checking on

Thanks David, but I admit I am still confused. According to the
model, it is the case that "The coefficient for OC_D is the predicted
difference between an overconfident manager and a regular manager
when MV = 0 and the values of other variables are the same for both."
If MV = 0 is an uninteresting or impossible value, that is pretty
much a worthless thing to know, but it is still a correct statement.

Part of what I like about my phrasing (which appears to be a more or
less common phrasing) is that I believe it helps make clear (perhaps
along with some graphs) why you generally shouldn't make a big deal
of the coefficient for the dummy variable, in this case OC_D. It is
simply the predicted difference between the two groups at a specific
point, MV = 0, a point that may not even be possible in practice.
Lines go off to infinity in both directions, and if the lines are
non-parallel (as when there are interactions) there will be an
infinite number of possible differences between the two lines, most
of which will be totally uninteresting. I used to have students
making statements like "once you control for female * income, the
effect of female switches from positive to negative" and they tried
to come up with profound theoretical explanations for that.
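
To make the point about non-parallel lines concrete, here is a minimal Python sketch. The two coefficients are made-up numbers chosen only for illustration; they are not estimates from Nahla's data, and the function names are hypothetical. With a dummy OC_D and an interaction OC_MV = OC_D * MV in the model, the predicted difference between the groups at a given MV is b_OC_D + b_OC_MV * MV, so the coefficient for OC_D is just that difference evaluated at MV = 0.

```python
# Hypothetical coefficients, chosen only for illustration (not from real data).
B_OC_D = -0.50   # coefficient on the dummy OC_D
B_OC_MV = 0.25   # coefficient on the interaction OC_MV = OC_D * MV


def group_difference(mv, b_dummy=B_OC_D, b_interaction=B_OC_MV):
    """Predicted difference (overconfident minus rational) at a given MV,
    with the other covariates held equal for both groups."""
    return b_dummy + b_interaction * mv


def crossover_mv(b_dummy=B_OC_D, b_interaction=B_OC_MV):
    """Value of MV at which the two fitted lines cross (difference = 0)."""
    return -b_dummy / b_interaction


if __name__ == "__main__":
    # The difference changes with MV; the dummy coefficient is only the MV = 0 case.
    for mv in (0, 1, 2, 4, 8):
        print(f"MV = {mv}: predicted difference = {group_difference(mv):+.2f}")
    print("lines cross at MV =", crossover_mv())
```

With these invented numbers the dummy coefficient and the interaction have opposite signs, yet all that means is that the fitted lines cross somewhere; the difference is negative below the crossing point and positive above it. Whether the negative part matters depends on whether such values of MV actually occur.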
I agree with you about being careful about extrapolating beyond the
range of the data, but if MV = 0 isn't even theoretically possible it
is kind of a moot point. Testing the statistical significance of any
predicted values you compute should also give you some protection.
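
As a rough illustration of the extrapolation concern, the following Python sketch checks whether a value of MV at which you want to compare the groups lies inside the observed range of MV for both groups. The data and helper names are invented for the example, and a range check is only a crude first screen; it says nothing about whether the other covariates overlap.

```python
def in_observed_range(value, observations):
    """True if value lies between the smallest and largest observed values."""
    return min(observations) <= value <= max(observations)


def supported_in_both_groups(value, mv, oc_d):
    """True if value is inside the observed MV range of both groups."""
    overconfident = [m for m, d in zip(mv, oc_d) if d == 1]
    rational = [m for m, d in zip(mv, oc_d) if d == 0]
    return (in_observed_range(value, overconfident)
            and in_observed_range(value, rational))


# Made-up data: MV for four overconfident (1) and four rational (0) managers.
mv = [1.2, 2.5, 3.1, 4.8, 0.9, 2.2, 3.7, 5.0]
oc_d = [1, 1, 1, 1, 0, 0, 0, 0]

if __name__ == "__main__":
    print(supported_in_both_groups(0.0, mv, oc_d))  # MV = 0: outside both ranges
    print(supported_in_both_groups(3.0, mv, oc_d))  # MV = 3: inside both ranges
```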
The main thing, though, is that I am confused by your preferred
wording: "The appropriate general interpretation of an estimated
coefficient is that it tells how the dependent variable changes per
unit change in that predictor after adjusting for simultaneous linear
change in the other predictors in the data at hand." Why exactly is
that a superior
wording? I'm not even totally sure what that means. Are you just
trying to warn against extrapolating beyond the observed range of the
data? If so I think there is probably a more straightforward way of
phrasing it. And, I don't think it is clear what "simultaneous linear
change in the other predictors" is supposed to mean. Nor do I think
the wording makes it clear what substantive interpretation you should
give to the coefficient for OC_D.

I think we are in agreement on most points, i.e. we both think there
is little point in making a big deal of the difference at MV = 0 when
that value may not be interesting or even possible -- but I don't
understand why you
think your preferred wording is better and other wordings are
incorrect. But I'd be interested in hearing you elaborate.