Re: st: Interpretation of interaction with dummy in OLS

From	[email protected]
To	[email protected]
Subject	Re: st: Interpretation of interaction with dummy in OLS
Date	Fri, 27 Aug 2010 17:38:30 -0400
Susan, 

Probing the conditional effects of an interaction is straightforward with 
a dichotomous (dummy) moderator, and it is easy with multinomial 
moderators and continuous moderators (but these require the use of 
centering around various points on the continuous moderator). 

Before looking at your questions, consider the example below.  I have no 
idea what D is and how it was scored--hopefully 0 and 1, not 1 and 2.  A 
1, 2 scoring makes the conditional effect not meaningful.  So, for this 
example, I'll say that D is gender.  Also assume that we have two versions 
of D:

Dm: 0=male, 1 = female
Df: 0=female, 1 = male.


So you have:

M1: Y = a + b_IV + c_Dm + e

M2: Y = a + b_IV + c_Dm + d_IV*Dm + e

b in model 1 is the main effect (or what one might call the marginal or 
overall relation) of IV to Y.  Another way to think about it is that it is 
the average of the conditional relations of IV to Y across the two values 
of D.  If you substitute Df for Dm in model 1, b is unchanged and c, the 
overall effect for D, keeps the same magnitude but is opposite in sign. 

Now move to model 2.  Because the crossproduct is in the model, the 
coefficient b is no longer an overall relation; it is a conditional effect 
or relation. The coefficient b represents the relation of IV to Y when Dm 
is equal to zero.  Given that males are scored zero on Dm,  the 
coefficient b represents the relation of IV to Y for males. 

Now re-estimate model 2, substituting Df for Dm:

M3: Y = a + b_IV + c_Df + d_IV*Df+ e

In model 3, the coefficient b represents the relation of IV to Y for 
females (because they are scored zero on Df), and you get the standard 
error for this conditional effect.

So by creating two versions of D, each with a 0 and 1 scoring, you can 
estimate and test the two conditional effects comprising the significant 
interaction involving D. Of course, you only need to estimate model 2 or 
model 3 to determine if the interaction is significant or not. 
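
As a minimal sketch of the recoding idea (not taken from your analysis), 
the Python below fits M2 and M3 to made-up, noise-free data in which the 
slope of IV is 1 for males and 3 for females.  The data, the helper names 
ols and design, and the hand-rolled solver are all assumptions for 
illustration; a real regression routine would also report the standard 
error of each conditional effect.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def design(iv, d):
    """Columns: constant, IV, D, IV*D (matching a, b, c, d in M2 and M3)."""
    return [[1.0, x, di, x * di] for x, di in zip(iv, d)]


# Hypothetical data: males follow Y = 2 + 1*IV, females follow Y = 5 + 3*IV.
iv = [1.0, 2.0, 3.0, 4.0] * 2
dm = [0, 0, 0, 0, 1, 1, 1, 1]          # Dm: 0 = male, 1 = female
df = [1 - d for d in dm]               # Df: 0 = female, 1 = male
y = [2 + 1 * x if d == 0 else 5 + 3 * x for x, d in zip(iv, dm)]

m2 = ols(design(iv, dm), y)            # b = m2[1] is the slope for males
m3 = ols(design(iv, df), y)            # b = m3[1] is the slope for females

print("M2 (Dm): a, b, c, d =", [round(v, 6) for v in m2])
print("M3 (Df): a, b, c, d =", [round(v, 6) for v in m3])
```

In this sketch b moves from the male slope in M2 to the female slope in 
M3, while the interaction coefficient d keeps its magnitude and only 
changes sign, which is why either model is enough to test the interaction.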

Also, keep in mind that just estimating and testing conditional effects is 
often not enough.  One needs to plot the interaction to see its overall 
form.  It is easy to come up with examples, to use your variables, in 
which one expects IV to be unrelated to Y when D is high and positively 
related to Y when D is low.  One can find these conditional relations, but 
when plotted, the form of the interaction is not as theory might predict.  
It is not enough that one conditional effect is zero and the other is 
positive; where the two slopes fall relative to one another on the 
distribution of Y scores can be very important.
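
To make the point about form concrete, here is a small sketch that 
computes the end points one would plot for two hypothetical sets of 
coefficients.  Both sets are invented for illustration and have the same 
conditional slopes (2 when D = 0, 0 when D = 1); only c differs.

```python
def predict(a, b, c, d, iv, D):
    """Predicted Y from Y = a + b*IV + c*D + d*IV*D."""
    return a + b * iv + c * D + d * iv * D


# Hypothetical coefficients: the slope of IV is b = 2 when D = 0
# and b + d = 2 + (-2) = 0 when D = 1 in both patterns.
pattern_a = dict(a=0.0, b=2.0, c=8.0, d=-2.0)
pattern_b = dict(a=0.0, b=2.0, c=0.0, d=-2.0)


def endpoints(coefs, low=0.0, high=4.0):
    """Predicted Y at the low and high ends of IV for each value of D."""
    return {D: (predict(iv=low, D=D, **coefs), predict(iv=high, D=D, **coefs))
            for D in (0, 1)}


lines_a = endpoints(pattern_a)   # D = 1 line is flat at the top of the range
lines_b = endpoints(pattern_b)   # D = 1 line is flat at the bottom

for name, lines in (("A", lines_a), ("B", lines_b)):
    for D, (y_low, y_high) in lines.items():
        print(f"pattern {name}, D={D}: Y goes from {y_low} to {y_high}")
```

In pattern A the two groups differ at low IV and converge at high IV; in 
pattern B they start together and diverge.  The tests of the conditional 
slopes cannot tell these apart, but a plot does so at a glance.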


Now for your questions,

> I tested the model
> 
> Y = a + b_IV + c_D + d_IV*D + e
> 
> IV is log-transformed.
> D is a dummy.
> Y is not log-transformed. 
> 
> Comparing the model without the interaction (i.e. Y = a + b_IV + c_D + e)
> to the model with interaction (Y = a + b_IV + c_D + d_IV*D + e) yields
> the following results for the coefficients:
> 
> b changes from 5 (model without interaction) to -6 (not significant in
> both models)

Given what I said above, coefficient b in the model without the 
interaction is the overall effect of IV (i.e., the average of the 
conditional effects of IV across the two values of D).  Coefficient b in 
the model with the interaction is the conditional effect of IV on Y when D 
is equal to zero. But if D was scored 1, 2, then the value of this b, the 
conditional effect, is meaningless because a score of zero on the current 
version of D is meaningless.  Hopefully, given the example above, this is 
clear. 


> c changes from 12 (significant) to -12 (not significant in model with
> interaction)
> interaction coefficient for model with interaction is 42 and
> significant.

Given what is implied in what I said above, coefficient c in the model 
without the interaction is the overall effect of D on Y (i.e., the average 
of the conditional effects of D across all values of IV).  Coefficient c 
in the model with the interaction is the conditional effect of D when IV 
is equal to zero. But if zero is not a valid value for IV, then the 
coefficient c has no meaningful interpretation when the interaction is in 
the model.  To probe and test the conditional effects of D across values 
of IV would require centering IV and rerunning the equation multiple 
times, as in the example above. 
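
A rough sketch of the centering approach, under the same kind of made-up, 
noise-free data as one might use for a toy example (group 0 follows 
Y = 2 + 1*IV, group 1 follows Y = 5 + 3*IV): subtract the value of 
interest from IV, refit, and read off the coefficient on D.  The helper 
names ols and effect_of_d_at and the chosen centering points are 
assumptions for illustration only.

```python
def ols(rows, y):
    """Least-squares coefficients from the normal equations (X'X)b = X'y."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        piv = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [v - f * p for v, p in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


# Hypothetical data: D = 0 follows Y = 2 + 1*IV, D = 1 follows Y = 5 + 3*IV.
iv = [1.0, 2.0, 3.0, 4.0] * 2
d = [0, 0, 0, 0, 1, 1, 1, 1]
y = [2 + 1 * x if g == 0 else 5 + 3 * x for x, g in zip(iv, d)]


def effect_of_d_at(point):
    """Refit with IV centered at `point`; the coefficient on D is then
    the conditional effect of D when IV equals `point`."""
    rows = [[1.0, x - point, g, (x - point) * g] for x, g in zip(iv, d)]
    return ols(rows, y)[2]


# Conditional effect of D at a low, middle, and high value of IV.
effects = {p: effect_of_d_at(p) for p in (1.0, 2.5, 4.0)}
for p, c in effects.items():
    print(f"IV centered at {p}: coefficient on D = {round(c, 6)}")
```

Each refit leaves the interaction coefficient untouched; only the meaning 
of c changes, because "IV equal to zero" now refers to the centering 
point.  With real data each refit also supplies the standard error for 
that conditional effect.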

 
> I have the following questions:
> (1) Does the significant and positive interaction term imply that 
> the effect of 
> the logged IV on Y is positive and significant when the dummy is 1? 

Read what I said above. 

 
> (2) If I want to test whether the logged IV moderates the 
> relationship between 
> the dummy and Y, is the following interpretation right?
>     - dummy c has positive and significant relation with Y
>     - logged IV positively moderates the relationship between dummy
>       and Y (interaction term positive and significant)?

It doesn't make a lot of sense to compare coefficients for b and c from 
the models with and without the interaction. And depending on the scoring 
of D and IV, it may be totally meaningless.  So test the interaction.  If 
it is significant, estimate and test the conditional effects in which you 
are interested.  If you want the conditional effects of IV for each value 
of D, I showed you what to do.  If you want the conditional effects of D 
at various values of IV, then you need to learn how to use centering. 
Finally, whichever way you want to look at the interaction, plot it. 


> (3) What are possible explanations for the dummy turning negative and
> insignificant when the interaction with logged IV is entered? May it
> be that the interaction between dummy and logged IV, rather than only
> the dummy, has a relationship with Y?

As you can see from what I've said, it is because the coefficient c 
changes from representing an overall relation for D without the 
interaction to representing the conditional effect of D when IV equals 
zero.
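
As a worked illustration, take your reported point estimates at face 
value and assume D is scored 0/1 (an assumption, since the scoring was 
not stated).  The conditional effects implied by the interaction model 
are then:

```latex
\begin{align*}
\hat{Y} &= a + b\,IV + c\,D + d\,(IV \times D),
  \qquad b = -6,\quad c = -12,\quad d = 42 \\[4pt]
\frac{\partial \hat{Y}}{\partial IV} &= b + d\,D =
  \begin{cases}
    -6            & \text{if } D = 0 \\
    -6 + 42 = 36  & \text{if } D = 1
  \end{cases} \\[4pt]
\hat{Y}_{D=1} - \hat{Y}_{D=0} &= c + d\,IV = -12 + 42\,IV
\end{align*}
```

So the gap between the two groups is -12 only at IV = 0, which for a 
logged predictor corresponds to a raw value of 1; it reaches zero at 
IV = 12/42 (about 0.29) and is positive above that.  These are point 
estimates only: whether the slope of 36 for D = 1 is significant cannot 
be read from the tests of b and d separately, and needs the standard 
error from re-estimating the model with the dummy reversed.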


> (4) Is there any way to give an idea how large the moderation effect of
> dummy and logged IV are on Y? (i.e. total effect = c+d  or  c+b+d?)

Well, you have the increment in R-sq attributable to the interaction.  But 
better yet, plot the interaction and you'll quite readily see if it has 
any practical value or even makes any sense.

If you'd like to do some basic reading on testing and interpreting 
interactions in linear models, I might suggest:

Aiken, L. S., & West, S. G. (1991). Multiple regression: Testing and 
interpreting interactions. Thousand Oaks: Sage.

Aguinis, H. (2004). Regression analysis for categorical moderators.  New 
York: Guilford Press.


Mike Frone

****************************************************************
Michael R. Frone, Ph.D.
Senior Research Scientist
Research Institute on Addictions
State University of New York at Buffalo
1021 Main Street
Buffalo, New York 14203

Office:    716-887-2519
Fax:        716-887-2477
E-mail:     [email protected]
Internet: http://www.ria.buffalo.edu/profiles/frone.html
***************************************************************

