
# Re: st: Interaction model

| From | To | Subject | Date |
|---|---|---|---|
| David Hoaglin | statalist@hsphsun2.harvard.edu | Re: st: Interaction model | Wed, 8 Feb 2012 21:55:37 -0500 |

```You're welcome, Shikha.

It will be helpful to reproduce model (a), correcting the typo:

(a) income = b1*program + b2*rich + b3*immi + b4*male + b5*program*rich
    + b6*program*male + b7*program*immi

If I were, mechanically, to sketch an interpretation of b1, it would
say that b1 gives the effect of the program on income, adjusting for
the contributions of [the other six predictors].  Unfortunately, if
the interaction effects are significant, it is not meaningful to
interpret a main effect in the presence of interactions between that
variable and other variables.  And in model (a) each of the four
variables is involved in at least one two-factor interaction.  Thus,
the model would be saying that the effect of the program differed
between rich and poor, between immigrants and non-immigrants, and
between males and females; and you would need to start with the
average income in each of those subgroups and discuss the comparisons.
A weighted average over the groups might be useful.

You have not explained why model (a) does not contain a constant term,
which we could denote by b0.

In such an analysis, if you have enough data, it would make sense to
start with the "saturated" model, which would contain b0 and also the
two-factor terms rich*male, rich*immi, and immi*male; the three-factor
terms program*rich*immi, program*rich*male, program*immi*male, and
rich*immi*male; and the four-factor term program*rich*immi*male (for a
total of 16 terms, counting b0).  It might then
be possible to eliminate some of the interactions, starting with the
highest-order and working down.  (If a given interaction is
significant, however, the model must retain all the lower-order terms
associated with the variables involved in that interaction.)

The easiest model to interpret is the additive model, which would
contain b0 and only the main effects for the four variables.
Departures from additivity often arise when the response variable is
not yet expressed in a suitable scale.  Data on income, as in your
analysis, are often skewed, and they behave better when transformed to a
logarithmic scale.  I wonder whether analyzing income in the log scale
would lead to an analysis in which the contributions are more nearly
additive.  Then, transforming back to the original scale would produce
effects that are multiplicative.

David Hoaglin

> b4 is not the coefficient for both male and program*rich- it was a mistake/typo.
>
> I understand the model in (a) is a richer model compared to different
> specifications in (b). What would be the interpretation of b1 in (a)?
```
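The two counting arguments in the message can be checked with a short sketch. It assumes that all four variables are 0/1 indicators, and the coefficient values are hypothetical, chosen only for illustration; nothing here comes from Shikha's data. Under that coding, the effect of the program in model (a) for a given subgroup is b1 + b5*rich + b6*male + b7*immi, so b1 on its own describes only the subgroup in which rich, male, and immi are all 0. The sketch also enumerates the terms of the saturated model to confirm the total of 16 (the constant plus 15 main effects and interactions).

```python
from itertools import combinations, product

# Variable names follow model (a); each is assumed to be a 0/1 indicator.
VARS = ["program", "rich", "immi", "male"]


def saturated_terms(variables):
    """Return every term of the saturated model: constant, main effects, all interactions."""
    terms = ["b0"]
    for order in range(1, len(variables) + 1):
        for combo in combinations(variables, order):
            terms.append("*".join(combo))
    return terms


def program_effect(coef, rich, immi, male):
    """Effect of program on income in model (a) for one subgroup (0/1 indicators)."""
    return (coef["program"]
            + coef["program*rich"] * rich
            + coef["program*male"] * male
            + coef["program*immi"] * immi)


# Hypothetical values for b1, b5, b6, b7; for illustration only.
coef = {
    "program": 2.0,
    "program*rich": -1.0,
    "program*male": 0.5,
    "program*immi": 1.5,
}

terms = saturated_terms(VARS)
print(len(terms), "terms in the saturated model:")
print(", ".join(terms))

# One program effect per subgroup defined by rich, immi, male.
for rich, immi, male in product([0, 1], repeat=3):
    effect = program_effect(coef, rich, immi, male)
    print(f"rich={rich} immi={immi} male={male}: program effect = {effect}")
```

With these made-up coefficients the program effect takes a different value in most of the eight subgroups, which is why a single "main effect" of the program is hard to interpret when the interactions matter.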