# st: RE: Explaining interaction terms

From: "Maarten Buis"; Subject: st: RE: Explaining interaction terms; Date: Wed, 5 Oct 2005 11:15:45 +0200

```
Dear Leny,

You already hinted that when y is log-transformed, the regression coefficients (multiplied by 100) can be interpreted as the approximate percentage change in y for a unit change in x. Interaction terms work the same way as they do with an untransformed y, so your interaction model looks like this:
log(y) = b0 + b1X1 + b2X2 + b3X1X2

You can rewrite it as ``separate'' regressions for the two groups:
group 1 (X1=0):
log(y) = b0 + b1*0 + b2X2 + b3*0*X2 = b0 + b2X2

group 2 (X1=1):
log(y) = b0 + b1*1 + b2X2 + b3*1*X2 = (b0 + b1) + (b2 + b3)X2

So 100*b2 can be interpreted as the (approximate) percentage change in y for a unit change in time (X2) for persons in group 1, and 100*(b2 + b3) as the (approximate) percentage change in y for a unit change in time for persons in group 2. You can get the standard error and confidence interval of b2 + b3 with -lincom X2 + X1X2-.
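A worked version with the estimates from your output below. Reading 100*b as a percentage is an approximation that works well for small b; the exact percentage change in y for a unit change in X2 is 100*(exp(b) - 1):

group 1: 100*(exp(b2) - 1)      = 100*(exp(.0826) - 1) = 8.6%   (approximation: 8.26%)
group 2: 100*(exp(b2 + b3) - 1) = 100*(exp(.1090) - 1) = 11.5%  (approximation: 10.90%)

In other words, each unit of time multiplies y by about 1.086 in group 1 and by about 1.115 in group 2.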

An alternative (and equivalent) interpretation of the interaction coefficient b3 is the difference in the effect of time between group 2 and group 1. This interpretation is very useful if you want to test whether time has a different effect for group 1 members than for group 2 members: the t-test reported for X1X2 in your output (p = 0.04) is exactly that test.
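Since your data are not available here, the same steps can be tried on the auto data that ship with Stata. This is a minimal sketch under that assumption: foreign plays the role of X1 (group), mpg the role of X2 (time), and lnprice and forXmpg are names made up for the illustration. The t-test on forXmpg is the test of different slopes, the first -lincom- gives the slope for foreign cars, and the eform option reports exp(b2 + b3), the factor by which price is multiplied per unit of mpg, with its confidence interval.

*--------begin example code----------
sysuse auto, clear
gen lnprice = ln(price)        /* log-transformed outcome, like lnY */
gen forXmpg = foreign*mpg      /* made-up interaction term, like X1X2 */
reg lnprice mpg foreign forXmpg, robust
/* the t-test on forXmpg tests whether the mpg slope differs between groups */
/* slope for domestic cars (foreign==0) is _b[mpg]; slope for foreign cars: */
lincom mpg + forXmpg
/* the same combination as a multiplicative factor exp(b2 + b3), with its CI */
lincom mpg + forXmpg, eform
/* exact percentage change in price per unit of mpg, by group */
display 100*(exp(_b[mpg]) - 1)
display 100*(exp(_b[mpg] + _b[forXmpg]) - 1)
*---------------end example code---------------------

With your own data the equivalent would be -reg lnY X2 X1 X1X2, robust cluster(ID)- followed by -lincom X2 + X1X2-.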

You were also interested in finding the point in time where y starts to increase. I assume you realised that in the models above y changes at a constant percentage rate over time (log(y) is a straight line in time), so these models contain no point where y starts to increase, and that this is why you tried spline regression. The problem with spline regression in this context is that you do not estimate the point at which y starts to increase: the analyst chooses the location of the knots, and it is this location you actually want to estimate. You can estimate the location (for a fixed number of knots chosen by the analyst) using the -nl- command. I've put example code showing how to do that below this message. However, this approach can be unstable and depends on good starting values. Furthermore, you still have to choose the number of knots.

I've just finished a first version of a paper where I used an alternative and more flexible approach. Basically, an inflection point is a point where the second derivative is zero (and changes sign). I estimated a loess curve and calculated the first and second derivatives of this curve. Confidence intervals can be created around these derivative curves, so the point where the second derivative crosses zero can be located. However, I have not found a command that does this in Stata, so I used R (which does have ready-to-use programs, note the plural, doing this). If you are interested, I can send you a version of the paper and the code used to estimate the models.
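For comparison, a rough Stata version of the point estimates only (without the confidence intervals, which are the hard part) can be pieced together from -lowess- and -dydx-. This is a sketch on the auto data, not the method used in the paper: Stata's -lowess- is a relative of R's loess rather than the same smoother, the variable names fit, d1, d2 and flip are made up, and numerical second derivatives of a smooth can be noisy.

*--------begin example code----------
sysuse auto, clear
/* lowess smooth of price on mpg, saved as a new variable */
lowess price mpg, generate(fit) nograph
/* average over tied mpg values so that each x value appears once */
collapse (mean) fit, by(mpg)
sort mpg
/* numerical first and second derivatives of the smooth */
dydx fit mpg, generate(d1)
dydx d1 mpg, generate(d2)
/* candidate inflection points: d2 changes sign between adjacent mpg values */
gen byte flip = (d2*d2[_n-1] < 0) if _n > 1
list mpg fit d1 d2 if flip == 1
*---------------end example code---------------------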

Hope this helps,
Maarten

PS. I find it a lot easier to keep track of what I am doing during data preparation and analysis when I give my variables intelligible names like ``time'' or ``female'' (1 = true, i.e. female; 0 = false, i.e. male) instead of X1, X2.

*--------begin example code----------
sysuse auto, clear

scatter price mpg, jitter(4)
/*There seems to be a knot around mpg=20*/
/*getting starting values: fit the spline with the knot fixed at mpg=20*/
gen mpg2 = max(mpg-20,0)
reg price mpg mpg2
global b0 = _b[_cons]
global b1 = _b[mpg]
global b2 = _b[mpg2]

/*the model we are trying to estimate is:*/
/*price = b0 + b1*mpg + b2*max(mpg-k1,0)*/
/*k1 is the knot (the point where the slope changes), i.e. the point you are looking for*/

/*actual estimation*/
capture program drop nlspline
program define nlspline
version 8.2
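if "`1'" == "?" {
/* first call: nl passes "?"; declare the parameters in S_1 */
/* and set their starting values */
global S_1 "b0 b1 b2 k1"
global b0 = $b0
global b1 = $b1
global b2 = $b2
global k1 = 20
exit
}
/* later calls: fill variable `1' with predictions at the current parameter values */
replace `1' = $b0 + $b1*mpg + $b2*(max(mpg-$k1, 0))
end
nl spline price    /* nl runs the program nlspline */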
*---------------end example code---------------------
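
The example above uses -nl-'s older program-based syntax. Assuming Stata 9 or later, the same model can also be written directly as a substitutable expression, with starting values given inside the braces. This is a sketch: the locals b0, b1, b2 are made-up names, and the starting value for k1 (20) is still a choice by the analyst, so the same caveats about instability apply.

*--------begin example code (Stata 9 or later)----------
sysuse auto, clear
/* starting values from the spline with the knot fixed at mpg=20 */
gen mpg2 = max(mpg-20,0)
quietly reg price mpg mpg2
local b0 = _b[_cons]
local b1 = _b[mpg]
local b2 = _b[mpg2]
/* {name=value} declares a parameter and its starting value */
nl (price = {b0=`b0'} + {b1=`b1'}*mpg + {b2=`b2'}*max(mpg-{k1=20}, 0))
*---------------end example code---------------------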

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu]On Behalf Of lm335@drexel.edu
Sent: Tuesday, 4 October 2005 22:15
To: statalist@hsphsun2.harvard.edu
Subject: st: Explaining interaction terms

Hello,
I have 2 questions regarding multiplicative regression models. I hope that the statisticians in the group can help me out.

1) I have a regression model with the dependent variable log-transformed and the first independent variable X1 as taking 0 if belonging to group1 and 1 if belonging to group2, and the second variable X2 denoting time. In this model both the independent variables are significant and it was seen that for the two cases of X1, the curves have different slopes.

I considered another model with an interaction term between X1 and X2. In the output, the group effect is not significant, but the interaction effect is.

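Regression with robust standard errors                 Number of obs =     125
                                                       F(  3,    42) =   86.36
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.7339
Number of clusters (ID) = 43                           Root MSE      =  .50028

------------------------------------------------------------------------------
             |               Robust
         lnY |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          X2 |      .0826      .0098     8.43    0.0         .062        .102
          X1 |     -.2171      .1892    -1.15    0.26       -.599        .164
        X1X2 |      .0264      .0125     2.10    0.04        .001        .051
       _cons |       1.91      .1536    12.42    0.0        1.598       2.218
------------------------------------------------------------------------------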

In the above output, how should one interpret the effect of the interaction term in the model? For group 1, can we say that for a one-unit change in X2 the value of y changes by a certain percentage? Or can the increase in y only be expressed for specific values of X2?

2) In the same model as above, is there a way to find the point where the slope starts to increase (an inflection point?)? I tried spline regression, but can't get a good-fitting model.

Thank You

Leny Mathew

```