Re: st: Interaction terms interpretation when one variable is omitted

From   David Hoaglin <>
Subject   Re: st: Interaction terms interpretation when one variable is omitted
Date   Thu, 11 Apr 2013 20:46:33 -0400

Hi, Oliver.

If the rest of your data behave in the same way as the data for id 001
that you listed, then
_const - Employed - Treatment + (interaction term) = 0 (exactly).
That is the collinearity that caused Stata to omit the interaction term.
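
The same identity would also account for the matching coefficients in the two regressions quoted below. As a sketch only: write E for Employed and T for Treatment, and let a, b, and c (labels introduced here for illustration) stand for the constant and the coefficients on treat and employed in the first regression. Substituting the identity into that model gives the second one:

```latex
\begin{align*}
E &= 1 - T + TE \\
\text{health} &= a + b\,T + c\,E \\
              &= (a + c) + (b - c)\,T + c\,(TE)
\end{align*}
```

With the quoted estimates, b - c = -.0353416 - .1540951 = -.1894367 and a + c = 3.4245 + .1540951 = 3.5785951, which agree (up to display rounding) with the coefficient on treat and the constant in the second regression, assuming health and sr_health1 are the same outcome. The two models are therefore the same fit written in two ways, and the interaction carries no information beyond treat and employed.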

I suspect that I do not know enough about the detailed structure of
your data and your models, but it appears that the alternative
approach is not satisfactory.  The definition of each regression
coefficient includes the list of other predictors in the model.  When
you use the separate models, you need to understand what happens to
the constant term.  For example, when the model includes treat_emp,
but not treat_unemp or treat_ret, the individuals for whom
treat_unemp = 1 or treat_ret = 1 are accounted for by the constant term, and
the coefficient of treat_emp would be interpreted as a comparison
between the individuals for whom treat_emp = 1 and the aggregate of
all other individuals.  It appears that treat_emp, treat_unemp, and
treat_ret are indicators for separate categories of a categorical
variable.  In such situations, all the categories except one should be
included in the model together.  (Omitting one category avoids a
collinearity.)  You may need to re-examine the definitions of your
predictor variables and make sure that they capture the intended
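
One possible way to put all the categories into a single model is sketched below. It assumes that emp, unemp, and ret are mutually exclusive 0/1 indicators (the names used in the quoted loop), that the data are already xtset, and it introduces a hypothetical variable, jobloss_status. The treat_emp, treat_unemp, and treat_ret variables as constructed in the quoted loop are missing for treated individuals with a different status, so entering all three together would appear to drop every treated observation; a single categorical variable avoids that.

```stata
* Hypothetical recoding: one categorical variable instead of three indicators.
* 0 = control (no job loss); 1-3 = job loss, by labour force status at interview.
generate byte jobloss_status = 0 if treat == 0
replace jobloss_status = 1 if treat == 1 & emp == 1
replace jobloss_status = 2 if treat == 1 & unemp == 1
replace jobloss_status = 3 if treat == 1 & ret == 1
label define jobloss_lbl 0 "no loss" 1 "loss, emp" 2 "loss, unemp" 3 "loss, ret"
label values jobloss_status jobloss_lbl

* Check how many observations fall in each category (and how many are missing).
tabulate jobloss_status, missing

* Factor-variable notation omits the base category (0) automatically.
xtreg health i.jobloss_status, fe

* Compare the employed and unemployed job-loss coefficients within one model.
test 1.jobloss_status = 2.jobloss_status
```

Each coefficient is then a comparison with the same base category, and the test command compares two of them within one fitted model rather than across separate regressions.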

David Hoaglin

On Thu, Apr 11, 2013 at 7:14 AM, Mirnezami, Oliver
<> wrote:
> Hello
> I have a query regarding the interpretation of an interaction term when Stata automatically omits a variable from the regression due to collinearity.
> I am looking at how job loss affects health and wish to extend my model to see whether, when an individual loses their job, re-employment moderates the negative effect on their health.
> To do this, I have interacted my treatment variable (1 for individuals that have reported job loss in current wave, 0 for individuals employed in current wave) with an individual's labour force status.
> For example:
> gen treat_employed = treat * employed
> gen treat_unemployed = treat * unemployed
> gen treat_retired = treat * retired
> In the first case, my regression is then (n.b. other controls are left out here for simplicity):
> xtreg health treat employed treat_employed, fe
> However, the interaction term treat_employed gets omitted. I then tried running the following regressions separately (each with just 2 of the 3 variables) and found that the coefficient and standard error on employed are the same as those on treat_employed (the interaction term):
>               |               Robust
>        health |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> --------------+----------------------------------------------------------------
>         treat |  -.0353416   .0370996    -0.95   0.341    -.1080636    .0373803
>      employed |   .1540951   .0679695     2.27   0.023     .0208624    .2873278
>         _cons |     3.4245   .0677945    50.51   0.000     3.291611     3.55739
>
>                |               Robust
>     sr_health1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> ---------------+----------------------------------------------------------------
>          treat |  -.1894367   .0585036    -3.24   0.001    -.3041146   -.0747589
> treat_employed |   .1540951   .0679695     2.27   0.023     .0208624    .2873278
>          _cons |   3.578596   .0007682  4658.40   0.000      3.57709    3.580101
> An example of my data is as follows:
> Id    Year   Employed   Treatment   Interaction term (employed * treatment)
> 001   1996   1          0           0
> 001   1998   1          0           0
> 001   2000   1          0           0
> 001   2002   0          1           0
> 001   2004   1          0           0
> 001   2006   1          0           0
> 001   2008   1          1           1
> 001   2010   1          0           0
> I think the problem arises because employment and treatment are not independent of each other: by construction, employed always equals 1 when treatment equals 0 (as my control group is people with a job). When treatment equals 1 (i.e. an individual reports job loss in this wave), however, the individual can be employed or unemployed (or in fact have any labour force status), because the job loss occurred at some point between this wave and the previous interview wave and they may already have found a new job. I wish to see whether the impact on health depends on which labour force status an individual has following job loss.
> I thought of an alternate approach to the problem and would be grateful for your feedback. Originally, my treatment variable could equal 1 for any labour force status of the individual. My new method makes separate treatment variables whose control group is always the same: treat_emp equals 1 only when the individual is employed in the interview in which job loss is reported, and treat_unemp or treat_ret equals 1 only when the individual is unemployed or retired in that interview. My new method:
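> local stubs "emp unemp ret"
> foreach stub of local stubs {
>     * Missing unless the row is a control or is treated with this status
>     gen treat_`stub' = .
>     * The by id: prefix is dropped: these replacements do not depend on the panel
>     replace treat_`stub' = 0 if treat == 0
>     replace treat_`stub' = 1 if treat == 1 & `stub' == 1
> }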
> I then run a series of separate regressions and analyse the coefficients of the treatment variables separately. I found, for example, that the coefficient on treat_unemp is twice as large as that on treat_emp, which makes intuitive sense to me. Can I make these comparisons across regressions in this way when the regressions are exactly the same, with just a different treatment variable included in each? My thought process is that the original treatment variable is, in a sense, some kind of average of the separate treatment variables, whereas now I am examining each case separately to see how they differ across separate regressions.
> xtreg health treat_emp, fe
> xtreg health treat_unemp, fe
> xtreg health treat_ret, fe
> Is this alternate method acceptable to use? I'm just concerned because previously I have always been taught to use interaction terms.
> Incidentally, I found a query on interaction terms raised a few days ago by Nahla Betelmal very helpful as a starting point. David Hoaglin and Richard Williams generated a lot of discussion, which was interesting to read. My query, however, is specifically about the case where one of the variables is omitted, which I don't think was covered, and about whether my alternate approach is acceptable or should be disregarded.
> I would really appreciate any advice that you can offer. Apologies for the longwinded explanation.
> Kind regards
> Oliver
