st: Interaction terms interpretation when one variable is omitted


From   "Mirnezami, Oliver" <O.Y.Mirnezami@warwick.ac.uk>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: Interaction terms interpretation when one variable is omitted
Date   Thu, 11 Apr 2013 11:14:06 +0000

Hello

I have a query regarding the interpretation of an interaction term when Stata automatically omits a variable from the regression because of collinearity.

I am looking at how job loss affects health and wish to extend my model to see whether, when an individual loses their job, re-employment moderates the negative effect on their health.

To do this, I have interacted my treatment variable (1 for individuals who have reported job loss in the current wave, 0 for individuals employed in the current wave) with the individual's labour force status.

For example:

gen treat_employed = treat * employed
gen treat_unemployed = treat * unemployed
gen treat_retired = treat * retired 

In the first case, my regression is then (n.b. other controls are left out here for simplicity): 

xtreg health treat employed treat_employed, fe

However, the interaction term treat_employed gets omitted. I then tried running the following regressions separately (each with just two of the three variables) and found that the coefficient and standard error on employed are the same as those on treat_employed (the interaction term):


               |               Robust
        health |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
         treat |  -.0353416   .0370996    -0.95   0.341    -.1080636    .0373803
      employed |   .1540951   .0679695     2.27   0.023     .0208624    .2873278
         _cons |     3.4245   .0677945    50.51   0.000     3.291611     3.55739

               |               Robust
    sr_health1 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
---------------+----------------------------------------------------------------
         treat |  -.1894367   .0585036    -3.24   0.001    -.3041146   -.0747589
treat_employed |   .1540951   .0679695     2.27   0.023     .0208624    .2873278
         _cons |   3.578596   .0007682  4658.40   0.000      3.57709    3.580101

An example of my data is as follows:

Id    Year   Employed   Treatment   Interaction term (employed * treatment)
001   1996   1          0           0
001   1998   1          0           0
001   2000   1          0           0
001   2002   0          1           0
001   2004   1          0           0
001   2006   1          0           0
001   2008   1          1           1
001   2010   1          0           0

I think the problem arises because employment and treatment are not independent of each other: employed always equals 1 when treatment equals 0 by construction, as my control group is people with a job. When treatment equals 1 (i.e. an individual reports job loss in this wave), however, the individual can be employed or unemployed (or in fact in any labour force status), because the job loss occurred at some point between the previous interview wave and this one and they may already have found a new job. I wish to see whether the impact on health depends on which labour force status an individual has following job loss.
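
One way to check this suspicion is sketched below. It assumes the variables treat, employed and treat_employed are defined as above. The identity being tested is a conjecture, not something Stata reports. If every observation with treat == 0 has employed == 1, then treat_employed = treat + employed - 1 for all observations, so the interaction is an exact linear combination of the two main effects and the constant, and one term has to be dropped.

```stata
* Sketch only: assumes treat, employed and treat_employed exist as defined above.
* Count control observations (treat == 0) that are not employed; zero is expected.
count if treat == 0 & employed != 1

* If that count is zero, the interaction is an exact linear combination
* of treat, employed and the constant, so this assertion should pass.
assert treat_employed == treat + employed - 1 if !missing(treat, employed)

* The two-variable regressions reported above would then be reparameterisations:
* the coefficient on treat in the first should equal the sum of the
* coefficients on treat and treat_employed in the second.
display -.1894367 + .1540951
```

The display line evaluates to about -.0353416, which matches the coefficient on treat in the first table. Likewise 3.578596 - .1540951 is about 3.4245, which matches its constant. Both are consistent with the two regressions being the same model written in two ways.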

I thought of an alternative approach to the problem and would be grateful for your feedback. Originally, my treatment variable could equal 1 for any labour force status of the individual. My new method makes a separate treatment variable for each status, with the control group always the same: treat_emp equals 1 only when the individual is employed in the wave in which job loss is reported, and treat_unemp and treat_ret equal 1 only when the individual is unemployed or retired, respectively, in that wave. My new method:

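* One treatment indicator per labour force status; the control group (treat == 0) is shared.
* The by id: prefix is dropped: it needs sorted data and does not change the result here.
local stubs "emp unemp ret"
foreach stub of local stubs {
    gen treat_`stub' = .
    replace treat_`stub' = 0 if treat == 0
    replace treat_`stub' = 1 if treat == 1 & `stub' == 1
}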

I then run a series of separate regressions and analyse the coefficient on each treatment variable separately. I found, for example, that the coefficient on treat_unemp is twice as large as that on treat_emp, which makes intuitive sense to me. Can I make comparisons across regressions in this way when the regressions are exactly the same apart from the treatment variable included in each? My thinking is that the original treatment variable gives, in a sense, some kind of average of the separate treatment effects, whereas now I am examining each case in its own regression to see how they differ.

xtreg health treat_emp, fe
xtreg health treat_unemp, fe
xtreg health treat_ret, fe
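
For comparison, a sketch of a single regression that holds all three indicators at once, so that the coefficients can be compared with a formal test rather than across models. It assumes the data are already xtset and that emp, unemp and ret are mutually exclusive 0/1 indicators. The names t_emp, t_unemp and t_ret are hypothetical. Unlike treat_`stub', these are set to 0 rather than missing for job losers in another status. Job losers in any status other than these three would fall into the reference group with the controls and may need excluding.

```stata
* Sketch only: t_emp, t_unemp and t_ret are hypothetical names, not from the original post.
* Assumes the data are already xtset and emp, unemp, ret are 0/1 indicators.
foreach stub in emp unemp ret {
    * 1 if job loss is reported and the person is in this status, 0 otherwise;
    * missing if either input is missing.
    gen byte t_`stub' = (treat == 1 & `stub' == 1) if !missing(treat, `stub')
}

* One regression with all three indicators; the reference group is treat == 0.
xtreg health t_emp t_unemp t_ret, fe vce(robust)

* Wald test of whether the effect differs between re-employed and unemployed job losers.
test t_emp = t_unemp
```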

Is this alternative method acceptable to use? I'm just concerned because I have always been taught to use interaction terms.

Incidentally, I found a query on interaction terms raised a few days ago by Nahla Betelmal very helpful as a starting point. David Hoaglin and Richard Williams generated a lot of discussion, which was interesting to read. My query, though, is specifically about the case where one of the variables is omitted, which I don't think was covered, and about whether my alternative approach is acceptable or should be disregarded.

I would really appreciate any advice that you can offer. Apologies for the long-winded explanation.

Kind regards

Oliver

