The Stata listserver

st: -xi- and interactions (was:your stata query)


From   "Laplante, Benoît" <Benoit_Laplante@UCS.INRS.Ca>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: -xi- and interactions (was:your stata query)
Date   Thu, 16 Jun 2005 16:21:57 -0400

The recent postings on -xi- reminded me of the following riddle about interactions.
 
Say you have two variables, T and A. Each takes two values, low and high, coded 0 and 1 respectively. You assume that the effect of A varies across the values of T, which is the most basic definition of an interaction. There are (at least) two ways to deal with the problem.
 
The first one is simply to build dummies that represent all the combinations of values of the two variables and put all of them, minus one, in the equation. So you would have T0A0, T0A1, T1A0 and T1A1. The most obvious choice would be to exclude T0A0, but excluding any of the four variables would provide equivalent results. So you would use three variables to represent your interaction, say T0A1, T1A0 and T1A1.
 
Another method, the one used by -xi-, is to compute the cross product of the two variables. The procedure gives you an equation with three variables, two of them labelled as the original variables and the third as their product: T, A and TA. Given the original coding scheme and the use of a cross product, the coefficient of T is actually equivalent to that of T1A0, the coefficient of A to that of T0A1, and the coefficient of TA to that of T1A1 minus the sum of the other two, that is T1A1-(T0A1+T1A0).
 
Both methods will produce equivalent results.
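The equivalence can be checked by hand on the four cells. The following is only a sketch in Python, not Stata output: the cell means are made-up numbers, and it relies on the fact that both specifications are saturated for a 2x2 layout, so each reproduces the four cell means exactly.

```python
from fractions import Fraction

# Hypothetical cell means of the outcome, keyed by (T, A); the values are made up.
cell_mean = {(0, 0): Fraction(10), (0, 1): Fraction(13),
             (1, 0): Fraction(14), (1, 1): Fraction(22)}

# Method 1: one dummy per combination, with T0A0 omitted as the reference.
b0 = cell_mean[(0, 0)]
b = {cell: m - b0 for cell, m in cell_mean.items() if cell != (0, 0)}

# Method 2: constant, T, A and the cross product T*A.
c0 = cell_mean[(0, 0)]
c_T = cell_mean[(1, 0)] - c0
c_A = cell_mean[(0, 1)] - c0
c_TA = cell_mean[(1, 1)] - c0 - c_T - c_A

def predict_dummies(t, a):
    # The reference cell (0, 0) gets the constant only.
    return b0 + b.get((t, a), 0)

def predict_product(t, a):
    return c0 + c_T * t + c_A * a + c_TA * t * a

# Both codings give the same fitted value in every cell.
same_fit = all(
    predict_dummies(t, a) == predict_product(t, a) == cell_mean[(t, a)]
    for t in (0, 1) for a in (0, 1)
)
print(same_fit)

# Coefficient mapping: T <-> T1A0, A <-> T0A1, TA <-> T1A1 - (T0A1 + T1A0).
print(c_T == b[(1, 0)], c_A == b[(0, 1)],
      c_TA == b[(1, 1)] - (b[(0, 1)] + b[(1, 0)]))
```

With any other set of cell means the same identities hold, since they follow from the algebra rather than from the particular numbers.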
 
Now let's say that you are adding a second variable to your equation, B, whose effect is also assumed to vary across the values of T. Using the first method, you would build an equation containing 6 terms, that is three out of T0A0, T0A1, T1A0 and T1A1, and three out of T0B0, T0B1, T1B0 and T1B1.
 
If you were to use the second method, you would be using 5 terms: T, A, TA, B, TB. The fits of the two models are not the same, and I have never found a reference that deals with how two such different models could be equivalent.

Actually, the second one seems to be equivalent to an equation in which, using T0A0 and T0B0 as reference categories, T=(T1A0+T1B0)/2, which would be an unwanted assumption in most cases. 
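One way to look at the two three-variable specifications is to compare their design columns directly. The sketch below is in Python rather than Stata and makes two assumptions that are not in the post: all eight combinations of T, A and B occur in the data, and a constant is included in both equations. The helper `rank` is a hypothetical function written for this illustration.

```python
from fractions import Fraction
from itertools import product

def rank(rows):
    """Exact rank of a matrix given as a list of rows (Gaussian elimination)."""
    m = [[Fraction(x) for x in row] for row in rows]
    r = 0
    for col in range(len(m[0])):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue  # no pivot here: this column depends on earlier ones
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                f = m[i][col] / m[r][col]
                m[i] = [x - f * y for x, y in zip(m[i], m[r])]
        r += 1
    return r

# Assumption: every combination of (T, A, B) is observed at least once.
cells = list(product((0, 1), repeat=3))

# Method 1: constant, T0A1, T1A0, T1A1, T0B1, T1B0, T1B1.
dummy_design = [
    [1, (1 - t) * a, t * (1 - a), t * a, (1 - t) * b, t * (1 - b), t * b]
    for t, a, b in cells
]

# Method 2: constant, T, A, TA, B, TB.
product_design = [[1, t, a, t * a, b, t * b] for t, a, b in cells]

# Side-by-side columns: if this rank equals the other two,
# both designs span the same column space on these cells.
stacked = [d + p for d, p in zip(dummy_design, product_design)]

print(rank(dummy_design), rank(product_design), rank(stacked))
```

Under these assumptions all three ranks are equal, and the six dummies cannot all be independent, because T1A0+T1A1 and T1B0+T1B1 are both equal to T. This is a check on the design columns only. It does not reproduce the fits described above, which may come from data or a specification that differs from this sketch.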
 
Does anyone have an answer?
 
Benoît Laplante, Professor
Université du Québec
Institut National de la Recherche Scientifique
Urbanisation, Culture et Société
http://www.inrs-ucs.uquebec.ca/default.asp?p=lapl





