
st: ice command question about interactions

From   Maarten buis <[email protected]>
To   stata list <[email protected]>
Subject   st: ice command question about interactions
Date   Fri, 15 Jan 2010 10:08:46 -0800 (PST)

A while ago Alan Acock asked a question on interactions in an imputation model.
The main issue was that there is an emerging literature claiming that one
should not use the -passive()- option in -ice- (see -ssc d ice-), but
instead create interactions, squares, etc. in the unimputed data and
impute these as if they were normal variables:

John Graham (2009) "Missing Data Analysis: Making it Work in the Real World", Annual Review of Psychology, 60:549-576.

Paul von Hippel (2009) "How to impute interactions, squares, and other transformed variables", Sociological Methodology, 39:265-291.

Alan wrote:
> ice allows us to passively estimate an interaction term by estimating 
> the main effects and then multiplying these together so the interaction 
> of X&Y will be the imputed X times the imputed Y. This seems necessary 
> to preserve the interpretation of the interaction.
> Graham says we need to include the interaction term. "The problem with 
> excluding such variables from the imputation model is that all 
> imputation is done under the assumption that the correlation is r = 0 
> between the omitted variable and all other variables in the imputation." 
> This is the same argument that Graham makes for imputing the dependent 
> variable in the imputation (a sensible thing to do).
> I understand the importance of including the dependent variable when 
> doing multiple imputations, and see how Graham could apply this to the 
> interaction term, but it makes no sense to me to have an interaction of 
> X and Y not equal X*Y. 

Paul Allison responded:
> Graham is right. In multiple imputation, interactions should be imputed 
> as though they are additional variables, not constructed by multiplying 
> imputed values. The same is true if you have x and x^2 in a model.  The 
> x^2 term should be imputed just like any other variable, not constructed 
> by squaring the imputed values of x. While this principle may seem 
> counterintuitive, it is easily demonstrated by simulation that the more 
> "natural" way to do it produces biased estimates.  

I was skeptical and tried to do that simulation. I did not have much time, 
and I did not get the simulation right. I still posted it, in case my 
first attempt at a solution might be helpful to someone.

Right now I am about to start a new imputation project, so I thought it 
was time to take this subject on again. I rewrote the simulation and ran 
it. This time the results seem more reasonable. They support the claim by 
von Hippel and Graham: -passive()- really seems to introduce some bias, 
and first transforming and then imputing really reduces it. The true 
interaction effect was 1. With -passive()- the estimate had a bias of 
-.14 (MC standard error = .0007), while the bias shrank to -.007 (MC 
standard error = .0002) without -passive()-. 

To run this simulation one needs: 1) a couple of hours, 2) -ice- 
(-ssc install ice-), 3) -mim- (-ssc install mim-), 4) -simsum- (see the 
talk on it at the last UK Stata Users' meeting).
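
A minimal setup sketch for the list above. The -ice- and -mim- lines are the commands named in the message. The -simsum- line is an assumption: it presumes that -simsum- is distributed through SSC as well, which the message itself does not say.

```stata
* install the user-written commands used by the simulation
ssc install ice
ssc install mim
* assumption: -simsum- is also available from SSC
ssc install simsum
```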

*-------------------- begin simulation -----------------------
set more off
program drop _all
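program define sim, rclass
	* 250 observations on three correlated standard normal predictors
	drop _all
	matrix C = (1, .25, .25 \ .25, 1, .25 \ .25, .25, 1)
	drawnorm x1 x2 x3, n(250) corr(C)
	* interaction, computed from the complete x1 and x2
	gen x12= x1*x2
	* true model: all main effects and the interaction effect equal 1
	gen y = x1 + x2 + x3 + x12 + .25*rnormal()
	* missingness in x1 and x2 depends only on y and x3, which stay complete
	replace x1 = . if runiform() < invlogit(-2 - y + x3) 
	replace x2 = . if runiform() < invlogit(-2 - y + x3)
	* approach 1: impute x1 and x2, fill in x12 passively as x1*x2
	ice y x1 x2 x3 x12, m(5) clear passive(x12:x1*x2)
	mim, storebv : reg y x1 x2 x3 x12
	return scalar b = _b[x1]
	return scalar se = _se[x1]
	return scalar b12 = _b[x12]
	return scalar se12 = _se[x12]
	* go back to the original, unimputed data
	keep if _mj ==0
	drop _m*
	* approach 2: treat x12 as just another variable in the imputation model
	ice y x1 x2 x3 x12, m(5) clear 
	mim, storebv : reg y x1 x2 x3 x12
	return scalar hb = _b[x1]
	return scalar hse = _se[x1]
	return scalar hb12 = _b[x12]
	return scalar hse12 = _se[x12]
end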
timer clear 1
timer on 1
simulate b=r(b) se=r(se) b12=r(b12) se12=r(se12) ///
         hb=r(hb) hse=r(hse) hb12=r(hb12) hse12 = r(hse12), ///
		 reps(10000) : sim
timer off 1
timer list
simsum b hb, true(1) se(se hse) mcse
simsum b12 hb12 , true(1) se(se12 hse12) mcse
*------------------------- end simulation ------------------------
( For more on how to use examples I sent to statalist see: )
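
The bias and MC standard error quoted above come from -simsum-. As a rough cross-check, they can be computed by hand from the -simulate- results. The sketch below assumes that the bias is the mean of the estimates minus the true value, and that its MC standard error is the standard deviation of the estimates divided by the square root of the number of replications. The program name -biasmcse- and its -true()- option are hypothetical and not part of -simsum-.

```stata
* Hypothetical helper (not part of -simsum-): bias and Monte Carlo
* standard error of the bias for one column of -simulate- results.
capture program drop biasmcse
program define biasmcse, rclass
	syntax varname(numeric), true(real)
	quietly summarize `varlist'
	* copy the r() results before anything can overwrite them
	tempname m s n
	scalar `m' = r(mean)
	scalar `s' = r(sd)
	scalar `n' = r(N)
	* bias = mean estimate minus the true value
	return scalar bias = `m' - (`true')
	* MCSE of the bias = sd of the estimates / sqrt(number of replications)
	return scalar mcse = `s'/sqrt(`n')
	display as text "`varlist': bias = " as result %8.4f (`m' - (`true')) ///
		as text "  MCSE = " as result %8.4f (`s'/sqrt(`n'))
end
```

With the -simulate- results still in memory, -biasmcse b12, true(1)- and -biasmcse hb12, true(1)- would give the two interaction-effect figures to compare with the -simsum- output.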

Hope this helps,

Maarten L. Buis
Institut fuer Soziologie
Universitaet Tuebingen
Wilhelmstrasse 36
72074 Tuebingen

