Re: st: Situation where multiple imputation may be of no use?

From   Richard Williams
Subject   Re: st: Situation where multiple imputation may be of no use?
Date   Thu, 09 Feb 2012 17:24:58 -0500

At 05:06 PM 2/9/2012, Clyde B Schechter wrote:
This is a question of a statistical nature about what multiple imputation can accomplish. I have used MI a few times, and I have a general understanding of how it works and the underlying theory, but not in great depth.

I'm working with a colleague to plan an experiment. This description is oversimplified but, I believe, provides the essence of it. Subjects will be enrolled and baseline data obtained. They will then be randomly assigned to intervention or placebo groups. After enough time for the intervention to work has elapsed, the outcome, a continuous variable, will be assessed, once and only once.

Based on some preliminary studies, we expect that about 15-20% of the participants will not return for the outcome assessment. Given our fairly small anticipated effect size (due mostly to noise in the outcome assessment that we can't think of any way to reduce with available technology), the sample size we need to adequately power our study is, as it turns out, about 20% greater than we will be able to manage within budget. So, if there were no losses to follow-up, we'd be just OK. But there will be losses to follow-up, and efforts to reduce that will also eat into the budget. (As would getting two outcome assessments and using the average or doing a mixed model.) So my colleague has suggested that when we analyze our data we use multiple imputation to make up for the missing data. I'm by no means opposed to doing that, but I don't think it will help us with regard to statistical power.

I understand that MI lets you squeeze all the information that is really there in the existing data set, and can even correct some of the bias that can result using listwise deletion. But in our case, the only missing data will be the outcome measurement. We will have complete data on everything else. So it seems to me that MI in this context will just amount to carrying out a listwise-deletion analysis, multiply extrapolating the results of that to the cases with missing outcome, and then combining the analyses of the imputed data sets in a way that reflects the between-imputed-samples variation. If I am thinking about this correctly, the added variance from the multiple imputations should pretty much balance the reduction in standard error that comes from (appearing to) use the full sample size. If this were not true, then MI would be synthesizing information ex nihilo. So, my instincts tell me that we will not solve our statistical power problem by using MI analysis. I have run a few simulations, and they support my opinion, but I wanted to run this by some people who understand MI better than I do.

In general, I don't think you gain much by imputing values of the dependent variable. See

Excerpt: "One common question about imputation is whether the dependent variable should be included in the imputation model. The answer is yes, if the dependent variable is not included in the imputation model, the imputed values will not have the same relationship to the dependent variable that the observed values do. In other words, if the dependent variable is not included in the imputation model, you may be artificially reducing the strength of the relationship between the independent and dependent variables. After the imputations have been created, the issue of how to treat imputed values of the dependent variable becomes more nuanced. If the imputation model contains only those variables in the analysis model, then using the imputed values of the dependent variable does not provide additional information, and actually introduces additional error (von Hippel 2007). As a result some authors suggest including the dependent variable in the imputation model, which may include imputing values, and then excluding any cases with imputed values for the dependent variable from the final analysis (von Hippel 2007). If the imputation was performed using auxiliary variables or if the dataset was imputed without a specific analysis model in mind, then using the imputed values of the dependent variable may provide additional information. In these cases, it may be useful to include cases with imputed values of the dependent variable in the analysis model. Note that it is relatively easy to test the sensitivity of results to the inclusion of cases with imputed values of the dependent variable by running the analysis model with and without those cases."
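
To make the "with and without those cases" comparison concrete, here is a minimal simulation sketch in Python (not Stata, and not Clyde's actual simulations). Everything in it is an assumption made for illustration: the function names ols and simulate, the effect size, the 20% missing-completely-at-random dropout, and an imputation model that uses exactly the same covariates as the analysis model. It imputes the missing outcomes by drawing from the posterior of a normal linear regression fit to the complete cases, then pools with Rubin's rules (total variance = within + (1 + 1/m) * between).

```python
import numpy as np


def ols(X, y):
    """OLS fit: coefficients, their covariance, residual variance, residual df."""
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    df = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / df
    return beta, sigma2 * xtx_inv, sigma2, df


def simulate(n=200, effect=0.3, p_missing=0.2, m=50, seed=0):
    """Compare complete-case and MI standard errors for the treatment effect.

    All settings are hypothetical; only the outcome y has missing values.
    """
    rng = np.random.default_rng(seed)
    treat = rng.integers(0, 2, size=n)       # randomized arm (0/1)
    baseline = rng.normal(size=n)            # fully observed covariate
    y = effect * treat + 0.5 * baseline + rng.normal(size=n)
    X = np.column_stack([np.ones(n), treat, baseline])
    observed = rng.random(n) > p_missing     # MCAR loss to follow-up

    # Listwise-deletion (complete-case) analysis.
    b_cc, v_cc, s2, df = ols(X[observed], y[observed])
    se_cc = np.sqrt(v_cc[1, 1])

    # Proper imputation: draw (sigma2, beta) from their posterior under the
    # complete-case fit, then draw the missing outcomes from the model.
    chol = np.linalg.cholesky(v_cc / s2)     # Cholesky factor of (X'X)^-1
    X_mis = X[~observed]
    estimates, variances = [], []
    for _ in range(m):
        sigma2_star = s2 * df / rng.chisquare(df)
        beta_star = b_cc + np.sqrt(sigma2_star) * (chol @ rng.normal(size=3))
        y_imp = y.copy()
        y_imp[~observed] = X_mis @ beta_star + rng.normal(
            scale=np.sqrt(sigma2_star), size=X_mis.shape[0]
        )
        b, v, _, _ = ols(X, y_imp)
        estimates.append(b[1])
        variances.append(v[1, 1])

    # Rubin's rules: pooled estimate and total variance.
    b_mi = np.mean(estimates)
    within = np.mean(variances)
    between = np.var(estimates, ddof=1)
    se_mi = np.sqrt(within + (1 + 1 / m) * between)
    return se_cc, se_mi, b_cc[1], b_mi


if __name__ == "__main__":
    se_cc, se_mi, b_cc, b_mi = simulate()
    print(f"complete-case: b = {b_cc:.3f}, se = {se_cc:.3f}")
    print(f"MI (m = 50):   b = {b_mi:.3f}, se = {se_mi:.3f}")
```

Under these assumptions the two standard errors should come out nearly equal, which is the point made above: with no auxiliary variables, the between-imputation variance roughly offsets the apparent gain from the larger sample. Adding an auxiliary predictor of the outcome to the imputation model only would be the natural way to see whether the conclusion changes.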

A pre-publication version of the von Hippel paper is at

Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  Richard.A.Williams.5@ND.Edu
