Re: st: Situation where multiple imputation may be of no use?

From	Richard Williams <[email protected]>
To	[email protected], "[email protected]" <[email protected]>
Subject	Re: st: Situation where multiple imputation may be of no use?
Date	Thu, 09 Feb 2012 17:24:58 -0500

At 05:06 PM 2/9/2012, Clyde B Schechter wrote:

This is a question of a statistical nature about what multiple imputation can accomplish. I have used MI a few times, and I have a general understanding of how it works and the underlying theory, but not in great depth.
I'm working with a colleague to plan an experiment. This description is oversimplified but, I believe, provides the essence of it. Subjects will be enrolled and baseline data obtained. They will then be randomly assigned to intervention or placebo groups. After enough time for the intervention to work has elapsed, the outcome, a continuous variable, will be assessed, once and only once.
Based on some preliminary studies, we expect that about 15-20% of the participants will not return for the outcome assessment. Given our fairly small anticipated effect size (due mostly to noise in the outcome assessment that we can't think of any way to reduce with available technology), the sample size we need to adequately power our study is, as it turns out, about 20% greater than we will be able to manage within budget. So, if there were no losses to follow-up, we'd be just OK. But there will be losses to follow-up, and efforts to reduce that will also eat into the budget. (As would getting two outcome assessments and using the average or doing a mixed model.) So my colleague has suggested that when we analyze our data we use multiple imputation to make up for the missing data. I'm by no means opposed to doing that, but I don't think it will help us with regard to statistical power.
I understand that MI lets you squeeze all the information that is really there in the existing data set, and can even correct some of the bias that can result from using listwise deletion. But in our case, the only missing data will be the outcome measurement. We will have complete data on everything else. So it seems to me that MI in this context will just amount to carrying out a listwise-deletion analysis, multiply extrapolating the results of that to the cases with missing outcome, and then combining the analyses of the imputed data sets in a way that reflects the between-imputed-samples variation. If I am thinking about this correctly, the added variance from the multiple imputations should pretty much balance the reduction in standard error that comes from (appearing to) use the full sample size. If this were not true, then MI would be synthesizing information ex nihilo. So, my instincts tell me that we will not solve our statistical power problem by using MI analysis. I have run a few simulations, and they support my opinion, but I wanted to run this by some people who understand MI better than I do.
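
For readers who want to try the argument above, here is a minimal simulation sketch in Python (standard library only). It is not the simulation referred to in the post. Everything in it is an assumption made for illustration: 100 subjects per arm, a true effect of 0.3 SD, 20% of outcomes missing completely at random, a normal imputation model fitted separately within each arm, and 20 imputations pooled with Rubin's rules. All function names are hypothetical.

```python
import random
import statistics
from math import sqrt

def simulate_trial(n_per_arm, effect, p_missing, rng):
    """Return (treat, y) pairs; y is None when the subject is lost to follow-up."""
    data = []
    for treat in (0, 1):
        for _ in range(n_per_arm):
            y = rng.gauss(effect * treat, 1.0)
            if rng.random() < p_missing:  # assumed: missing completely at random
                y = None
            data.append((treat, y))
    return data

def diff_in_means(y0, y1):
    """Difference in arm means and its estimated sampling variance."""
    est = statistics.fmean(y1) - statistics.fmean(y0)
    var = statistics.variance(y0) / len(y0) + statistics.variance(y1) / len(y1)
    return est, var

def impute_arm(observed, n_missing, rng):
    """Draw the arm's variance and mean from their posterior, then draw outcomes."""
    n = len(observed)
    s2 = statistics.variance(observed)
    # Scaled inverse chi-square draw; gammavariate(k/2, 2) is a chi-square with k df.
    sigma2 = (n - 1) * s2 / rng.gammavariate((n - 1) / 2, 2.0)
    mu = rng.gauss(statistics.fmean(observed), sqrt(sigma2 / n))
    return [rng.gauss(mu, sqrt(sigma2)) for _ in range(n_missing)]

def complete_case_analysis(data):
    """Listwise deletion: analyze only subjects with an observed outcome."""
    y0 = [y for t, y in data if t == 0 and y is not None]
    y1 = [y for t, y in data if t == 1 and y is not None]
    est, var = diff_in_means(y0, y1)
    return est, sqrt(var)

def mi_analysis(data, m, rng):
    """Impute m times, analyze each completed data set, pool with Rubin's rules."""
    obs = {t: [y for tr, y in data if tr == t and y is not None] for t in (0, 1)}
    n_mis = {t: sum(1 for tr, y in data if tr == t and y is None) for t in (0, 1)}
    ests, variances = [], []
    for _ in range(m):
        completed = {t: obs[t] + impute_arm(obs[t], n_mis[t], rng) for t in (0, 1)}
        est, var = diff_in_means(completed[0], completed[1])
        ests.append(est)
        variances.append(var)
    within = statistics.fmean(variances)   # average within-imputation variance
    between = statistics.variance(ests)    # between-imputation variance
    total = within + (1 + 1 / m) * between
    return statistics.fmean(ests), sqrt(total)

rng = random.Random(2012)
cc_ses, mi_ses = [], []
for _ in range(200):  # 200 simulated trials
    data = simulate_trial(n_per_arm=100, effect=0.3, p_missing=0.2, rng=rng)
    cc_ses.append(complete_case_analysis(data)[1])
    mi_ses.append(mi_analysis(data, m=20, rng=rng)[1])

mean_cc_se = statistics.fmean(cc_ses)
mean_mi_se = statistics.fmean(mi_ses)
print(f"mean complete-case SE: {mean_cc_se:.4f}")
print(f"mean MI SE:            {mean_mi_se:.4f}")
```

Comparing the two printed averages shows whether, under these assumptions, the between-imputation variance offsets the apparent gain from analyzing the full sample size. Changing the missingness mechanism or adding baseline covariates to the imputation model would be the natural next experiments.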

In general, I don't think you gain much by imputing values of the dependent variable. See


http://www.ats.ucla.edu/stat/stata/seminars/missing_data/mi_in_stata_pt1.htm

Excerpt: "One common question about imputation is whether the dependent variable should be included in the imputation model. The answer is yes, if the dependent variable is not included in the imputation model, the imputed values will not have the same relationship to the dependent variable that the observed values do. In other words, if the dependent variable is not included in the imputation model, you may be artificially reducing the strength of the relationship between the independent and dependent variables. After the imputations have been created, the issue of how to treat imputed values of the dependent variable becomes more nuanced. If the imputation model contains only those variables in the analysis model, then using the imputed values of the dependent variable does not provide additional information, and actually introduces additional error (von Hippel 2007). As a result some authors suggest including the dependent variable in the imputation model, which may include imputing values, and then excluding any cases with imputed values for the dependent variable from the final analysis (von Hippel 2007). If the imputation was performed using auxiliary variables or if the dataset was imputed without a specific analysis model in mind, then using the imputed values of the dependent variable may provide additional information. In these cases, it may be useful to include cases with imputed values of the dependent variable in the analysis model. Note that it is relatively easy to test the sensitivity of results to the inclusion of cases with imputed values of the dependent variable by running the analysis model with and without those cases."
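
The sensitivity check described at the end of the excerpt can be sketched roughly as follows, in Python with the standard library only. This is an illustration under assumptions, not code from the UCLA page or from von Hippel: the toy data (40 observed and 10 missing outcomes per arm), the within-arm normal imputation step, and all function names are hypothetical. In the situation from the original question, where only the outcome is missing, dropping the cases with imputed outcomes leaves the same observed cases in every imputed data set, so the between-imputation variance is zero and the pooled result is just the complete-case analysis.

```python
import random
import statistics
from math import sqrt

def analyze(rows):
    """Difference in arm means and its variance; rows are (treat, y, imputed) tuples."""
    y0 = [y for treat, y, _ in rows if treat == 0]
    y1 = [y for treat, y, _ in rows if treat == 1]
    est = statistics.fmean(y1) - statistics.fmean(y0)
    var = statistics.variance(y0) / len(y0) + statistics.variance(y1) / len(y1)
    return est, var

def pool_rubin(results):
    """Combine per-imputation (estimate, variance) pairs with Rubin's rules."""
    ests = [est for est, _ in results]
    variances = [var for _, var in results]
    m = len(results)
    within = statistics.fmean(variances)
    between = statistics.variance(ests)
    total = within + (1 + 1 / m) * between
    return {"estimate": statistics.fmean(ests), "se": sqrt(total), "between": between}

def impute_once(observed, n_missing, rng):
    """Hypothetical imputation step: normal model fitted within each arm."""
    rows = list(observed)
    for treat in (0, 1):
        obs_y = [y for t, y, _ in observed if t == treat]
        n = len(obs_y)
        s2 = statistics.variance(obs_y)
        sigma2 = (n - 1) * s2 / rng.gammavariate((n - 1) / 2, 2.0)
        mu = rng.gauss(statistics.fmean(obs_y), sqrt(sigma2 / n))
        # The third field flags the outcome as imputed.
        rows += [(treat, rng.gauss(mu, sqrt(sigma2)), True) for _ in range(n_missing[treat])]
    return rows

# Toy data (assumed): 40 observed outcomes per arm, 10 missing per arm.
rng = random.Random(7)
observed = [(t, rng.gauss(0.3 * t, 1.0), False) for t in (0, 1) for _ in range(40)]
n_missing = {0: 10, 1: 10}

imputed_sets = [impute_once(observed, n_missing, rng) for _ in range(20)]

# Analysis 1: keep the cases with imputed outcomes.
with_imputed = pool_rubin([analyze(rows) for rows in imputed_sets])

# Analysis 2: drop the cases with imputed outcomes before analyzing.
without_imputed = pool_rubin(
    [analyze([r for r in rows if not r[2]]) for rows in imputed_sets]
)

print("with imputed Y:   ", with_imputed)
print("without imputed Y:", without_imputed)
```

When predictors are also missing, or auxiliary variables enter the imputation model, the two analyses can differ more. That is the case the excerpt says may justify keeping the imputed outcomes.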


A pre-publication version of the von Hippel paper is at

http://www.sociology.ohio-state.edu/people/ptv/publications/Missing%20Y/accepted.pdf


-------------------------------------------
Richard Williams, Notre Dame Dept of Sociology
OFFICE: (574)631-6668, (574)631-6463
HOME:   (574)289-5227
EMAIL:  [email protected]
WWW:    http://www.nd.edu/~rwilliam

