



Re: Re: Re: st: Situation where multiple imputation may be of no use?


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: Re: Re: st: Situation where multiple imputation may be of no use?
Date   Wed, 15 Feb 2012 09:40:50 -0500

On Tue, Feb 14, 2012 at 3:19 PM, Clyde B Schechter
<clyde.schechter@einstein.yu.edu> wrote:
> Let me re-summarize the initial problem.  There are no real data from this study yet: we are drawing up a study proposal for a two-arm randomized trial and stumbling over whether we can recruit an adequate sample with the available funding.  Our power calculations tell us that if there were going to be no missing data, the budget would just cover us.  But similar studies in the past have led to about 15-20% of the participants being lost to follow-up.  There aren't enough funds to recruit a larger sample in anticipation of this.  So, a colleague suggested that using multiple imputation or FIML in our final analysis would solve our problem, that our sample with missing data would be sufficient.  My major question is whether this is true: whether the use of MI and FIML would permit us to plan for a sample size that would be adequate with no missing data (but too small with missing data.)
>
> A feature of our study design is that all data is gathered at the time of recruitment except for the outcome, which must be delayed so the intervention has time to work.  We will have missing data only on the outcome--all other variables will be complete.  The other variables, by the way, are mainly for descriptive interest: we do not expect any of them to be of value as ancillary predictors of the outcome.  In fact, really, the only predictor variable in our study is the randomization assignment.
>
> My instincts tell me that in this situation, the use of MI or FIML will not really help because the cases that are missing outcome will not provide any additional information about the coefficient of the study arm indicator.  If they did, the information would seem to come from nowhere at all!  In addition, I have done some simulations looking at statistical power for this design using both MI and FIML, and they appear to be no better in this respect than complete case analysis when I simulate data being MCAR.  In addition, when I simulate the data as being MNAR using a missingness model that I think is plausible for our situation, I also find that the use of MI and FIML do not provide any bias correction compared to complete case analysis.
>
> I'm fairly satisfied at this point that MI and FIML won't help us in this specific situation.  I do appreciate the comments that Cameron McIntosh and Richard Williams have made--they have clarified my thinking about the matter.

It is unusual that MAR and MCAR led to the same results (although if
you generated the outcome as independent of the covariates except for
treatment, that is indeed how it should be). The worst cases of NMAR I
can think of are those where the non-response is concentrated in one
tail of the distribution in one of the arms. For instance, suppose the
treatment has no effect, so both the control and treatment groups
follow N(0,1). However, those treated who became worse off after the
treatment shy away from reporting it. As a result, your observed
distribution of the outcome after treatment is a truncated normal with
mean greater than zero. This could lead you to a false belief about
the efficacy of the treatment. You can devise other situations: if the
treatment provides a modest effect size of 0.5 and the non-respondents
are the top performers under treatment, the observed distribution of
outcomes in the treatment arm would have a mean below 0.5, maybe close
to 0. These extremes will give you the greatest biases, and this is
what you might want to simulate to get plausible bounds on your
estimates. If you do your imputations naively, you would just
reinforce the appearance of these biased sample means.
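
A minimal sketch of the two extreme scenarios above, written in Python
with only the standard library rather than in Stata. The 20%
non-response rate (the upper end of the 15-20% loss to follow-up
mentioned in the quoted message), the hard cutoff at a quantile, the
sample size, the seed, and the helper name observed_mean are all
assumptions made for illustration, not part of the original design.

```python
import random
from statistics import NormalDist, fmean


def observed_mean(true_mean, drop_frac, drop_tail, n=200_000, seed=12345):
    """Mean of the outcomes that are actually observed in one arm.

    Outcomes are drawn from N(true_mean, 1). A fraction `drop_frac` of the
    arm does not respond, and (by assumption) the non-respondents are
    exactly those in the "lower" or "upper" tail of the distribution.
    """
    rng = random.Random(seed)
    dist = NormalDist(true_mean, 1.0)
    draws = [rng.gauss(true_mean, 1.0) for _ in range(n)]
    if drop_tail == "lower":
        cut = dist.inv_cdf(drop_frac)          # worst outcomes go missing
        kept = [y for y in draws if y >= cut]
    elif drop_tail == "upper":
        cut = dist.inv_cdf(1.0 - drop_frac)    # best outcomes go missing
        kept = [y for y in draws if y <= cut]
    else:
        raise ValueError("drop_tail must be 'lower' or 'upper'")
    return fmean(kept)


# Scenario 1: no true effect, treated subjects who did worst do not report.
null_case = observed_mean(true_mean=0.0, drop_frac=0.20, drop_tail="lower")

# Scenario 2: true effect of 0.5, top performers under treatment do not report.
effect_case = observed_mean(true_mean=0.5, drop_frac=0.20, drop_tail="upper")

print(f"true mean 0.0, observed treatment-arm mean: {null_case:.3f}")
print(f"true mean 0.5, observed treatment-arm mean: {effect_case:.3f}")
```

Under these assumptions the first scenario shows a spurious positive
mean in the treatment arm and the second understates the true effect.
Varying drop_frac and the tail gives a rough range for how far a
complete-case or naively imputed estimate could be off.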

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
