st: Re: When to impute  - and an alternative
--- David Airey <[email protected]> wrote:
> I have trouble understanding the translation of these three missing
> situations into when it is useful to impute.
The three situations are MCAR (missing completely at random),
MAR (missing at random) and NMAR (not missing at random).
Analysis of complete data only can be biassed for MAR & NMAR.
Imputation is unnecessary with MCAR.
Here's a very practical approach:
1) Build a logistic regression model to predict who is likely to go missing,
using the predictors you would use for multiple imputation.
Is it reasonably powerful?
If not, there is no point in imputing. Your data is probably MCAR.
It is not MAR.  Imputing will not help.
2) Calculate a prediction score from the logistic regression.
Now, compare this score with the (non-missing) outcomes.
If there is no relationship, there is no correctable bias.
If your data passes tests 1) and 2), imputation is probably called for,
as MAR is a possibility.  However, NMAR remains an issue.
3) Consider if there could be an unobserved process
causing people with extreme values of the outcome to go missing.
If you (and your non-statistical collaborators) judge this to be
implausible, your data is probably not NMAR.
Either way you should mention the possibility of NMAR
and the size & direction of any likely bias caused in the
discussion section of the paper.
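
Tests 1) and 2) could be sketched roughly as follows, in Python rather than
Stata and on simulated data. Using the AUC to judge how "powerful" the
missingness model is, and a Pearson correlation for the comparison in test 2),
are assumptions of this sketch, as are all names and simulated values. The
hand-rolled gradient-ascent fitter is only there to keep the example
self-contained; a proper logistic regression routine would be used in practice.

```python
import math
import random

def fit_logistic(X, y, lr=0.1, n_iter=2000):
    """Crude logistic regression by gradient ascent: [intercept, slope, ...]."""
    n = len(X)
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for xi, yi in zip(X, y):
            resid = yi - 1.0 / (1.0 + math.exp(-linear_score(beta, xi)))
            grad[0] += resid
            for j, x in enumerate(xi, start=1):
                grad[j] += resid * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

def linear_score(beta, xi):
    """Linear predictor (log-odds of going missing) for one subject."""
    return beta[0] + sum(b * x for b, x in zip(beta[1:], xi))

def auc(scores, labels):
    """Chance that a random dropout outscores a random completer (ties = 1/2)."""
    pos = [s for s, m in zip(scores, labels) if m == 1]
    neg = [s for s, m in zip(scores, labels) if m == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def pearson(u, v):
    """Pearson correlation coefficient."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)

# Simulated MAR data (all values hypothetical): dropout depends on x1 only.
random.seed(2007)
X, outcome, missing = [], [], []
for _ in range(400):
    x1, x2 = random.gauss(0, 1), random.gauss(0, 1)
    X.append([x1, x2])
    outcome.append(2.0 * x1 + random.gauss(0, 1))
    p_miss = 1.0 / (1.0 + math.exp(1.0 - 1.5 * x1))
    missing.append(1 if random.random() < p_miss else 0)

# Test 1: can the predictors tell dropouts from completers?
beta = fit_logistic(X, missing)
scores = [linear_score(beta, xi) for xi in X]
auc_value = auc(scores, missing)

# Test 2: among observed outcomes, is the score related to the outcome?
obs_scores = [s for s, m in zip(scores, missing) if m == 0]
obs_outcome = [o for o, m in zip(outcome, missing) if m == 0]
r_observed = pearson(obs_scores, obs_outcome)

print(f"AUC for predicting missingness: {auc_value:.2f}")
print(f"Score vs observed outcome, r = {r_observed:.2f}")
```

An AUC near 0.5 would correspond to failing test 1), and a correlation near
zero to failing test 2). What counts as "near" is a judgement call that the
sketch does not settle.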
A very interesting new paper on this subject is
Diggle, Farewell & Henderson
Analysis of longitudinal data with dropout: objectives, assumptions
and a proposal.
Appl. Statist. (2007) 56 (5) 499-550 (with discussion).
As the title implies, it contains a new method,
based on martingale assumptions and difference
scores. Their method is unbiassed under
MAR and under certain versions of NMAR
(when the martingale assumptions are valid) &
is therefore superior to multiple imputation.
They claim the method is very easy to implement using
standard software, and they give 4 lines of S-PLUS:
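        # Separate (unpooled) linear fits of PANSS on tesat
        # (presumably the treatment variable) at each time point
        fit <- lmList(PANSS ~ tesat | time, data = schizophrenia, pool = FALSE)
        # Cumulative sums of the coefficients over time
        apply(coef(fit), 2, cumsum)
        # Cumulative sums of the squared standard errors over time
        SEs <- summary(fit)$coef[, "Std. Error", ]
        apply(SEs^2, 2, cumsum)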
If anyone is familiar with R or S-PLUS, and in particular with the lmList
command from the -nlme- package (Pinheiro & Bates 2000,
"Mixed-Effects Models in S and S-PLUS", NY, Springer),
and could translate these 4 lines into Stata, they would be doing
a great favour to the Stata community.
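
For anyone who does not read S-PLUS, here is a rough Python sketch of what the
four lines appear to compute as written: a separate least-squares fit at each
time point, then running totals of the coefficients and of the squared
standard errors. It assumes a single predictor and simulated data. The names
fit_ols and records and every number in it are hypothetical. It is neither a
Stata translation nor a check of the method itself.

```python
import math
import random
from itertools import accumulate

def fit_ols(x, y):
    """Simple linear regression y = a + b*x; returns (a, b, se_a, se_b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s2 = rss / (n - 2)  # residual variance from this fit only (no pooling)
    se_b = math.sqrt(s2 / sxx)
    se_a = math.sqrt(s2 * (1.0 / n + mx ** 2 / sxx))
    return a, b, se_a, se_b

# Simulated long-format records (time, treat, y); values are made up.
random.seed(1)
records = []
for time in range(1, 5):
    for i in range(60):
        treat = i % 2
        y = -1.0 - 0.5 * treat + random.gauss(0, 2)
        records.append((time, treat, y))

# One fit per time point, using whoever has data at that time.
fits = []
for t in sorted({r[0] for r in records}):
    x = [r[1] for r in records if r[0] == t]
    y = [r[2] for r in records if r[0] == t]
    fits.append(fit_ols(x, y))

# Running totals over time, mirroring apply(..., 2, cumsum).
cum_intercept = list(accumulate(f[0] for f in fits))
cum_slope = list(accumulate(f[1] for f in fits))
cum_var_intercept = list(accumulate(f[2] ** 2 for f in fits))
cum_var_slope = list(accumulate(f[3] ** 2 for f in fits))

for t, (b, v) in enumerate(zip(cum_slope, cum_var_slope), start=1):
    print(f"time {t}: cumulative slope {b:.3f}, cumulative SE^2 {v:.3f}")
```

The cumulative sums of squared standard errors are presumably meant as
variances of the cumulative coefficients; that reading is mine, not something
the four lines state.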
==========================
Paul T Seed MSc CStat
Senior Lecturer in Medical Statistics
King's College London
Division of Reproduction and Endocrinology
St Thomas' Hospital,
Lambeth Palace Road,
London SE1 7EH
tel  (+44) (0) 20 7188 3642
fax (+44) (0) 20 7620 1227