st: RE: Question about multiple imputation in Stata
Sometime back in 2004, Michael Ingre reported to the list thus:
I have been asked to inform interested Statalist members that Patrick
Royston has submitted a set of .ado for Multiple Imputation (MI) and an
accompanying article to the Stata Journal.
The procedures implement a "switching regression" technique previously
described by van Buuren et al (1999).
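For readers unfamiliar with the idea, the following is a rough Python sketch of a regression-switching (chained equations) imputation loop. It is not Royston's .ado code and not the exact algorithm of van Buuren et al.; the function name `chained_imputation`, its parameters, and the use of plain linear regressions are assumptions made for illustration. It is also simplified: it adds residual noise to each prediction but does not draw the regression coefficients from their posterior, as a proper multiple imputation would.

```python
import numpy as np

def chained_imputation(X, n_cycles=10, seed=0):
    """Return one completed copy of X (hypothetical helper, illustration only).

    X is a 2-D array with np.nan marking missing values. Each incomplete
    column is regressed in turn on all other columns, and its missing
    entries are replaced by the prediction plus normal residual noise.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    filled = X.copy()

    # Starting values: fill every hole with the observed column mean.
    col_means = np.nanmean(X, axis=0)
    filled[missing] = np.take(col_means, np.nonzero(missing)[1])

    for _ in range(n_cycles):
        for j in range(X.shape[1]):
            miss_j = missing[:, j]
            if not miss_j.any():
                continue  # nothing to impute in this column
            obs = ~miss_j
            # Design matrix: intercept plus the current values of all other columns.
            others = np.delete(filled, j, axis=1)
            design = np.column_stack([np.ones(len(filled)), others])
            # Fit on rows where column j was actually observed.
            beta, *_ = np.linalg.lstsq(design[obs], filled[obs, j], rcond=None)
            resid = filled[obs, j] - design[obs] @ beta
            dof = max(int(obs.sum()) - design.shape[1], 1)
            sigma = np.sqrt(resid @ resid / dof)
            # Replace the missing entries with prediction plus noise.
            noise = rng.normal(0.0, sigma, int(miss_j.sum()))
            filled[miss_j, j] = design[miss_j] @ beta + noise
    return filled
```

Calling the function m times with different seeds would give m completed data sets, each of which is then analysed separately.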
Patrick Royston has been kind enough to let me try the software before it is
publicly announced and to pass it on to interested members of the list. Just
let me know and I will e-mail you a copy.
I would also like to express my gratitude to Nick Cox for introducing me to
Patrick and his work.
van Buuren S., H. C. Boshuizen and D. L. Knook. 1999. Multiple imputation of
missing blood pressure covariates in survival analysis. Statistics in
A few current references of interest:
(1) Briggs A, Clark T, Wolstenholme J, et al.
Missing....presumed at random: Cost-analysis of incomplete data. Health
Economics 12:377-392, 2003
(2) Clark TG, Altman DG. Developing a prognostic model in the
presence of missing data: an ovarian cancer case study. J Clin Epidemiol
(3) Engels JM, Diehr P. Imputation of missing longitudinal data:
a comparison of methods. J Clin Epidemiol 56:968-976, 2003
(4) Horton NJ, Lipsitz SR. Multiple imputation in practice:
Comparison of software packages for regression models with missing values.
The American Statistician 55:244-254, 2001
In both the van Buuren and Clark papers, there was "little" difference in
estimates with MI, a point that drew comment in the Discussion section of
With respect to particular software packages, there did not seem to be much to
recommend; if you are happy with a (multivariate) normality assumption, then
NORM is available as freeware (Windows) from Joe Schafer's site
(http://www.stat.psu.edu/~jls/misoftwa.html#win). The S-Plus MI library
(version 6.x) incorporates Schafer's routines; Clark and Altman used the MICE
As indicated on Joe Schafer's site, PAN is currently being updated to run as
a stand-alone Windows program and, as I understand it, in a form updated from
the original (and now obsolete) S-Plus implementations.
Of some importance is the question of levels of analysis; this surfaced
recently on the IMPUTE list in January of this year (archives at
I would be more than interested in other opinions on this matter, especially
on the question of imputation in a survival analysis, although this is not a
specifically Stata problem.
From: Shige Song [mailto:email@example.com]
Sent: Sunday, April 25, 2004 2:35 PM
Subject: st: Question about multiple imputation in Stata
I am trying to estimate a discrete-time hazard model in which some of the
covariates have heavy missingness (around 20% of values are missing for some
covariates). Multiple imputation seems to be the best way to go in this
situation. I am glad to find the set of ado files contributed by Carlin et
al. (2003) to facilitate analyzing multiply imputed data sets in Stata. Now
the remaining question is: what is the best way to obtain these multiply
imputed data sets?
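As background on the analysis step, once m completed data sets exist, the same model is fitted to each and the results are usually combined with Rubin's rules. The Python sketch below illustrates that arithmetic only; it is not the Carlin et al. code, and the function name `pool_rubin` and its arguments are made up for this example.

```python
from statistics import mean, variance

def pool_rubin(estimates, variances):
    """Combine one coefficient across m imputed data sets (illustrative helper).

    estimates: the m point estimates of the coefficient
    variances: the m squared standard errors of those estimates
    Returns the pooled estimate and its total variance.
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")
    pooled = mean(estimates)        # average of the m point estimates
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance, divisor m - 1
    total = within + (1 + 1 / m) * between
    return pooled, total
```

The square root of the returned total variance is the pooled standard error; the between-imputation term is what carries the extra uncertainty due to the missing values.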
My understanding is that the built-in command "impute" and user-contributed
"hotdeck" do not do multiple imputation. I am not quite sure about
"whotdeck". Other options involve learning to use other statistical
packages (NORM, CAT, MIX in Splus and R; MI in SAS). I am new to this field
and was wondering whether other more experienced researchers are willing to
share their knowledge on how to deal with similar problems.
Thank you very much!
Department of Sociology, UCLA
* For searches and help try: