What is the relation between the official multiple-imputation command,
mi, and the user-written ice and mim commands?
Title: Relation between official mi and user-written ice and mim commands
Authors: Yulia Marchenko, StataCorp; Patrick Royston, MRC Clinical Trials Unit and University College London
Date: October 2009; updated July 2011
Multiple-imputation analysis consists of three phases: 1) imputation, which creates multiply imputed data; 2) completed-data analysis, in which the model of interest is fit separately to each imputed dataset; and 3) pooling of the individual analyses from phase 2 using Rubin's combination rules (Rubin 1987, 76).
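For a scalar parameter Q estimated in each of M imputed datasets, Rubin's combination rules (phase 3) can be written as follows. The notation is ours, not the FAQ's: Q̂_m and Û_m are the point estimate and its squared standard error from imputation m, B is the between-imputation variance, r is the relative increase in variance (RVI), ν is the large-sample degrees of freedom, and γ is the fraction of missing information (FMI). These are the quantities that mi estimate summarizes as "Average RVI", "Largest FMI", and "DF adjustment: Large sample" in its output header.

```latex
\begin{aligned}
\bar{Q}  &= \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m
  && \text{pooled point estimate}\\
\bar{U}  &= \frac{1}{M}\sum_{m=1}^{M}\hat{U}_m
  && \text{within-imputation variance}\\
B        &= \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m-\bar{Q}\bigr)^{2}
  && \text{between-imputation variance}\\
T        &= \bar{U} + \Bigl(1+\frac{1}{M}\Bigr)B
  && \text{total variance}\\
r        &= \frac{(1+1/M)\,B}{\bar{U}}
  && \text{relative increase in variance (RVI)}\\
\nu      &= (M-1)\Bigl(1+\frac{1}{r}\Bigr)^{2}
  && \text{large-sample degrees of freedom}\\
\gamma   &= \frac{r + 2/(\nu+3)}{1+r}
  && \text{fraction of missing information (FMI)}
\end{aligned}
```

Inference then uses the t distribution with ν degrees of freedom: Q̄ ± t_ν √T.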
The user-written commands uvis, ice (Royston 2005, 2007, 2009), and mim (Carlin, Galati, and Royston 2008; Royston, Carlin, and White 2009) are widely used to perform multiple-imputation analysis in Stata 9 and higher. uvis and ice perform phase 1: uvis performs univariate imputation, and ice performs multivariate imputation via chained equations (van Buuren, Boshuizen, and Knook 1999). mim performs phases 2 and 3, analyzing the multiply imputed data and combining the results; it also provides some capabilities for manipulating multiply imputed data.
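The division of labor between the user-written commands can be sketched as a short do-file. It assumes ice and mim are installed and reuses the variables and seed from the mheart0 example later in this FAQ; the replace suboption of saving() is our addition so that the do-file can be rerun without an error about an existing file.

```stata
* Phase 1: ice imputes bmi five times and saves the results in ice format
* (_mj = imputation number, with _mj == 0 holding the original data)
webuse mheart0, clear
ice bmi attack smokes age female hsgrad, saving(icedata, replace) m(5) seed(123)

* Phases 2 and 3: mim fits the logit model to each imputation
* and pools the estimates using Rubin's rules
use icedata, clear
mim: logit attack smokes bmi age female hsgrad
```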
On 27 July 2009, Stata 11 was released with a major new feature: the mi system for multiple imputation and estimation of models with multiply imputed data. The system comprises a new architecture for imputed datasets; commands for manipulating, checking, and validating such datasets; a command, mi impute, for imputation (phase 1); and a command, mi estimate, for fitting a model to each imputed dataset and combining the results using Rubin's rules (phases 2 and 3). See the Multiple-Imputation Reference Manual (StataCorp 2011) for details. mi impute and mi estimate were expanded in Stata 12.
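With the official commands, all three phases take place inside a single mi dataset. A minimal sketch, assuming the mheart0 data used later in this FAQ; the choice of the mlong style, of mi impute regress for the single incomplete variable, and of the seed are illustrative, not prescribed.

```stata
webuse mheart0, clear

* Declare the data as mi data and register bmi as the variable to impute
mi set mlong
mi register imputed bmi

* Phase 1: univariate linear-regression imputation of bmi, five imputations
mi impute regress bmi attack smokes age female hsgrad, add(5) rseed(123)

* Phases 2 and 3: fit the logit model to each imputation and apply Rubin's rules
mi estimate: logit attack smokes bmi age female hsgrad
```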
mi impute performs both univariate and multivariate imputation. It offers nine univariate methods and three multivariate ones; see the mi impute help file for a list. Two of the univariate methods are not available in uvis: Poisson and truncated normal imputation.
Multivariate imputation can be performed using mi impute monotone
when the missingness pattern is monotone and using mi impute mvn or
mi impute chained when the pattern is not monotone. mi impute
monotone implements a noniterative imputation method based on a sequence
of independent univariate conditional imputations (Rubin 1987,
170–186). It is similar to the implementation of the monotone
option of the ice command. mi impute mvn performs
multivariate imputation assuming that the data have a multivariate normal
distribution. It implements the NORM method of Schafer (1997), an iterative Markov chain Monte Carlo method (data augmentation) based on multivariate normality. The mi impute chained command, added in Stata 12, implements an alternative iterative multivariate-imputation method based on a sequence of univariate full conditional specifications, also known as imputation via chained equations.
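The three multivariate methods share one specification style, with the variables to impute to the left of the equals sign and the predictors to the right. The sketch below imputes bmi from the mheart0 example with each method in turn; because bmi is the only incomplete variable, the missingness pattern is trivially monotone, so all three methods apply here. The replace option overwrites the existing imputations so that each method fills the same five imputations. The settings are illustrative only.

```stata
webuse mheart0, clear
mi set mlong
mi register imputed bmi

* Noniterative sequence of univariate imputations (requires a monotone pattern)
mi impute monotone (regress) bmi = attack smokes age female hsgrad, add(5) rseed(123)

* Data augmentation under multivariate normality (Schafer's NORM method)
mi impute mvn bmi = attack smokes age female hsgrad, replace rseed(123)

* Chained equations (full conditional specification), the method used by ice
mi impute chained (regress) bmi = attack smokes age female hsgrad, replace rseed(123)
```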
mi impute chained and ice use the same imputation method, but their features differ. mi impute chained supports factor variables, whereas ice includes stepwise model selection and runs in Stata 9 and all later releases. If you have Stata 11 or 12, you can also use mi ice, a wrapper for ice that understands the official mi data format. (mi ice is available from Patrick Royston's web page under the heading mi_ice; in Stata, type net from http://www.homepages.ucl.ac.uk/~ucakjpr/stata.)
The official mi commands in Stata 12 cover all data-management and
most estimation capabilities of mim; one exception is mim’s
category(combine) option for combining arbitrary scalars. (See
stata.com/support/faqs/statistics/combine-results-with-multiply-imputed-data for information
on combining arbitrary scalars using mi estimate.) If you wish to use
mim and have Stata 11 or 12, you can use mim2, which
understands the official mi data format. mim2 is available
from the same website as mi ice.
The mi import ice and mi export ice commands make it easy to
transport data between the existing ice/mim data format and
the official mi data format.
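Going from mi to ice format takes a single command. A minimal sketch, assuming mim is installed: data imputed with mi impute are converted to ice format, after which mim can analyze them. The imputation step and seed are illustrative.

```stata
webuse mheart0, clear
mi set mlong
mi register imputed bmi
mi impute regress bmi attack smokes age female hsgrad, add(5) rseed(123)

* Convert the mi data in memory to ice format (variables _mj and _mi),
* with a full copy of the data for each of m = 0, 1, ..., 5
mi export ice, clear

* The data are now in the format mim expects
mim: logit attack smokes bmi age female hsgrad
```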
Below we provide examples demonstrating how to switch between the mi and ice data formats. Because ice, mi ice, and mim are not part of official Stata, you must install them separately. You can use the findit command to locate the desired package and then follow the corresponding links for installation instructions.
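For instance, the commands below locate the packages; the installation itself happens when you click the links that findit and net from display in the Viewer. The final which command is simply one way to confirm afterward that ice is installed.

```stata
* Search for the user-written packages; follow the links to install them
findit ice
findit mim

* mi ice (heading mi_ice) and mim2 are listed on Patrick Royston's web page
net from http://www.homepages.ucl.ac.uk/~ucakjpr/stata

* Once installed, which reports where the ice ado-file was found
which ice
```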
Using mi import ice to import multiply imputed data
created by ice into mi
In our examples, we use fictional data, mheart0.dta, recording heart
attacks. The primary objective is to examine the relationship between heart
attacks and smoking adjusted for other factors such as age, body mass index,
gender, and educational status. The variable recording body mass index,
bmi, contains missing values. Thus we use multiple imputation to
analyze the heart attack data.
If you want to transport multiply imputed data obtained previously from
ice to mi, use mi import ice.
For example, suppose you have multiply imputed data from ice and now want to manipulate or analyze them using mi. We do not have such data, so we create them with ice: we impute the missing values of bmi, creating five imputations, and store them in a separate file, icedata.dta. We also set the random-number seed for reproducibility.
. webuse mheart0
(Fictional heart attack data; bmi missing)
. ice bmi attack smokes age female hsgrad, saving(icedata) m(5) seed(123)
#missing |
values | Freq. Percent Cum.
------------+-----------------------------------
0 | 132 85.71 85.71
1 | 22 14.29 100.00
------------+-----------------------------------
Total | 154 100.00
Variable | Command | Prediction equation
------------+---------+-------------------------------------------------------
attack | | [No missing data in estimation sample]
smokes | | [No missing data in estimation sample]
age | | [No missing data in estimation sample]
female | | [No missing data in estimation sample]
hsgrad | | [No missing data in estimation sample]
bmi | regress | attack smokes age female hsgrad
------------------------------------------------------------------------------
Imputing
[Only 1 variable to be imputed, therefore no cycling needed]
.1.2.3.4.5
file icedata.dta saved
We now load icedata.dta, containing the multiply imputed data, into memory and use mi import ice to import the data into mi. The automatic option of mi import ice identifies and registers the imputed variables automatically.
. use icedata, clear
(Fictional heart attack data; bmi missing)
. mi import ice, automatic
(22 m=0 obs. now marked as incomplete)
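If you prefer to name the imputed variables yourself rather than rely on automatic, mi import ice also accepts an imputed() option. A sketch, assuming icedata.dta was created by the ice command shown above; the ice step is repeated here (with saving()'s replace suboption added) only so that the do-file runs on its own.

```stata
webuse mheart0, clear
ice bmi attack smokes age female hsgrad, saving(icedata, replace) m(5) seed(123)

* Import the ice-format file, registering bmi as imputed explicitly
use icedata, clear
mi import ice, imputed(bmi)
mi describe
```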
We can now use any of the mi subcommands. For example, we can check
characteristics of the imported mi data by using the mi
describe command.
. mi describe
Style: flong
last mi update 25jul2011 10:44:51, 0 seconds ago
Obs.: complete 132
incomplete 22 (M = 5 imputations)
---------------------
total 154
Vars.: imputed: 1; bmi(22)
passive: 0
regular: 0
system: 3; _mi_m _mi_id _mi_miss
(there are 8 unregistered variables)
From the output above, we learn that our mi data are stored in the flong style and contain five imputations and one registered imputed variable, bmi. To conserve memory, we switch to the more memory-efficient mlong storage style by using mi convert.
. mi convert mlong
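The saving is easy to quantify for these data. flong stores every observation for each of m = 0, 1, ..., 5, that is, 154 × 6 = 924 observations, whereas mlong stores the 154 original observations plus only the 22 incomplete ones for each imputation, 154 + 22 × 5 = 264. The sketch below checks this with count; it recreates the imported data so that it runs on its own, and the clear option of mi convert is added as a precaution because the data in memory have not been saved.

```stata
webuse mheart0, clear
ice bmi attack smokes age female hsgrad, saving(icedata, replace) m(5) seed(123)
use icedata, clear
mi import ice, automatic

* flong: all 154 observations for each of m = 0,...,5
count

* mlong: the 154 observations of m = 0 plus the 22 incomplete ones for each m > 0
mi convert mlong, clear
count
```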
Next we use mi estimate: logit to examine the relationship between heart attacks and smoking, adjusted for the other factors.
. mi estimate: logit attack smokes bmi age female hsgrad
Multiple-imputation estimates Imputations = 5
Logistic regression Number of obs = 154
Average RVI = 0.0248
Largest FMI = 0.1155
DF adjustment: Large sample DF: min = 329.80
avg = 125100.34
max = 447329.84
Model F test: Equal FMI F( 5,16477.7) = 3.44
Within VCE type: OIM Prob > F = 0.0041
------------------------------------------------------------------------------
attack | Coef. Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
smokes | 1.180581 .354507 3.33 0.001 .4857551 1.875406
bmi | .0914414 .0472251 1.94 0.054 -.001459 .1843418
age | .0348427 .0153231 2.27 0.023 .0048094 .064876
female | -.1397504 .4148719 -0.34 0.736 -.9529011 .6734004
hsgrad | .148727 .4010005 0.37 0.711 -.6372217 .9346757
_cons | -5.076543 1.652779 -3.07 0.002 -8.321277 -1.83181
------------------------------------------------------------------------------
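The header above reports only the average RVI and the largest FMI. To see the within- and between-imputation variances, RVI, and FMI for each coefficient, you can replay the results with reporting options of mi estimate. A sketch that recreates the analysis so that it runs on its own; vartable and nocitable are reporting options of mi estimate.

```stata
webuse mheart0, clear
ice bmi attack smokes age female hsgrad, saving(icedata, replace) m(5) seed(123)
use icedata, clear
mi import ice, automatic
mi convert mlong, clear
quietly mi estimate: logit attack smokes bmi age female hsgrad

* Replay: show per-coefficient variance information, suppress the coefficient table
mi estimate, vartable nocitable
```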
It is only necessary to use mi import ice if you already have
multiple imputations created by ice.
References
- Carlin, J. B., J. C. Galati, and P. Royston. 2008. A new framework for managing and analyzing multiply imputed data in Stata. Stata Journal 8: 49–67.
- Royston, P. 2005. Multiple imputation of missing values: Update of ice. Stata Journal 5: 527–536.
- Royston, P. 2007. Multiple imputation of missing values: Further update of ice, with an emphasis on interval censoring. Stata Journal 7: 445–464.
- Royston, P. 2009. Multiple imputation of missing values: Further update of ice, with an emphasis on categorical variables. Stata Journal 9: 466–477.
- Royston, P., J. B. Carlin, and I. R. White. 2009. Multiple imputation of missing values: New features for mim. Stata Journal 9: 252–264.
- Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
- Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. Boca Raton, FL: Chapman & Hall/CRC.
- StataCorp. 2011. Stata 12 Multiple-Imputation Reference Manual. College Station, TX: Stata Press.
- van Buuren, S., H. C. Boshuizen, and D. L. Knook. 1999. Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine 18: 681–694.