What is the relation between the official multiple-imputation command, mi, and the community-contributed ice and mim commands?

Title   Relation between official mi and community-contributed ice and mim commands
Authors Yulia Marchenko, StataCorp
        Patrick Royston, MRC Clinical Trials Unit and University College London

Note: Because the community-contributed ice command is based on random draws, its results may differ from those obtained in earlier versions of Stata. The difference arises from the 64-bit Mersenne Twister pseudorandom-number generator, which was added in Stata 14.

Multiple-imputation analysis consists of three phases: 1) imputation—creating multiply imputed data, 2) completed data analysis of multiply imputed data, and 3) pooling of individual analyses from phase 2 using Rubin’s combination rules (Rubin 1987, 76).
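
For reference, the combination rules used in phase 3 can be written out for a single scalar parameter. The notation below is chosen here for illustration and does not come from the text above: M is the number of imputations, \hat{Q}_m is the estimate from the m-th completed dataset, and \hat{U}_m is its estimated variance.

```latex
\begin{align*}
\bar{Q}_M &= \frac{1}{M}\sum_{m=1}^{M}\hat{Q}_m
  && \text{(pooled point estimate)} \\
\bar{U}_M &= \frac{1}{M}\sum_{m=1}^{M}\hat{U}_m
  && \text{(within-imputation variance)} \\
B_M &= \frac{1}{M-1}\sum_{m=1}^{M}\bigl(\hat{Q}_m-\bar{Q}_M\bigr)^{2}
  && \text{(between-imputation variance)} \\
T_M &= \bar{U}_M+\Bigl(1+\frac{1}{M}\Bigr)B_M
  && \text{(total variance)}
\end{align*}
```

The pooled standard error is the square root of T_M; the between-imputation term is what carries the extra uncertainty due to the missing values.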

Community-contributed commands uvis, ice (Royston 2005, 2007, 2009), and mim (Carlin, Galati, and Royston 2008; Royston, Carlin, and White 2009) are widely used to perform multiple-imputation analysis in Stata 9 and higher. uvis and ice perform phase 1. The uvis command performs univariate imputation. The ice command performs multivariate imputation via chained equations (van Buuren, Boshuizen, and Knook 1999). The mim command analyzes multiply imputed data by performing phases 2 and 3. mim also provides some capabilities for manipulating multiply imputed data.

On 27 July 2009, Stata 11 was released, bearing a major feature: the mi system for multiple imputation and estimation of models with multiply imputed data. The system comprises a new architecture for imputed datasets; commands for manipulating, checking, and validating such datasets; a command, mi impute, for doing imputation—phase 1; and a command, mi estimate, for combining estimation results using Rubin’s rules—phases 2 and 3. See the Multiple-Imputation Reference Manual (StataCorp 2023) for details. mi impute and mi estimate were expanded in Stata 12.

mi impute performs both univariate and multivariate imputation. There are nine univariate methods and three multivariate ones; please see the mi impute manual entry for a list. The nine univariate methods include two not available in uvis: Poisson and truncated normal imputation.

Multivariate imputation can be performed using mi impute monotone when the missingness pattern is monotone and using mi impute mvn or mi impute chained when the pattern is not monotone. mi impute monotone implements a noniterative imputation method based on a sequence of independent univariate conditional imputations (Rubin 1987, 170–186). It is similar to the implementation of the monotone option of the ice command. mi impute mvn performs multivariate imputation assuming that the data have a multivariate normal distribution. It implements the NORM method of Schafer (1997)—an iterative Markov chain Monte Carlo method (data augmentation) based on multivariate normality. The mi impute chained command implements an alternative iterative multivariate-imputation method based on a sequence of univariate full conditional specifications, also known as imputation via chained equations. mi impute chained was added in Stata 12 and uses the same method as implemented in the ice command.
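
As a rough sketch, the three multivariate methods could be called as shown next. This assumes the example dataset mheart5s0 from the Stata documentation is available through webuse and is already mi set with age and bmi registered as imputed; the number of imputations and the seed are arbitrary choices made for illustration.

```stata
* Sketch only: mheart5s0 is assumed to be mi set, with age and bmi
* registered as imputed variables and no imputations yet.
webuse mheart5s0, clear

* Check whether the missingness pattern is monotone (nested)
mi misstable nested

* 1) Monotone pattern: noniterative sequence of univariate imputations
mi impute monotone (regress) age bmi = attack smokes hsgrad female, ///
    add(5) rseed(123)

* 2) Any pattern: data augmentation under multivariate normality
webuse mheart5s0, clear
mi impute mvn age bmi = attack smokes hsgrad female, add(5) rseed(123)

* 3) Any pattern: chained equations (Stata 12 or later)
webuse mheart5s0, clear
mi impute chained (regress) age bmi = attack smokes hsgrad female, ///
    add(5) rseed(123)
```

The data are reloaded before each call so that every method starts from the same unimputed dataset rather than adding imputations on top of the previous ones.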

mi impute chained and ice use the same imputation method, but their features differ. mi impute chained supports factor variables. ice includes stepwise model selection and is compatible with all releases since Stata 9. If you have Stata 11 or later, you can also use mi ice, a wrapper command for ice that understands the official mi data format. (mi ice is available from Patrick Royston’s web page under the heading mi_ice; in Stata, type net from http://www.homepages.ucl.ac.uk/~ucakjpr/stata.)

As of Stata 12, the official mi commands cover all data-management and most estimation capabilities of mim; one exception is mim’s category(combine) option for combining arbitrary scalars. (See stata.com/support/faqs/statistics/combine-results-with-multiply-imputed-data for information on combining arbitrary scalars using mi estimate.) If you wish to use mim and have Stata 11 or later, you can use mim2, which understands the official mi data format. mim2 is available from the same website as mi ice.

The mi import ice and mi export ice commands make it easy to transport data between the existing ice/mim data format and the official mi data format.
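
A minimal round trip between the two formats might look like the sketch below. It assumes the documentation dataset mheart1s20 (already mi set and containing imputations) is available through webuse; the filename heart_iceformat is hypothetical.

```stata
* Sketch only: mheart1s20 is assumed to be an mi dataset that already
* contains imputations; heart_iceformat is a made-up filename.
webuse mheart1s20, clear
mi describe

* Convert the mi data in memory to the ice/mim format
mi export ice, clear
save heart_iceformat, replace

* Bring the ice-format file back into the official mi format
use heart_iceformat, clear
mi import ice, automatic
mi describe
```

After mi export ice, the data in memory are no longer mi set, so mi subcommands will not work on them until they are imported again.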

Below we provide examples demonstrating how to switch between the mi and ice data formats. Because ice, mi ice, and mim are not part of official Stata, you should install them separately. You can use the search command to locate the desired package, and then follow the corresponding links for further instructions on installation.

Using mi import ice to import multiply imputed data created by ice into mi

In our examples, we use fictional data, mheart0.dta, recording heart attacks. The primary objective is to examine the relationship between heart attacks and smoking adjusted for other factors such as age, body mass index, gender, and educational status. The variable recording body mass index, bmi, contains missing values. Thus we use multiple imputation to analyze the heart attack data.


If you want to transport multiply imputed data obtained previously from ice to mi, use mi import ice.

For example, suppose you have multiply imputed data from ice and now want to perform data manipulation or analyze it using the mi command. We do not have such data, so we use ice to create it. We impute missing values of the bmi variable using ice to create five imputations and store them in a separate file, icedata.dta. We also set the random-number seed for reproducibility.

(Note: To run this example, you will need to install the community-contributed command, ice. You can obtain this command by typing ssc install ice in Stata.)

. webuse mheart0
(Fictional heart attack data; BMI missing)

. ice bmi attack smokes age female hsgrad, saving(icedata) m(5) seed(123)

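   #missing |
     values |      Freq.     Percent        Cum.
------------+-----------------------------------
          0 |        132       85.71       85.71
          1 |         22       14.29      100.00
------------+-----------------------------------
      Total |        154      100.00

   Variable | Command | Prediction equation
------------+---------+-------------------------------------------------------
     attack |         | [No missing data in estimation sample]
     smokes |         | [No missing data in estimation sample]
        age |         | [No missing data in estimation sample]
     female |         | [No missing data in estimation sample]
     hsgrad |         | [No missing data in estimation sample]
        bmi | regress | attack smokes age female hsgrad
------------------------------------------------------------------------------

Imputing
[Only 1 variable to be imputed, therefore no cycling needed]
.1.2.3.4.5
file icedata.dta saved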

We now load icedata.dta, containing multiply imputed data, into memory and use mi import ice to import data to mi. We use the automatic option of mi import ice to identify and register imputed variables automatically.

. use icedata, clear
(Fictional heart attack data; BMI missing)

. mi import ice, automatic
(22 m=0 obs now marked as incomplete)

We can now use any of the mi subcommands. For example, we can check characteristics of the imported mi data by using the mi describe command.

. mi describe

  Style:  flong
          last mi update 27may2021 13:16:36, approximately 1 minute ago

  Observations:
     Complete          132
     Incomplete         22  (M = 5 imputations)
     ---------------------
     Total             154

  Variables:
     Imputed:  1; bmi(22)

     Passive:  0

     Regular:  0

     System:   3; _mi_m _mi_id _mi_miss

     (there are 8 unregistered variables)

From the output above, we learn that our mi data are stored in the flong style and contain five imputations and one registered imputed variable—bmi. To conserve memory, we now choose to switch to the memory-efficient mi data storage style, mlong, by using mi convert.

. mi convert mlong

Next we analyze our multiply imputed data to examine the relationship between heart attacks and smoking adjusted for other factors using mi estimate: logit.

. mi estimate: logit attack smokes bmi age female hsgrad

Multiple-imputation estimates                   Imputations       =          5
Logistic regression                             Number of obs     =        154
                                                Average RVI       =     0.0298
                                                Largest FMI       =     0.1046
DF adjustment:   Large sample                   DF:     min       =     398.79
                                                        avg       =  18,342.11
                                                        max       =  48,184.53
Model F test:       Equal FMI                   F(   5,13096.8)   =       3.76
Within VCE type:          OIM                   Prob > F          =     0.0021

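------------------------------------------------------------------------------
      attack | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
      smokes |   1.258327   .3629043     3.47   0.001     .5470296    1.969624
         bmi |   .1143867   .0468168     2.44   0.015     .0223481    .2064253
         age |   .0358312   .0155562     2.30   0.021     .0053394     .066323
      female |  -.0827343   .4231973    -0.20   0.845    -.9123378    .7468693
      hsgrad |   .1919585   .4065251     0.47   0.637     -.604842    .9887591
       _cons |  -5.780003   1.683405    -3.43   0.001    -9.084615    -2.47539
------------------------------------------------------------------------------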

It is only necessary to use mi import ice if you already have multiple imputations created by ice.
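
If you are starting from the original data rather than from existing ice imputations, the whole analysis can stay within mi. The following is a sketch under that assumption; it reuses the variables from the example above, and the mlong style, the five imputations, and the seed are illustrative choices.

```stata
* Sketch only: impute bmi with official mi commands instead of ice.
webuse mheart0, clear

* Declare the mi storage style and register the variable to impute
mi set mlong
mi register imputed bmi

* Phase 1: univariate linear-regression imputation, five imputations
mi impute regress bmi attack smokes age female hsgrad, add(5) rseed(123)

* Phases 2 and 3: fit the model in each imputation and pool the results
mi estimate: logit attack smokes bmi age female hsgrad
```

The estimates will not match the table above exactly, because mi impute and ice draw different random imputations.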


References

Carlin, J. B., J. C. Galati, and P. Royston. 2008.
A new framework for managing and analyzing multiply imputed data in Stata. Stata Journal 8: 49–67.
Royston, P. 2005.
Multiple imputation of missing values: Update of ice. Stata Journal 5: 527–536.
Royston, P. 2007.
Multiple imputation of missing values: Further update of ice, with an emphasis on interval censoring. Stata Journal 7: 445–464.
Royston, P. 2009.
Multiple imputation of missing values: Further update of ice, with an emphasis on categorical variables. Stata Journal 9: 466–477.
Royston, P., J. B. Carlin, and I. R. White. 2009.
Multiple imputation of missing values: New features for mim. Stata Journal 9: 252–264.
Rubin, D. B. 1987.
Multiple Imputation for Nonresponse in Surveys. New York: Wiley.
Schafer, J. L. 1997.
Analysis of Incomplete Multivariate Data. Boca Raton, FL: Chapman & Hall/CRC.
StataCorp. 2023.
Stata 18 Multiple-Imputation Reference Manual. College Station, TX: Stata Press.
van Buuren, S., H. C. Boshuizen, and D. L. Knook. 1999.
Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine 18: 681–694.