Home  /  Products  /  Stata 12  /  What's new in multiple imputation

Updates to multiple imputation were introduced in Stata 12.

See the latest version of multiple imputation. See all of Stata's multiple imputation features.

See the new features in Stata 18.

Order

What’s new in multiple imputation

New imputation features

  • The mi impute command now supports multivariate imputation using chained equations (ICE), mi impute chained, also known as sequential regression multivariate imputation (SRMI).

    ICE is a flexible imputation technique for imputing various types of data. The variable-by-variable specification of ICE allows you to impute variables of different types by choosing from several univariate imputation methods the appropriate one for each variable. Variables can have an arbitrary missing-data pattern. By specifying a separate model for each variable, you can incorporate certain important characteristics, such as ranges and restrictions within a subset, specific to each variable.



    Use any of nine univariate imputation methods to build a flexible imputation model.

    Customize prediction equations for imputed variables (for example, omit hsgrad from the model for bmi).

    Impute some variables using conditional imputation.

    Allow general expressions of imputed variables in the equations of other imputed variables (such as include bmi^2 in age’s imputation model).
  • There are four new univariate imputation methods that can also be used as building blocks for multivariate imputation using the monotone method or the new chained-equations method.

    • mi impute truncreg imputes continuous variables with a restricted range using the truncated regression method
    • mi impute intreg imputes continuous censored variables using the interval regression method
    • mi impute poisson imputes count variables using the Poisson method
    • mi impute nbreg imputes overdispersed count variables using the negative binomial method
  • Conditional imputation is now supported with all imputation techniques, except multivariate normal imputation (MVN), via the conditional() option.

    Conditional imputation allows you to impute variables which are defined within a particular subset of the data. Outside that subset, the variables are known to be constant. For example, the number of pregnancies is relevant only to females and is always zero for males, the smoking of high-tar cigarettes is relevant only to smokers and is always zero for nonsmokers.

    To properly impute, say, whether a person smokes high-tar cigarettes, we should condition on the smoking status during imputation. That is, we should impute missing values of subjects who smoke using only data on smokers and replace missing values of subjects who do not smoke with zeros. We would like to be able to do that even when the smoking status contains missing values and is itself being imputed. The conditional() option provides such capability:


  • Separate imputation for different groups of the data is now available via mi impute's new by() option.
  • Imputation by drawing posterior estimates from bootstrapped samples is now available with all imputation techniques, except MVN, via the new bootstrap option.
  • Perfect prediction is now handled during imputation of categorical data using logistic, ordered logistic, or multinomial logistic imputation methods when the new augment option is specified.
  • mi impute is now faster in the wide, mlong, and flong styles.

New estimation and postestimation features

  • Estimate the amount of simulation error in your final model, so you can decide whether you need more imputations using mi estimate’s new mcerror option.
  • mi estimate now supports panel-data and multilevel models. Included are xtcloglog, xtgee, xtlogit, xtmelogit, xtmepoisson, xtmixed, xtnbreg, xtpoisson, xtprobit, xtrc, and xtreg.
  • mi estimate now supports the total command.
  • Compute linear and nonlinear predictions after MI estimation using new commands mi predict and mi predictnl.

New data-management features

  • misstable summarize will now create summary variables recording the missing-values pattern via the new generate() option.

Back to highlights

See New in Stata 18 to learn about what was added in Stata 18.