Stata 15 help for mi impute

[MI] mi impute -- Impute missing values

Syntax

mi impute method ... [, impute_options ... ]

method            Description
-------------------------------------------------------------------------
Univariate
  regress         linear regression for a continuous variable
  pmm             predictive mean matching for a continuous variable
  truncreg        truncated regression for a continuous variable with a
                    restricted range
  intreg          interval regression for a continuous partially observed
                    (censored) variable
  logit           logistic regression for a binary variable
  ologit          ordered logistic regression for an ordinal variable
  mlogit          multinomial logistic regression for a nominal variable
  poisson         Poisson regression for a count variable
  nbreg           negative binomial regression for an overdispersed count
                    variable

Multivariate
  monotone        sequential imputation using a monotone-missing pattern
  chained         sequential imputation using chained equations
  mvn             multivariate normal regression

User-defined
  usermethod      user-defined imputation methods
-------------------------------------------------------------------------

impute_options            Description
-------------------------------------------------------------------------
Main
* add(#)                  specify number of imputations to add; required
                            when no imputations exist
* replace                 replace imputed values in existing imputations
  rseed(#)                specify random-number seed
  double                  store imputed values in double precision; the
                            default is to store them as float
  by(varlist [, byopts])  impute separately on each group formed by
                            varlist (not allowed with usermethod)

Reporting
  dots                    display dots as imputations are performed
  noisily                 display intermediate output
  nolegend                suppress all table legends

Advanced
  force                   proceed with imputation, even when missing
                            imputed values are encountered

  noupdate                do not perform mi update (not allowed with
                            usermethod); see [MI] noupdate option
-------------------------------------------------------------------------
* add(#) is required when no imputations exist; add(#) or replace is
  required if imputations exist.
noupdate does not appear in the dialog box.
You must mi set your data before using mi impute; see [MI] mi set.

Menu

Statistics > Multiple imputation

Description

mi impute fills in missing values (.) of a single variable or of multiple variables using the specified method. The available methods (by variable type and missing-data pattern) are summarized in the tables below.

Single imputation variable (univariate imputation)
------------------------------------------------------------
Pattern            Type           Imputation method
------------------------------------------------------------
always monotone    continuous     regress, pmm, truncreg, intreg
                   binary         logit
                   categorical    ologit, mlogit
                   count          poisson, nbreg
------------------------------------------------------------

Multiple imputation variables (multivariate imputation)
------------------------------------------------------------
Pattern            Type           Imputation method
------------------------------------------------------------
monotone missing   mixture        monotone
arbitrary missing  mixture        chained
arbitrary missing  continuous     mvn
------------------------------------------------------------

The suggested reading order of mi impute's subentries is

[MI] mi impute regress
[MI] mi impute pmm
[MI] mi impute truncreg
[MI] mi impute intreg
[MI] mi impute logit
[MI] mi impute ologit
[MI] mi impute mlogit
[MI] mi impute poisson
[MI] mi impute nbreg

[MI] mi impute monotone
[MI] mi impute chained
[MI] mi impute mvn
[MI] mi impute usermethod

Options

    +------+
----+ Main +-------------------------------------------------------------

add(#) specifies the number of imputations to add to the mi data. This option is required if there are no imputations in the data. If imputations already exist, add() is optional provided that replace is specified instead. The total number of imputations cannot exceed 1,000.

replace specifies to replace existing imputed values with new ones. One of replace or add() must be specified when mi data already have imputations.

rseed(#) sets the random-number seed. This option can be used to reproduce results. rseed(#) is equivalent to typing set seed # prior to calling mi impute; see [R] set seed.
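The reproducibility that rseed(#) provides can be illustrated outside Stata with a minimal Python sketch. It assumes a simple hot-deck draw (replacing each missing value with a randomly chosen observed value) as a stand-in for a real imputation method; hot_deck_impute is a hypothetical helper, not part of mi impute. Seeding the generator identically before each run reproduces the same imputed values, which is the effect of specifying the same rseed(#) or typing set seed # first.

```python
import random

def hot_deck_impute(values, seed):
    """Fill each None with a random draw from the observed values.

    Hypothetical helper: the generator is seeded with `seed`, analogous
    to typing `set seed #` (or specifying rseed(#)) before imputing.
    """
    rng = random.Random(seed)  # independent generator with a fixed seed
    observed = [v for v in values if v is not None]
    return [v if v is not None else rng.choice(observed) for v in values]

data = [21.5, None, 27.0, None, 30.2]
first = hot_deck_impute(data, seed=1234)
second = hot_deck_impute(data, seed=1234)
print(first == second)  # True: the same seed reproduces the same draws
```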

double specifies that the imputed values be stored as doubles. By default, they are stored as floats. mi impute makes this distinction only when necessary; for example, if the logit method is used, the imputed values are stored as bytes.
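Stata's float and double storage types are 4-byte and 8-byte IEEE 754 binary floating-point numbers, so the practical effect of double is that continuous imputed values keep full precision instead of being rounded to about seven significant digits. The following Python sketch (Python floats are 8-byte doubles) shows that rounding by packing a value into 4 bytes and reading it back; as_float is a hypothetical helper for illustration only.

```python
import struct

def as_float(x):
    """Round a Python float (an 8-byte double) to 4-byte single precision."""
    return struct.unpack("f", struct.pack("f", x))[0]

imputed = 27.123456789
print(imputed)            # kept in full if stored as double
print(as_float(imputed))  # rounded to about 7 significant digits if stored as float
```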

by(varlist [, byopts]) specifies that imputation be performed separately for each by-group. By-groups are identified by equal values of the variables in varlist in the original data (m=0). Missing categories in varlist are omitted, unless the missing suboption is specified within by(). Imputed and passive variables may not be specified within by(). This option is not allowed with user-defined imputation methods, usermethod.

byopts are missing, noreport, nolegend, and nostop.

missing specifies that missing categories in varlist are not omitted.

noreport suppresses reporting of intermediate information about each group.

nolegend suppresses the display of group legends that appear before the imputation table when long group descriptions are encountered.

nostop specifies to proceed with imputation when imputation fails in some groups. By default, mi impute terminates with error when this happens.
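The behavior of by() and its missing and nostop suboptions can be sketched in Python as follows. This is a hypothetical illustration, not Stata's implementation: rows are dictionaries, None stands for a missing value, and mean_impute is a placeholder imputer (mean imputation is not a proper multiple-imputation method) that fails on a group with no observed values. Representing a group skipped under nostop as None is an assumption made for the sketch.

```python
from collections import defaultdict

def by_groups(rows, by_vars, missing=False):
    """Partition rows into by-groups keyed by the values of by_vars.

    Rows with a missing (None) value in any by-variable are omitted
    unless missing=True, as with the missing suboption of by().
    """
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[v] for v in by_vars)
        if not missing and any(k is None for k in key):
            continue  # missing category omitted by default
        groups[key].append(row)
    return dict(groups)

def impute_by(rows, by_vars, imputer, missing=False, nostop=False):
    """Run `imputer` separately on each by-group."""
    results = {}
    for key, group in by_groups(rows, by_vars, missing).items():
        try:
            results[key] = imputer(group)
        except ValueError:
            if not nostop:
                raise  # default: stop with an error
            results[key] = None  # nostop: leave this group unimputed
    return results

def mean_impute(group, var="bmi"):
    """Placeholder imputer (NOT a proper MI method): fill with the group mean."""
    observed = [r[var] for r in group if r[var] is not None]
    if not observed:
        raise ValueError("no observed values in group")
    mean = sum(observed) / len(observed)
    return [r[var] if r[var] is not None else mean for r in group]

rows = [
    {"female": 0, "bmi": 24.1},
    {"female": 1, "bmi": None},
    {"female": 1, "bmi": 22.8},
    {"female": None, "bmi": None},
]
print(impute_by(rows, ["female"], mean_impute))
# {(0,): [24.1], (1,): [22.8, 22.8]}
print(impute_by(rows, ["female"], mean_impute, missing=True, nostop=True))
# {(0,): [24.1], (1,): [22.8, 22.8], (None,): None}
```

The number of groups returned corresponds to what r(N_g) reports.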

    +-----------+
----+ Reporting +--------------------------------------------------------

dots specifies to display dots as imputations are successfully completed. An x is displayed if any of the specified imputation variables still have missing values.

noisily specifies that intermediate output from mi impute be displayed.

nolegend suppresses the display of all legends that appear before the imputation table.

    +----------+
----+ Advanced +---------------------------------------------------------

force specifies to proceed with imputation even when missing imputed values are encountered. By default, mi impute terminates with error if missing imputed values are encountered.
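The dots display and the force option both hinge on whether any imputed value is still missing after an imputation. A minimal Python sketch of that logic, with hypothetical helper names and None standing for a missing value, might look like this:

```python
def progress_mark(imputed):
    """Dot for a completed imputation, 'x' if any value is still missing."""
    return "x" if any(v is None for v in imputed) else "."

def check_imputed(imputed, force=False):
    """Stop with an error if imputed values are still missing, unless force."""
    n_missing = sum(v is None for v in imputed)
    if n_missing and not force:
        raise RuntimeError(f"{n_missing} imputed value(s) still missing")
    return n_missing  # with force=True, proceed and report the count

print(progress_mark([1.2, 3.4]), progress_mark([1.2, None]))  # . x
print(check_imputed([1.2, None], force=True))                  # 1
```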

The following option is available with mi impute but is not shown in the dialog box:

noupdate in some cases suppresses the automatic mi update this command might perform; see [MI] noupdate option. This option is rarely used and is not allowed with user-defined imputation methods, usermethod.

Remarks

Using mi impute
Imputation methods

Using mi impute

The data must be mi set before you use mi impute, and all variables whose missing values are to be filled in must be registered as imputed variables; see mi register in [MI] mi set.

If there are no imputations yet, you must specify the number of imputations to add in add(). If imputations already exist, you have three choices:

1. Add new imputations to the existing ones by specifying the add() option.
2. Add new imputations and also replace the existing ones by specifying both the add() and the replace options.
3. Replace existing imputed values by specifying the replace option.
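The rules above can be summarized in a short Python sketch that checks the options and returns the resulting counts. The function name is hypothetical, and treating every replaced imputation as "updated" (the quantity stored in r(M_update)) is an assumption made for illustration; the 1,000 limit comes from the description of add().

```python
MAX_M = 1000  # the total number of imputations cannot exceed 1,000

def plan_imputations(existing_m, add=None, replace=False):
    """Return (M, M_add, M_update) for a hypothetical mi impute call."""
    if existing_m == 0 and not add:
        raise ValueError("add() is required when no imputations exist")
    if existing_m > 0 and not add and not replace:
        raise ValueError("add() or replace is required when imputations exist")
    m_add = add or 0
    total = existing_m + m_add
    if total > MAX_M:
        raise ValueError("total number of imputations cannot exceed 1,000")
    m_update = existing_m if replace else 0  # assumption: replaced = updated
    return total, m_add, m_update

print(plan_imputations(0, add=20))             # (20, 20, 0)
print(plan_imputations(20, add=30))            # (50, 30, 0)
print(plan_imputations(50, replace=True))      # (50, 0, 50)
print(plan_imputations(50, add=10, replace=True))  # (60, 10, 50)
```

The first three calls mirror the sequence used in the univariate examples of this entry: add(20), then add(30), then replace.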

mi impute may change the type of the specified imputation variables and the sort order of the data. These changes are specific to the declared mi style.

Imputation methods

mi impute supports both univariate and multivariate imputation under the missing at random assumption (see Assumptions about missing data under Remarks and examples in [MI] intro substantive).

Univariate imputation is used to impute a single variable. It can be used repeatedly to impute multiple variables only when the variables are independent and will be used in separate analyses. To impute a single variable, you can choose from the following methods: regress, pmm, truncreg, intreg, logit, ologit, mlogit, poisson, and nbreg.
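As an illustration of what a single univariate regression imputation involves, here is a minimal Python sketch of the standard Bayesian normal linear-regression imputation step (Rubin 1987), restricted to one fully observed predictor. It is an assumption that this is representative of the regress method; see [MI] mi impute regress for the method actually used. The sketch draws the residual variance and the coefficients from their posterior given the observed cases, and then fills each missing value with a prediction plus random noise, so that repeated imputations reflect uncertainty about the model. regress_impute is a hypothetical name.

```python
import math
import random

def regress_impute(x, y, rng):
    """Impute missing y (None) from a single complete predictor x."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n, k = len(obs), 2                       # k = intercept + slope
    sx = sum(xi for xi, _ in obs)
    sy = sum(yi for _, yi in obs)
    sxx = sum(xi * xi for xi, _ in obs)
    sxy = sum(xi * yi for xi, yi in obs)
    det = n * sxx - sx * sx                  # determinant of X'X
    b_hat = (n * sxy - sx * sy) / det
    a_hat = (sy - b_hat * sx) / n
    rss = sum((yi - a_hat - b_hat * xi) ** 2 for xi, yi in obs)

    # sigma*^2 = RSS / chi2(n - k); chi2(df) is Gamma(shape df/2, scale 2)
    sigma2 = rss / rng.gammavariate((n - k) / 2, 2)

    # (a*, b*) ~ N((a_hat, b_hat), sigma*^2 (X'X)^-1), via a 2x2 Cholesky factor
    v11 = sigma2 * sxx / det
    v21 = -sigma2 * sx / det
    v22 = sigma2 * n / det
    l11 = math.sqrt(v11)
    l21 = v21 / l11
    l22 = math.sqrt(v22 - l21 * l21)
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    a_star = a_hat + l11 * z1
    b_star = b_hat + l21 * z1 + l22 * z2

    sigma = math.sqrt(sigma2)
    # observed values are kept; missing ones get prediction + N(0, sigma*^2) noise
    return [yi if yi is not None else a_star + b_star * xi + rng.gauss(0, sigma)
            for xi, yi in zip(x, y)]

rng = random.Random(42)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, None, 8.2, None, 11.8]
print(regress_impute(x, y, rng))
```

Calling the function once per imputation, with the same generator, produces M different sets of imputed values.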

For a continuous variable, either regress or pmm can be used (for example, Rubin [1987] and Schenker and Taylor [1996]). For a continuous variable with a restricted range (a truncated variable), either pmm or truncreg (Raghunathan et al. 2001) can be used. For a continuous variable that is only partially observed (censored), intreg can be used (Royston 2007). For a binary variable, logit can be used (Rubin 1987). For a categorical variable, ologit can be used to impute missing categories if they are ordered, and mlogit can be used if they are unordered (Raghunathan et al. 2001). For a count variable, poisson (Raghunathan et al. 2001) is often suggested, or nbreg (Royston 2009) in the presence of overdispersion. Also see van Buuren (2007) for a detailed list of univariate imputation methods.
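Predictive mean matching differs from regression imputation in that every imputed value is an actually observed value borrowed from a "donor" case, which is why it respects the range of the observed data. The following Python sketch is a simplified version under stated assumptions: one predictor, a plain least-squares fit, and no posterior draw of the regression coefficients (see [MI] mi impute pmm for the full method). The knn argument plays the role of the knn() option used in the pmm example of this entry, the number of nearest observed cases from which a donor is drawn; pmm_impute is a hypothetical name.

```python
import random

def pmm_impute(x, y, knn, rng):
    """Simplified predictive mean matching with one predictor."""
    obs = [(xi, yi) for xi, yi in zip(x, y) if yi is not None]
    n = len(obs)
    mx = sum(xi for xi, _ in obs) / n
    my = sum(yi for _, yi in obs) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in obs)
         / sum((xi - mx) ** 2 for xi, _ in obs))
    a = my - b * mx
    donors = [(a + b * xi, yi) for xi, yi in obs]  # (predicted mean, observed y)

    out = []
    for xi, yi in zip(x, y):
        if yi is not None:
            out.append(yi)
            continue
        pred = a + b * xi
        # the knn observed cases whose predicted means are closest to pred
        nearest = sorted(donors, key=lambda d: abs(d[0] - pred))[:knn]
        out.append(rng.choice(nearest)[1])  # borrow a donor's observed value
    return out

rng = random.Random(7)
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.1, None, 7.9, 10.2, None]
print(pmm_impute(x, y, knn=2, rng=rng))
```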

In practice, multiple variables usually must be imputed simultaneously, and that requires using a multivariate imputation method. The choice of an imputation method in this case also depends on the pattern of missing values.

If variables follow a monotone-missing pattern (see Patterns of missing data under Remarks and examples in [MI] intro substantive), they can be imputed sequentially using univariate conditional distributions, which is implemented in the monotone method (see [MI] mi impute monotone). A separate univariate imputation model can be specified for each imputation variable, which allows simultaneous imputation of variables of different types (Rubin 1987).
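Whether a set of variables follows a monotone-missing pattern can be checked directly from the definition; in Stata, mi misstable nested (used in the examples of this entry) reports this. The Python sketch below is a hypothetical illustration: it orders the variables from fewest to most missing values and verifies that, in every row, once one variable is missing, all later variables are missing too. The returned order, from most to least observed, is the natural order in which to impute sequentially.

```python
def is_monotone(rows, variables):
    """Check whether missingness in `variables` follows a monotone pattern."""
    order = sorted(variables, key=lambda v: sum(r[v] is None for r in rows))
    for r in rows:
        seen_missing = False
        for v in order:
            if r[v] is None:
                seen_missing = True
            elif seen_missing:
                return False, order  # observed value after a missing one
    return True, order

monotone_rows = [
    {"age": 61, "bmi": 24.1},
    {"age": None, "bmi": 27.3},
    {"age": None, "bmi": None},
]
arbitrary_rows = [
    {"age": None, "bmi": 24.1},
    {"age": 58, "bmi": None},
]
print(is_monotone(monotone_rows, ["age", "bmi"]))   # (True, ['bmi', 'age'])
print(is_monotone(arbitrary_rows, ["age", "bmi"]))  # first element is False
```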

When a pattern of missing values is arbitrary, iterative methods are used to fill in missing values. The mvn method (see [MI] mi impute mvn) uses multivariate normal data augmentation to impute missing values of continuous imputation variables (Schafer 1997). Allison (2001), for example, also discusses how to use this method to impute binary and categorical variables.

Another multivariate imputation method that accommodates arbitrary missing-value patterns is multiple imputation using chained equations (MICE), also known as imputation using fully conditional specifications (van Buuren, Boshuizen, and Knook 1999) and as sequential regression multivariate imputation (Raghunathan et al. 2001) in the literature. The MICE method is implemented in the chained method (see [MI] mi impute chained) and uses a Gibbs-like algorithm to impute multiple variables sequentially using univariate fully conditional specifications. Despite a lack of theoretical justification, the flexibility of MICE has made it one of the most popular choices used in practice.
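The Gibbs-like cycling behind chained equations can be sketched in Python for two continuous variables with an arbitrary missing pattern. This is a simplified illustration, not the chained method itself: missing entries start at the observed means, and each cycle regresses one variable on the current (partly imputed) values of the other, using only the cases where the target is observed, and then redraws the target's missing values as prediction plus normal noise. Unlike a proper imputation, it omits the posterior draw of the regression parameters, and the fixed number of cycles is an arbitrary choice; see [MI] mi impute chained for how iterations are controlled there. All function names are hypothetical.

```python
import random

def fit_line(xs, ys):
    """OLS intercept, slope, and residual SD for ys regressed on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    rss = sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))
    return a, b, (rss / (n - 2)) ** 0.5

def chained_impute(u, v, cycles, rng):
    """Impute two continuous variables with an arbitrary missing pattern."""
    data = {"u": list(u), "v": list(v)}
    miss = {k: [x is None for x in data[k]] for k in ("u", "v")}
    for k in ("u", "v"):  # initialize missing entries with the observed mean
        obs = [x for x in data[k] if x is not None]
        mean = sum(obs) / len(obs)
        data[k] = [mean if m else x for x, m in zip(data[k], miss[k])]
    for _ in range(cycles):
        # one cycle: impute u given current v, then v given current u
        for target, pred in (("u", "v"), ("v", "u")):
            keep = [i for i, m in enumerate(miss[target]) if not m]
            a, b, sd = fit_line([data[pred][i] for i in keep],
                                [data[target][i] for i in keep])
            for i, m in enumerate(miss[target]):
                if m:
                    data[target][i] = a + b * data[pred][i] + rng.gauss(0, sd)
    return data["u"], data["v"]

u = [1.0, 2.0, None, 4.0, 5.0, 6.2]
v = [2.0, None, 6.1, 8.0, None, 12.5]
new_u, new_v = chained_impute(u, v, cycles=10, rng=random.Random(1))
print(new_u)
print(new_v)
```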

For a recent comparison of MICE and multivariate normal imputation, see Lee and Carlin (2010).

Examples: Univariate imputation

Setup
    . webuse mheart1s0

Describe mi data
    . mi describe

Create 20 imputations using regression imputation, and then add 30 more
    . mi impute regress bmi attack smokes age female hsgrad, add(20)
    . mi impute regress bmi attack smokes age female hsgrad, add(30)

Use predictive mean matching and replace 50 existing imputations
    . mi impute pmm bmi attack smokes age female hsgrad, replace knn(5)

Examples: Multivariate imputation

Setup
    . webuse mheart5s0, clear

Describe mi data
    . mi describe

Examine missing-data patterns
    . mi misstable nested

Create 10 imputations using monotone imputation (monotone-missing pattern)
    . mi impute monotone (regress) age bmi = attack smokes hsgrad female, add(10)

Use multivariate normal imputation (arbitrary pattern) and replace existing imputations
    . mi impute mvn bmi = attack smokes hsgrad female, replace nolog

Impute using chained equations (arbitrary pattern) and replace existing imputations
    . mi impute chained (regress) age bmi = attack smokes hsgrad female, replace

Examples: Imputing on subsamples

Setup
    . webuse mheart1s0, clear

Impute males and females separately and create 20 imputations
    . mi impute regress bmi attack smokes age hsgrad, add(20) by(female)

Examples: Conditional imputation

Setup
    . webuse mheart7s0, clear

Describe mi data
    . mi describe

Examine missing-data patterns
    . mi misstable nested

Impute hightar using only observations for which imputation variable smokes is equal to one
    . mi impute monotone (regress) bmi age (logit, conditional(if smokes==1)) hightar (logit) smokes = attack hsgrad female, add(2)

Examples: User-defined imputation methods

See Examples in [MI] mi impute usermethod.

Stored results

mi impute stores the following in r():

Scalars
  r(M)              total number of imputations
  r(M_add)          number of added imputations
  r(M_update)       number of updated imputations
  r(k_ivars)        number of imputed variables
  r(N_g)            number of imputed groups (1 if by() is not specified)

Macros
  r(method)         name of imputation method
  r(ivars)          names of imputation variables
  r(rngstate)       random-number state used
  r(by)             names of variables specified within by()

Matrices
  r(N)              number of observations in imputation sample in each
                      group (per variable)
  r(N_complete)     number of complete observations in imputation sample
                      in each group (per variable)
  r(N_incomplete)   number of incomplete observations in imputation sample
                      in each group (per variable)
  r(N_imputed)      number of imputed observations in imputation sample
                      in each group (per variable)

Also see Stored results in the method-specific entries for a list of additional stored results.
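To make the per-variable matrices concrete, the following Python sketch computes analogous counts for one variable in one group from the original data (m=0) and a single set of imputed values. It assumes that the imputation sample is the whole list, that r(N) equals complete plus incomplete observations, and that an observation counts as imputed when it was missing originally and is filled in afterward; imputation_counts is a hypothetical helper, not how mi impute computes r().

```python
def imputation_counts(original, imputed):
    """Per-variable counts in the spirit of r(N), r(N_complete),
    r(N_incomplete), and r(N_imputed) for one group and one imputation.

    original: values in the original data (m=0); None means missing
    imputed:  the same variable after imputation
    """
    n = len(original)
    complete = sum(v is not None for v in original)
    incomplete = n - complete
    filled = sum(o is None and i is not None for o, i in zip(original, imputed))
    return {"N": n, "N_complete": complete,
            "N_incomplete": incomplete, "N_imputed": filled}

print(imputation_counts([24.1, None, 22.8, None], [24.1, 25.0, 22.8, None]))
# {'N': 4, 'N_complete': 2, 'N_incomplete': 2, 'N_imputed': 1}
```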

References

Allison, P. D. 2001. Missing Data. Thousand Oaks, CA: Sage.

Lee, K. J., and J. B. Carlin. 2010. Multiple imputation for missing data: Fully conditional specification versus multivariate normal imputation. American Journal of Epidemiology 171: 624-632.

Raghunathan, T. E., J. M. Lepkowski, J. Van Hoewyk, and P. Solenberger. 2001. A multivariate technique for multiply imputing missing values using a sequence of regression models. Survey Methodology 27: 85-95.

Royston, P. 2007. Multiple imputation of missing values: Further update of ice, with an emphasis on interval censoring. Stata Journal 7: 445-464.

------. 2009. Multiple imputation of missing values: Further update of ice, with an emphasis on categorical variables. Stata Journal 9: 466-477.

Rubin, D. B. 1987. Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

Schafer, J. L. 1997. Analysis of Incomplete Multivariate Data. Boca Raton, FL: Chapman & Hall/CRC.

Schenker, N., and J. M. G. Taylor. 1996. Partially parametric techniques for multiple imputation. Computational Statistics & Data Analysis 22: 425-446.

van Buuren, S. 2007. Multiple imputation of discrete and continuous data by fully conditional specification. Statistical Methods in Medical Research 16: 219-242.

van Buuren, S., H. C. Boshuizen, and D. L. Knook. 1999. Multiple imputation of missing blood pressure covariates in survival analysis. Statistics in Medicine 18: 681-694.


© Copyright 1996–2018 StataCorp LLC