**[MI] mi impute** -- Impute missing values

__Syntax__

**mi** __imp__**ute** *method* ... [**,** *impute_options* ... ]

*method*              Description
-------------------------------------------------------------------------
Univariate
  __reg__**ress**       linear regression for a continuous variable
  **pmm**               predictive mean matching for a continuous variable
  **truncreg**          truncated regression for a continuous variable with a restricted range
  **intreg**            interval regression for a continuous partially observed (censored) variable
  __logi__**t**         logistic regression for a binary variable
  __olog__**it**        ordered logistic regression for an ordinal variable
  __mlog__**it**        multinomial logistic regression for a nominal variable
  **poisson**           Poisson regression for a count variable
  **nbreg**             negative binomial regression for an overdispersed count variable

Multivariate
  __mon__**otone**      sequential imputation using a monotone-missing pattern
  __chain__**ed**       sequential imputation using chained equations
  **mvn**               multivariate normal regression

User-defined
  *usermethod*          user-defined imputation methods
-------------------------------------------------------------------------

*impute_options*                          Description
-------------------------------------------------------------------------
Main
* **add(***#***)**                           specify number of imputations to add; required when no imputations exist
* **replace**                             replace imputed values in existing imputations
  **rseed(***#***)**                         specify random-number seed
  **double**                              store imputed values in double precision; the default is to store them as **float**
  **by(***varlist* [**,** *byopts*]**)**   impute separately on each group formed by *varlist* (not allowed with *usermethod*)

Reporting
  **dots**                                display dots as imputations are performed
  __noi__**sily**                         display intermediate output
  __noleg__**end**                        suppress all table legends

Advanced
  **force**                               proceed with imputation, even when missing imputed values are encountered

  __noup__**date**                        do not perform **mi update** (not allowed with *usermethod*); see **[MI] noupdate option**
-------------------------------------------------------------------------
* **add(***#***)** is required when no imputations exist; **add(***#***)** or **replace** is
required if imputations exist.
**noupdate** does not appear in the dialog box.
You must **mi set** your data before using **mi** **impute**; see **[MI] mi set**.

__Menu__

**Statistics > Multiple imputation**

__Description__

**mi** **impute** fills in missing values (**.**) of a single variable or of multiple
variables using the specified method. The available methods (by variable
type and missing-data pattern) are summarized in the tables below.

Single imputation variable (univariate imputation)
------------------------------------------------------------
Pattern            Type          Imputation method
------------------------------------------------------------
always monotone    continuous    **regress**, **pmm**, **truncreg**, **intreg**
always monotone    binary        **logit**
always monotone    categorical   **ologit**, **mlogit**
always monotone    count         **poisson**, **nbreg**
------------------------------------------------------------

Multiple imputation variables (multivariate imputation)
------------------------------------------------------------
Pattern              Type          Imputation method
------------------------------------------------------------
monotone missing     mixture       **monotone**
arbitrary missing    mixture       **chained**
arbitrary missing    continuous    **mvn**
------------------------------------------------------------

The suggested reading order of **mi** **impute**'s subentries is

**[MI] mi impute regress**
**[MI] mi impute pmm**
**[MI] mi impute truncreg**
**[MI] mi impute intreg**
**[MI] mi impute logit**
**[MI] mi impute ologit**
**[MI] mi impute mlogit**
**[MI] mi impute poisson**
**[MI] mi impute nbreg**

**[MI] mi impute monotone**
**[MI] mi impute chained**
**[MI] mi impute mvn**
**[MI]** *mi impute usermethod*

__Options__

+------+
----+ Main +-------------------------------------------------------------

**add(***#***)** specifies the number of imputations to add to the **mi** data. This
option is required if there are no imputations in the data. If
imputations exist, then **add()** is optional. The total number of
imputations cannot exceed 1,000.

**replace** specifies to replace existing imputed values with new ones. One
of **replace** or **add()** must be specified when **mi** data already have
imputations.

**rseed(***#***)** sets the random-number seed. This option can be used to
reproduce results. **rseed(***#***)** is equivalent to typing **set** **seed** *#* prior
to calling **mi** **impute**; see **[R] set seed**.
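
The equivalence above can be checked with a short sketch. It imputes **bmi** twice from the same starting data, once with **rseed()** and once after **set seed**, and compares the mean of **bmi** in the first imputation. The dataset and model are borrowed from the examples later in this entry; the seed value 2232 and the scalar names are arbitrary, illustrative choices.

```stata
* Run 1: seed supplied through rseed()
webuse mheart1s0, clear
mi impute regress bmi attack smokes age female hsgrad, add(5) rseed(2232)
mi xeq 1: summarize bmi
scalar mean_rseed = r(mean)

* Run 2: same seed set beforehand with -set seed-
webuse mheart1s0, clear
set seed 2232
mi impute regress bmi attack smokes age female hsgrad, add(5)
mi xeq 1: summarize bmi
scalar mean_setseed = r(mean)

* The two means are expected to agree
display scalar(mean_rseed) "  " scalar(mean_setseed)
```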

**double** specifies that the imputed values be stored as **double**s. By
default, they are stored as **float**s. This distinction applies only when
it is needed; for example, if the **logit** method is used, the imputed
values are stored as **byte**s.

**by(***varlist* [**,** *byopts*]**)** specifies that imputation be performed separately
for each by-group. By-groups are identified by equal values of the
variables in *varlist* in the original data (*m*=0). Missing categories
in *varlist* are omitted, unless the **missing** suboption is specified
within **by()**. Imputed and passive variables may not be specified
within **by()**. This option is not allowed with user-defined imputation
methods, *usermethod*.

*byopts* are __mis__**sing**, **noreport**, __noleg__**end**, and **nostop**.

**missing** specifies that missing categories in *varlist* are not
omitted.

**noreport** suppresses reporting of intermediate information about
each group.

**nolegend** suppresses the display of group legends that appear
before the imputation table when long group descriptions are
encountered.

**nostop** specifies to proceed with imputation when imputation fails
in some groups. By default, **mi impute** terminates with error
when this happens.
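
As a minimal sketch of **by()** with a suboption, the commands below impute **bmi** separately for each value of **female** and suppress the intermediate per-group information with **noreport**. The dataset and model follow the subsample example later in this entry; the choice of 5 imputations is arbitrary.

```stata
webuse mheart1s0, clear

* Impute bmi within each group defined by female; noreport hides the
* intermediate information that is otherwise shown for each group
mi impute regress bmi attack smokes age hsgrad, add(5) by(female, noreport)

* Number of by-groups that were imputed
display r(N_g)
```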

+-----------+
----+ Reporting +--------------------------------------------------------

**dots** specifies to display dots as imputations are successfully completed.
An **x** is displayed if any of the specified imputation variables still
have missing values.

**noisily** specifies that intermediate output from **mi impute** be displayed.

**nolegend** suppresses the display of all legends that appear before the
imputation table.

+----------+
----+ Advanced +---------------------------------------------------------

**force** specifies to proceed with imputation even when missing imputed
values are encountered. By default, **mi impute** terminates with error
if missing imputed values are encountered.

The following option is available with **mi impute** but is not shown in the
dialog box:

**noupdate** in some cases suppresses the automatic **mi update** this command
might perform; see **[MI] noupdate option**. This option is rarely used
and is not allowed with user-defined imputation methods, *usermethod*.

__Remarks__

Remarks are presented under the following headings:

Using mi impute
Imputation methods

__Using mi impute__

The data must be **mi set** prior to using **mi** **impute**. All variables whose
missing values are to be filled in must be registered as imputed
variables; see **mi register**. If there are no imputations, you must
specify **add()**. If imputations already exist, you must specify either
**add()** or **replace**.
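
The sequence described above (set the data, register the imputation variable, then impute) might look like the following sketch. It uses the **auto** dataset shipped with Stata, in which **rep78** has missing values. The wide style, the **pmm** method with **knn(5)**, the predictors, and the seed are illustrative assumptions rather than recommendations.

```stata
sysuse auto, clear

* Declare the data to be mi data (the wide style is an arbitrary choice)
mi set wide

* Register the variable whose missing values are to be filled in
mi register imputed rep78

* No imputations exist yet, so add() is required
mi impute pmm rep78 mpg weight, add(5) knn(5) rseed(1234)
```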

When imputations already exist, you have three choices:

1. Add new imputations to the existing ones by specifying the **add()**
option.
2. Add new imputations and also replace the existing ones by specifying
both the **add()** and the **replace** options.
3. Replace existing imputed values by specifying the **replace** option.
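
The three choices can be sketched in order as follows. The dataset and model are taken from the examples later in this entry; adding 5 imputations at each step is an arbitrary choice made to keep the run short.

```stata
webuse mheart1s0, clear

* Starting point: no imputations exist, so add() is required
mi impute regress bmi attack smokes age female hsgrad, add(5)

* Choice 1: keep the existing imputations and add 5 more
mi impute regress bmi attack smokes age female hsgrad, add(5)

* Choice 2: replace the existing imputed values and add 5 more imputations
mi impute regress bmi attack smokes age female hsgrad, add(5) replace

* Choice 3: replace the existing imputed values without adding imputations
mi impute regress bmi attack smokes age female hsgrad, replace

* Total number of imputations now in the data
display r(M)
```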

**mi** **impute** may change the type of the specified imputation variables and
the sort order of the data. These changes are specific to the declared
**mi** style.

__Imputation methods__

**mi impute** supports both univariate and multivariate imputation under the
missing at random assumption (see *Assumptions about missing data* under
*Remarks and examples* in **[MI] intro substantive**).

Univariate imputation is used to impute a single variable. It can be
used repeatedly to impute multiple variables only when the variables are
independent and will be used in separate analyses. To impute a single
variable, you can choose from the following methods: **regress**, **pmm**,
**truncreg**, **intreg**, **logit**, **ologit**, **mlogit**, **poisson**, and **nbreg**.

For a continuous variable, either **regress** or **pmm** can be used (see, for
example, Rubin [1987] and Schenker and Taylor [1996]). For a continuous
variable with a restricted range (a truncated variable), either **pmm** or
**truncreg** (Raghunathan et al. 2001) can be used. For a continuous
partially observed or censored variable, **intreg** can be used (Royston
2007). For a binary variable, **logit** can be used (Rubin 1987). For a
categorical variable, **ologit** can be used to impute missing categories if
they are ordered, and **mlogit** can be used to impute missing categories if
they are unordered (Raghunathan et al. 2001). For a count variable,
either **poisson** (Raghunathan et al. 2001) or **nbreg** (Royston 2009), in the
presence of overdispersion, is often suggested. Also see van Buuren
(2007) for a detailed list of univariate imputation methods.

In practice, multiple variables usually must be imputed simultaneously,
and that requires using a multivariate imputation method. The choice of
an imputation method in this case also depends on the pattern of missing
values.

If variables follow a monotone-missing pattern (see *Patterns of missing*
*data* under *Remarks and examples* in **[MI] intro substantive**), they can be
imputed sequentially using univariate conditional distributions, which is
implemented in the **monotone** method (see **[MI] mi impute monotone**). A
separate univariate imputation model can be specified for each imputation
variable, which allows simultaneous imputation of variables of different
types (Rubin 1987).

When a pattern of missing values is arbitrary, iterative methods are used
to fill in missing values. The **mvn** method (see **[MI] mi impute mvn**) uses
multivariate normal data augmentation to impute missing values of
continuous imputation variables (Schafer 1997). Allison (2001), for
example, also discusses how to use this method to impute binary and
categorical variables.

Another multivariate imputation method that accommodates arbitrary
missing-value patterns is multiple imputation using chained equations
(MICE), also known as imputation using fully conditional specifications
(van Buuren, Boshuizen, and Knook 1999) and as sequential regression
multivariate imputation (Raghunathan et al. 2001) in the literature. The
MICE method is implemented in the **chained** method (see **[MI] mi impute**
**chained**) and uses a Gibbs-like algorithm to impute multiple variables
sequentially using univariate fully conditional specifications. Despite
a lack of theoretical justification, the flexibility of MICE has made it
one of the most popular choices used in practice.

For a recent comparison of MICE and multivariate normal imputation, see
Lee and Carlin (2010).

__Examples: Univariate imputation__

Setup
**. webuse mheart1s0**

Describe **mi** data
**. mi describe**

Create 20 imputations using regression imputation, and then add 30 more
**. mi impute regress bmi attack smokes age female hsgrad, add(20)**
**. mi impute regress bmi attack smokes age female hsgrad, add(30)**

Use predictive mean matching and replace the 50 existing imputations
**. mi impute pmm bmi attack smokes age female hsgrad, replace knn(5)**

__Examples: Multivariate imputation__

Setup
**. webuse mheart5s0, clear**

Describe **mi** data
**. mi describe**

Examine missing-data patterns
**. mi misstable nested**

Create 10 imputations using monotone imputation (monotone-missing
pattern)
**. mi impute monotone (regress) age bmi = attack smokes hsgrad female, add(10)**

Use multivariate normal imputation (arbitrary pattern) and replace
existing imputations
**. mi impute mvn bmi = attack smokes hsgrad female, replace nolog**

Impute using chained equations (arbitrary pattern) and replace existing
imputations
**. mi impute chained (regress) age bmi = attack smokes hsgrad female, replace**

__Examples: Imputing on subsamples__

Setup
**. webuse mheart1s0, clear**

Impute males and females separately and create 20 imputations
**. mi impute regress bmi attack smokes age hsgrad, add(20) by(female)**

__Examples: Conditional imputation__

Setup
**. webuse mheart7s0, clear**

Describe **mi** data
**. mi describe**

Examine missing-data patterns
**. mi misstable nested**

Impute **hightar** using only observations for which imputation variable
**smokes** is equal to one
**. mi impute monotone**
**(regress) bmi age**
**(logit, conditional(if smokes==1)) hightar**
**(logit) smokes = attack hsgrad female, add(2)**

__Examples: User-defined imputation methods__

See *Examples* in **[MI]** *mi impute usermethod*.

__Stored results__

**mi impute** stores the following in **r()**:

Scalars
  **r(M)**               total number of imputations
  **r(M_add)**           number of added imputations
  **r(M_update)**        number of updated imputations
  **r(k_ivars)**         number of imputed variables
  **r(N_g)**             number of imputed groups (**1** if **by()** is not specified)

Macros
  **r(method)**          name of imputation method
  **r(ivars)**           names of imputation variables
  **r(rngstate)**        random-number state used
  **r(by)**              names of variables specified within **by()**

Matrices
  **r(N)**               number of observations in imputation sample in each group (per variable)
  **r(N_complete)**      number of complete observations in imputation sample in each group (per variable)
  **r(N_incomplete)**    number of incomplete observations in imputation sample in each group (per variable)
  **r(N_imputed)**       number of imputed observations in imputation sample in each group (per variable)

Also see *Stored results* in the method-specific entries for a list of
additional stored results.
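
A short sketch of how these results might be inspected after imputation is given below. The dataset and model come from the univariate examples above; the number of imputations is arbitrary.

```stata
webuse mheart1s0, clear
mi impute regress bmi attack smokes age female hsgrad, add(5)

* List everything mi impute left in r()
return list

* Macros are referenced with macro quotes, scalars directly
display "method: `r(method)'   imputation variables: `r(ivars)'"
display "total imputations: " r(M)

* Matrices can be listed directly from r()
matrix list r(N_imputed)
```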

__References__

Allison, P. D. 2001. *Missing Data*. Thousand Oaks, CA: Sage.

Lee, K. J., and J. B. Carlin. 2010. Multiple imputation for missing data:
Fully conditional specification versus multivariate normal
imputation. *American Journal of Epidemiology* 171: 624-632.

Raghunathan, T. E., J. M. Lepkowski, J. Van Hoewyk, and P. Solenberger.
2001. A multivariate technique for multiply imputing missing values
using a sequence of regression models. *Survey Methodology* 27: 85-95.

Royston, P. 2007. Multiple imputation of missing values: Further update
of ice, with an emphasis on interval censoring. *Stata Journal* 7:
445-464.

------. 2009. Multiple imputation of missing values: Further update of
ice, with an emphasis on categorical variables. *Stata Journal* 9:
466-477.

Rubin, D. B. 1987. *Multiple Imputation for Nonresponse in Surveys*. New
York: Wiley.

Schafer, J. L. 1997. *Analysis of Incomplete Multivariate Data*. Boca
Raton, FL: Chapman & Hall/CRC.

Schenker, N., and J. M. G. Taylor. 1996. Partially parametric techniques
for multiple imputation. *Computational Statistics & Data Analysis* 22:
425-446.

van Buuren, S. 2007. Multiple imputation of discrete and continuous data
by fully conditional specification. *Statistical Methods in Medical*
*Research* 16: 219-242.

van Buuren, S., H. C. Boshuizen, and D. L. Knook. 1999. Multiple
imputation of missing blood pressure covariates in survival analysis.
*Statistics in Medicine* 18: 681-694.