**[MI] mi impute chained** -- Impute missing values using chained equations

__Syntax__

Default specification of prediction equations, basic syntax

**mi** __imp__**ute** __chain__**ed** **(***uvmethod***)** *ivars* [**=** *indepvars*] [*if*] [*weight*] [**,**
*impute_options* *options*]

Default specification of prediction equations, full syntax

**mi** __imp__**ute** __chain__**ed** *lhs* [**=** *indepvars*] [*if*] [*weight*] [**,** *impute_options*
*options*]

Custom specification of prediction equations

**mi** __imp__**ute** __chain__**ed** *lhsc* [**=** *indepvars*] [*if*] [*weight*] [**,** *impute_options*
*options*]

where *lhs* is *lhs_spec* [*lhs_spec* [...]] and *lhs_spec* is

**(***uvmethod* [*if*] [**,** *uvspec_options*]**)** *ivars*

*lhsc* is *lhsc_spec* [*lhsc_spec* [...]] and *lhsc_spec* is

**(***uvmethod* [*if*] [**,** __incl__**ude(***xspec***)** **omit(***varlist***)** __noimp__**uted**
*uvspec_options*]**)** *ivars*

*ivars* (or *newivar* if *uvmethod* is **intreg**) are the names of the imputation
variables.

*uvspec_options* are __asc__**ontinuous**, __noi__**sily**, and the method-specific *options*
as described in the manual entry for each univariate imputation
method.

The **include()**, **omit()**, and **noimputed** options allow you to customize the
default prediction equations.

*uvmethod*       Description
-------------------------------------------------------------------------
__reg__**ress**  linear regression for a continuous variable;
                 **[MI] mi impute regress**
**pmm**          predictive mean matching for a continuous variable;
                 **[MI] mi impute pmm**
**truncreg**     truncated regression for a continuous variable with
                 a restricted range; **[MI] mi impute truncreg**
**intreg**       interval regression for a continuous partially
                 observed (censored) variable; **[MI] mi impute intreg**
__logi__**t**    logistic regression for a binary variable;
                 **[MI] mi impute logit**
__olog__**it**   ordered logistic regression for an ordinal
                 variable; **[MI] mi impute ologit**
__mlog__**it**   multinomial logistic regression for a nominal
                 variable; **[MI] mi impute mlogit**
**poisson**      Poisson regression for a count variable;
                 **[MI] mi impute poisson**
**nbreg**        negative binomial regression for an overdispersed
                 count variable; **[MI] mi impute nbreg**
-------------------------------------------------------------------------

*options*                        Description
-------------------------------------------------------------------------
MICE options
__burn__**in(***#***)**          specify number of iterations for the burn-in
                                 period; default is **burnin(10)**
**chainonly**                    perform chained iterations for the length of the
                                 burn-in period without creating imputations in
                                 the data
__aug__**ment**                  perform augmented regression in the presence of
                                 perfect prediction for all categorical imputation
                                 variables
__noimp__**uted**                do not include imputation variables in any
                                 prediction equation
__boot__**strap**                estimate model parameters using sampling with
                                 replacement
**savetrace(...)**               save summaries of imputed values from each
                                 iteration in *filename***.dta**

Reporting
**dryrun**                       show conditional specifications without imputing
                                 data
**report**                       show report about each conditional specification
**chaindots**                    display dots as chained iterations are performed
__showe__**very(***#***)**       display intermediate results from every *#*th
                                 iteration
__showi__**ter(***numlist***)**  display intermediate results from every iteration
                                 in *numlist*

Advanced
**orderasis**                    impute variables in the specified order
**nomonotone**                   impute using chained equations even when variables
                                 follow a monotone-missing pattern; default is to
                                 use monotone method
**nomonotonechk**                do not check whether variables follow a
                                 monotone-missing pattern
-------------------------------------------------------------------------
You must **mi** **set** your data before using **mi** **impute** **chained**; see **[MI] mi**
**set**.
You must **mi** **register** *ivars* as imputed before using **mi** **impute** **chained**; see
**[MI] mi set**.
*indepvars* may contain factor variables; see fvvarlist.
**fweight**s, **aweight**s (**regress**, **pmm**, **truncreg**, and **intreg** only), **iweight**s,
and **pweight**s are allowed; see weight.

__Menu__

**Statistics > Multiple imputation**

__Description__

**mi** **impute** **chained** fills in missing values in multiple variables
iteratively by using chained equations, a sequence of univariate
imputation methods with fully conditional specification (FCS) of
prediction equations. It accommodates arbitrary missing-value patterns.
You can perform separate imputations on different subsets of the data by
specifying the **by()** option. You can also account for frequency, analytic
(with continuous variables only), importance, and sampling weights.

__Options__

    +------+
----+ Main +-------------------------------------------------------------

**add()**, **replace**, **rseed()**, **double**, **by()**; see **[MI] mi impute**.

The following options appear in the Specification dialog, which opens
when you click the **Create ...** button on the **Main** tab. The **include()**,
**omit()**, and **noimputed** options allow you to customize the default
prediction equations.

**include(***xspec***)** specifies that *xspec* be included in prediction equations
of all imputation variables corresponding to the current
left-hand-side specification *lhsc_spec*. *xspec* includes complete
variables and expressions of imputation variables bound in
parentheses. If the **noimputed** option is specified within *lhsc_spec*
or with **mi impute chained**, then *xspec* may also include imputation
variables. *xspec* may contain factor variables; see fvvarlist.

**omit(***varlist***)** specifies that *varlist* be omitted from the prediction
equations of all imputation variables corresponding to the current
left-hand-side specification *lhsc_spec*. *varlist* may include complete
variables or imputation variables. *varlist* may contain factor
variables; see fvvarlist. In **omit()**, you should list variables to be
omitted exactly as they appear in the prediction equation
(abbreviations are allowed). For example, if variable **x1** is listed
as a factor variable, use **omit(i.x1)** to omit it from the prediction
equation.

**noimputed** specifies that no imputation variables automatically be
included in prediction equations of imputation variables
corresponding to the current *uvmethod*.
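As a rough illustration of how **include()**, **omit()**, and **noimputed** interact, the sketch below asks that **bmi** be predicted from the complete variables plus **age** only, so that **smokes** stays out of its equation. It assumes the **mheart10s0** dataset from the conditional-imputation example and that **bmi**, **age**, and **smokes** are registered as imputed there. It also assumes that **dryrun** can be used without **add()**; the run only prints the resulting specifications.

```stata
webuse mheart10s0, clear

* noimputed removes all imputation variables from the bmi equation;
* include(age) then adds age back, so smokes remains excluded.
mi impute chained                                ///
    (pmm, knn(5) noimputed include(age)) bmi     ///
    (pmm, knn(5)) age                            ///
    (logit) smokes                               ///
    = attack hsgrad female, dryrun
```

Specifying **omit(i.smokes)** without **noimputed** in the **bmi** specification should lead to the same equation here; comparing the two dry-run listings is a quick way to confirm that.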

*uvspec_options* are options specified within each univariate imputation
method, *uvmethod*. *uvspec_options* include __asc__**ontinuous**, __noi__**sily**, and
the method-specific *options* as described in the manual entry for each
univariate imputation method.

**ascontinuous** specifies that categorical imputation variables
corresponding to the current *uvmethod* be included as continuous
in all prediction equations. This option is only allowed when
*uvmethod* is **logit**, **ologit**, or **mlogit**.

**noisily** specifies that the output from the current univariate model
fit to the observed data be displayed. This option is useful in
combination with the **showevery(***#***)** or **showiter(***numlist***)** option to
display results from a particular univariate imputation model for
specific iterations.

    +--------------+
----+ MICE options +-----------------------------------------------------

**burnin(***#***)** specifies the number of iterations for the burn-in period for
each chain (one chain per imputation). The default is **burnin(10)**.
This option specifies the number of iterations necessary for a chain
to reach approximate stationarity or, equivalently, to converge to a
stationary distribution. The required length of the burn-in period
will depend on the starting values used and the missing-data patterns
observed in the data. It is important to examine the chain for
convergence to determine an adequate length of the burn-in period
prior to obtaining imputations; see *Convergence of MICE* under *Remarks*
*and examples* in **[MI] mi impute chained**. The provided default is what
current literature recommends. However, you are responsible for
determining that sufficient iterations are performed.

**chainonly** specifies that **mi impute chained** perform chained iterations for
the length of the burn-in period and then stop. This option is
useful in combination with **savetrace()** to examine the convergence of
the method prior to imputation. No imputations are created when
**chainonly** is specified, so **add()** or **replace** is not required with
**mi impute chained, chainonly** and they are ignored if specified.

**augment** specifies that augmented regression be performed if perfect
prediction is detected. By default, an error is issued when perfect
prediction is detected. The idea behind the augmented-regression
approach is to add a few observations with small weights to the data
during estimation to avoid perfect prediction. See *The issue of*
*perfect prediction during imputation of categorical data* under
*Remarks and examples* in **[MI] mi impute** for more information. **augment**
is not allowed with importance weights. This option is equivalent to
specifying **augment** within univariate specifications of all
categorical imputation methods: **logit**, **ologit**, and **mlogit**.
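The equivalence between the global and the per-method form of **augment** can be sketched as follows. The example assumes the **mheart10s0** dataset with **bmi**, **age**, and **smokes** registered as imputed; it does not claim that perfect prediction actually arises in these data.

```stata
webuse mheart10s0, clear

* Global form: applies to every logit, ologit, and mlogit specification
mi impute chained (pmm, knn(5)) bmi age (logit) smokes   ///
    = attack hsgrad female, add(5) augment

* Per-method form: request augmented regression for smokes only
mi impute chained (pmm, knn(5)) bmi age (logit, augment) smokes ///
    = attack hsgrad female, replace
```

With a single categorical imputation variable, as here, the two commands request the same models; the per-method form matters when only some categorical variables are affected.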

**noimputed** specifies that no imputation variables automatically be
included in any of the prediction equations. This option is seldom
used. It is convenient when you want to choose the set of imputation
variables for each prediction equation yourself. It is equivalent
to specifying **noimputed** within all univariate specifications.

**bootstrap** specifies that posterior estimates of model parameters be
obtained using sampling with replacement; that is, posterior
estimates are estimated from a bootstrap sample. The default is to
sample the estimates from the posterior distribution of model
parameters or from the large-sample normal approximation of the
posterior distribution. This option is useful when asymptotic
normality of parameter estimates is suspect. This option is
equivalent to specifying **bootstrap** within all univariate
specifications.
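A minimal sketch of the two equivalent ways to request bootstrap-based estimates, assuming the **mheart8s0** dataset from the examples section; the seed value is arbitrary.

```stata
webuse mheart8s0, clear

* Global form: bootstrap estimates in every univariate model
mi impute chained (regress) bmi age = attack smokes hsgrad female, ///
    add(10) bootstrap rseed(2024)

* Per-method form: bootstrap requested inside the specification
mi impute chained (regress, bootstrap) bmi age                     ///
    = attack smokes hsgrad female, replace rseed(2024)
```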

**savetrace(***filename*[**,** *traceopts*]**)** specifies to save the means and standard
deviations of imputed values from each iteration to a Stata dataset
called *filename***.dta**. If the file already exists, the **replace**
suboption specifies to overwrite the existing file. **savetrace()** is
useful for monitoring convergence of the chained algorithm. This
option cannot be combined with **by()**.

*traceopts* are **replace**, **double**, and **detail**.

**replace** indicates that *filename***.dta** be overwritten if it exists.

**double** specifies that the variables be stored as **double**s, meaning
8-byte reals. By default, they are stored as **float**s, meaning
4-byte reals. See **[D] data types**.

**detail** specifies that additional summaries of imputed values
including the smallest and the largest values and the 25th,
50th, and 75th percentiles are saved in *filename***.dta**.
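One way to examine convergence before imputing is to combine **chainonly** with **savetrace()** and plot the saved summaries against the iteration number. The sketch below assumes the **mheart8s0** dataset; the trace file name **impstats** and the variable names in it (**iter**, **bmi_mean**, **age_mean**) are assumptions, so run **describe** first and adjust the plot commands to the names you see.

```stata
webuse mheart8s0, clear

* Run a single chain for 100 iterations without creating imputations,
* saving means and standard deviations of imputed values per iteration.
mi impute chained (regress) bmi age = attack smokes hsgrad female, ///
    chainonly burnin(100) savetrace(impstats, replace) rseed(2024)

* Load the trace file and check which variables it contains.
use impstats, clear
describe

* Assumed variable names: iter, bmi_mean, age_mean
tsset iter
tsline bmi_mean, name(bmi_mean, replace)
tsline age_mean, name(age_mean, replace)
```

Series that fluctuate without a visible trend suggest that the chosen burn-in length is adequate; a persistent drift suggests increasing **burnin()**.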

    +-----------+
----+ Reporting +--------------------------------------------------------

**dots**, **noisily**, **nolegend**; see **[MI] mi impute**. **noisily** specifies that the
output from all univariate conditional models fit to the observed
data be displayed. **nolegend** suppresses all imputation table legends:
the legend with the titles of the univariate imputation methods used,
the legend about conditional imputation when **conditional()** is used
within univariate specifications, and the group legends when **by()**
is specified.

**dryrun** specifies to show the conditional specifications that would be
used to impute each variable without actually imputing data. This
option is recommended for checking specifications of conditional
models prior to imputation.

**report** specifies to show a report about each univariate conditional
specification. This option, in combination with **dryrun**, is
recommended for checking specifications of conditional models prior
to imputation.
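A hedged sketch of such a pre-imputation check, assuming the **mheart8s0** dataset from the examples section. **add()** is left out on the assumption that it is not needed when no data are imputed; supply it if Stata requests it.

```stata
webuse mheart8s0, clear

* List the conditional models and a report for each one;
* no imputations are created.
mi impute chained (pmm, knn(5) omit(hsgrad)) bmi (regress) age ///
    = attack smokes hsgrad female, dryrun report
```

Reading the listed equations is a cheap way to confirm that **omit()** and **include()** had the intended effect before committing to a long run.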

**chaindots** specifies that all chained iterations be displayed as dots. An
**x** is displayed for every failed iteration.

**showevery(***#***)** specifies that intermediate regression output be displayed
from every *#*th iteration. This option requires **noisily**. If **noisily**
is specified with **mi impute chained**, then the output from the
specified iterations is displayed for all univariate conditional
models. If **noisily** is used within a univariate specification, then
the output from the corresponding univariate model from the specified
iterations is displayed.

**showiter(***numlist***)** specifies that intermediate regression output be
displayed for each iteration in *numlist*. This option requires
**noisily**. If **noisily** is specified with **mi impute chained**, then the
output from the specified iterations is displayed for all univariate
conditional models. If **noisily** is used within a univariate
specification, then the output from the corresponding univariate
model from the specified iterations is displayed.
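As an illustration of restricting intermediate output to one model and a few iterations, the sketch below places **noisily** inside the **bmi** specification only. It assumes the **mheart8s0** dataset and that iterations of the default 10-iteration burn-in are numbered from 1 to 10.

```stata
webuse mheart8s0, clear

* Show regression output for the bmi model only,
* and only at iterations 1 and 10.
mi impute chained (regress, noisily) bmi (regress) age ///
    = attack smokes hsgrad female, add(1) showiter(1 10) rseed(2024)
```

Moving **noisily** out of the parentheses and after the comma would instead display output for both conditional models at those iterations.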

    +----------+
----+ Advanced +---------------------------------------------------------

**force**; see **[MI] mi impute**.

**orderasis** requests that the variables be imputed in the specified order.
By default, variables are imputed in order from the most observed to
the least observed.

**nomonotone**, a rarely used option, specifies not to use monotone
imputation and to proceed with chained iterations even when
imputation variables follow a monotone-missing pattern. **mi impute**
**chained** checks whether imputation variables have a monotone
missing-data pattern and, if they do, imputes them using the monotone
method (without iteration). If **nomonotone** is used, **mi impute chained**
imputes variables iteratively even if variables are monotone-missing.

**nomonotonechk** specifies not to check whether imputation variables follow
a monotone-missing pattern. By default, **mi impute chained** checks
whether imputation variables have a monotone missing-data pattern
and, if they do, imputes them using the monotone method (without
iteration). If **nomonotonechk** is used, **mi impute chained** does not
check the missing-data pattern and imputes variables iteratively even
if variables are monotone-missing. Once imputation variables are
established to have an arbitrary missing-data pattern, this option
may be used to avoid potentially time-consuming checks; the
monotonicity check may be time consuming when a large number of
variables is being imputed.

The following option is available with **mi impute** but is not shown in the
dialog box:

**noupdate**; see **[MI] noupdate option**.

__Examples: Default prediction equations__

Setup
**. webuse mheart8s0**

Describe **mi** data
**. mi describe**

Examine missing-data patterns
**. mi misstable pattern**

Impute **bmi** and **age** using linear regression
**. mi impute chained (regress) bmi age = attack smokes hsgrad female,**
**add(10)**

Impute **bmi** using predictive mean matching and **age** using linear regression
**. mi impute chained (pmm, knn(5)) bmi (regress) age = attack smokes**
**hsgrad female, replace**

__Examples: Custom prediction equations__

Setup
**. webuse mheart8s0, clear**

Impute **bmi** using predictive mean matching and **age** using linear
regression; omit **hsgrad** from the prediction equation for **bmi**
**. mi impute chained ///**
**(pmm, knn(5) omit(hsgrad)) bmi ///**
**(regress) age = attack smokes hsgrad female, add(10)**

As above, but impute **age** using predictive mean matching and add age
squared to the prediction equation for **bmi**
**. mi impute chained ///**
**(pmm, knn(5) omit(hsgrad) include((age^2))) bmi ///**
**(pmm, knn(5)) age = attack smokes hsgrad female, replace**

__Examples: Imputing on subsamples__

In the previous example, impute **bmi** and **age** separately for males and
females; display dots as imputations are performed
**. mi impute chained ///**
**(pmm, knn(5) omit(hsgrad) include((age^2))) bmi ///**
**(pmm, knn(5)) age = attack smokes hsgrad, replace by(female) dots**

__Examples: Conditional imputation__

Setup
**. webuse mheart10s0, clear**

Describe **mi** data
**. mi describe**

Impute **bmi** and **age** using predictive mean matching, and **smokes** and **hightar**
using logistic regression; impute **hightar** using only observations for
which **smokes==1**
**. mi impute chained ///**
**(pmm, knn(5)) bmi ///**
**(pmm, knn(5)) age ///**
**(logit, cond(if smokes==1) omit(i.smokes)) hightar ///**
**(logit) smokes = attack hsgrad female, add(10)**

__Stored results__

**mi impute chained** stores the following in **r()**:

Scalars
**r(M)** total number of imputations
**r(M_add)** number of added imputations
**r(M_update)** number of updated imputations
**r(k_ivars)** number of imputed variables
**r(burnin)** number of burn-in iterations
**r(N_g)** number of imputed groups (**1** if **by()** is not
specified)

Macros
**r(method)** name of imputation method (**chained**)
**r(ivars)** names of imputation variables
**r(uvmethods)** names of univariate imputation methods
**r(init)** type of initialization
**r(rngstate)** random-number state used
**r(by)** names of variables specified within **by()**

Matrices
**r(N)** number of observations in imputation sample in
each group (per variable)
**r(N_complete)** number of complete observations in imputation
sample in each group (per variable)
**r(N_incomplete)** number of incomplete observations in
imputation sample in each group (per
variable)
**r(N_imputed)** number of imputed observations in imputation
sample in each group (per variable)
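The stored results can be inspected right after imputation. A minimal sketch, assuming the **mheart8s0** dataset from the examples section and an arbitrary seed:

```stata
webuse mheart8s0, clear
mi impute chained (regress) bmi age = attack smokes hsgrad female, ///
    add(10) rseed(2024)

* List everything left in r()
return list

* Pick out individual scalars, macros, and matrices
display "imputations in data: " r(M)
display "imputed variables:   `r(ivars)'"
display "univariate methods:  `r(uvmethods)'"
matrix list r(N_imputed)
```

Because **r()** results are overwritten by the next r-class command, copy any values you need into locals or scalars before running further commands.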