Title | Using the cmdok option to use mi estimate with commands that are not officially supported

Author | Miguel Dorta, StataCorp

The **mi estimate** prefix is
used to analyze multiply imputed data by fitting a
model to each of the imputed datasets and pooling individual results using
Rubin's combination rules (Rubin 1996). It supports a number of estimation
commands, including **regress**, **mvreg**, **probit**, and **logit**;
see [MI] **mi estimation** for
a full list. You can specify the **cmdok** option to allow **mi
estimate** to work with community-contributed commands or commands that are not
officially supported, but you must first verify that certain conditions
are met.

This FAQ has been structured as follows:

- Specifying the **cmdok** option with **mi estimate**
- Technical requirements for estimation commands to work with **mi estimate, cmdok**
- Applicability of combination rules

In order to allow unsupported estimation commands to be prefixed by
**mi estimate**, you can specify the **cmdok** option using the
following syntax:

    . mi estimate, cmdok: <estimation_command> ...

Here is an example using the **ivprobit** command (probit model with continuous
endogenous regressors), which is not officially supported by **mi estimate**:

    . mi estimate, cmdok: ivprobit fem_work fem_educ kids (other_inc = male_educ),
    > twostep

    Multiple-imputation estimates                   Imputations       =         20
                                                    Number of obs     =        500
                                                    Average RVI       =     0.0305
                                                    Largest FMI       =     0.0883
    DF adjustment:   Large sample                   DF:     min       =   2,476.73
                                                            avg       = 113,071.18
                                                            max       = 205,905.89
    Model F test:       Equal FMI                   F(   3,35473.6)   =      29.67
    Within VCE type:      Twostep                   Prob > F          =     0.0000

    ------------------------------------------------------------------------------
                 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
       other_inc |  -.0585911   .0094013    -6.23   0.000    -.0770174   -.0401648
        fem_educ |   .2294515   .0285153     8.05   0.000     .1735619    .2853411
            kids |  -.1843753   .0521693    -3.53   0.000    -.2866752   -.0820754
           _cons |    .350795   .4980673     0.70   0.481    -.6254047    1.326995
    ------------------------------------------------------------------------------

For **mi estimate** to apply Rubin's combination rules correctly, an unsupported estimation
command must fulfill the following requirements:

1. Store the command's name in the macro **e(cmd)**.
2. Store the estimated coefficients in the **e(b)** matrix.
3. Store the full variance–covariance matrix estimate in the **e(V)** matrix.
4. Store the residual degrees of freedom in the **e(df_r)** scalar; if this is not applicable for the particular estimator, an **e(df_r)** scalar should not be returned at all.
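As a rough illustration of these requirements, the sketch below defines a hypothetical e-class command, **mymean**, that estimates the mean of one variable and posts the results where **mi estimate** looks for them. The program name, the `version 17` statement, and the choice of estimator are assumptions made for this example; they do not come from any official command.

```stata
* Hypothetical e-class command (illustrative only): estimates a mean and
* posts e(b), e(V), e(df_r), and e(cmd)
program define mymean, eclass
    version 17
    syntax varname(numeric) [if] [in]
    marksample touse

    quietly summarize `varlist' if `touse'
    if (r(N) == 0) error 2000            // no observations
    local N = r(N)
    local df = r(N) - 1                  // residual degrees of freedom

    tempname b V
    matrix `b' = r(mean)                 // 1 x 1 coefficient vector
    matrix `V' = r(Var) / r(N)           // variance of the sample mean
    matrix colnames `b' = `varlist'
    matrix rownames `V' = `varlist'
    matrix colnames `V' = `varlist'

    * ereturn post stores e(b) and e(V); dof() stores e(df_r)
    ereturn post `b' `V', obs(`N') dof(`df') esample(`touse')
    ereturn local cmd "mymean"           // e(cmd) is set last
end
```

With a command written along these lines, a call such as `mi estimate, cmdok: mymean price` should find everything it needs, provided the data have been **mi set** and imputed. An estimator with no residual degrees of freedom would simply omit the `dof()` option.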

For example, the **pca** command for
principal component analysis is not currently supported by **mi estimate**.
Let us try to prefix it with **mi estimate, cmdok**.

    . webuse mhouses1993s30, clear
    (Albuquerque Home Prices Feb15-Apr30, 1993)

    . mi estimate, cmdok: pca price tax sqft, comp(2)
    matrix e(b) is not set
    matrix e(V) is not set
    r(301);

As we can see, the **cmdok** option did not work because the **pca**
command does not store the **e(b)** and **e(V)** matrices, which
means that requirements 2 and 3 were violated.

After an estimation command is executed, the **ereturn list** command can be
used to see whether the required **e()** results above are produced. Also, the
**matrix list** command is useful to show more details of the **e(b)** and
**e(V)** matrices if they are posted.
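One way to carry out this check on multiply imputed data is to run the candidate command on a single imputation and inspect what it leaves behind. The sketch below assumes the `mhouses1993s30` dataset used in this FAQ and relies on **mi xeq**, which runs one or more commands (separated by semicolons) on the specified imputation; it is an illustration, not the only way to do the check.

```stata
webuse mhouses1993s30, clear

* Run the candidate command on imputation m = 1 only, then list
* everything it stored in e()
mi xeq 1: pca price tax sqft, vce(normal) comp(2) ; ereturn list

* Show the coefficient vector and its covariance matrix in detail
mi xeq 1: quietly pca price tax sqft, vce(normal) comp(2) ; matrix list e(b) ; matrix list e(V)
```

If **e(b)** or **e(V)** is missing from the **ereturn list** output, **mi estimate, cmdok** will stop with an error for that command.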

On the other hand, if the **vce(normal)** option (which computes the VCE of the
eigenvalues and eigenvectors assuming that the variables are multivariate normal)
is specified with the **pca** command, all eigenvalues and eigenvectors are stored
in **e(b)** as a coefficient vector; the corresponding covariance matrix is stored
in **e(V)**. Let us see what happens if we now prefix **pca, vce(normal)** with
**mi estimate, cmdok**.

    . mi estimate, cmdok: pca price tax sqft, vce(normal) comp(2)

    Multiple-imputation estimates                   Imputations       =         30
    Principal components                            Number of obs     =        117
                                                    Average RVI       =     0.0092
                                                    Largest FMI       =     0.0479
    DF adjustment:   Large sample                   DF:     min       =  12,732.29
                                                            avg       =   3.47e+08
    Within VCE type: MULTIVARIATE NORMALITY                 max       =   2.62e+09

    ------------------------------------------------------------------------------
                 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
    -------------+----------------------------------------------------------------
    Eigenvalues  |
           Comp1 |   2.718188   .3553968     7.65   0.000     2.021624    3.414753
           Comp2 |   .1584574    .019509     8.12   0.000     .1202203    .1966946
    -------------+----------------------------------------------------------------
    Comp1        |
           price |   .5776828   .0180433    32.02   0.000     .5423186     .613047
             tax |   .5805305   .0170499    34.05   0.000     .5471133    .6139477
            sqft |   .5738171   .0192929    29.74   0.000     .5360037    .6116305
    -------------+----------------------------------------------------------------
    Comp2        |
           price |  -.5528039   .2298562    -2.40   0.016    -1.003357   -.1022512
             tax |  -.2359819   .3015663    -0.78   0.434    -.8270898    .3551259
            sqft |   .7952124   .0797496     9.97   0.000     .6388974    .9515274
    ------------------------------------------------------------------------------

The **cmdok** option worked properly because the four requirements above were
satisfied.

This example is for illustration only. For most estimation
commands, researchers are interested in the estimates returned in
**e(b)**, and **mi estimate, cmdok** will then compute what they need. Note, however,
that the output from **mi estimate** does not present all the results that
**pca** reports, because values not stored in **e(b)** (such as the variance proportions)
are ignored in the process. Postestimation results or graphs that rely on
values other than those in **e(b)** will not be available for this specific
example. Because of issues like these and other considerations, some estimation
commands may not be officially supported.

If the four requirements above are met, **mi estimate, cmdok** will correctly
apply Rubin's combination rules to multiply imputed data. However, **mi
estimate** cannot determine whether the specific estimator has the properties
required to ensure the statistical validity of the final MI results. The user is
responsible for checking whether the combination rules are applicable to the
estimator of interest. In general, combination rules are applicable to
estimators that are asymptotically normal and whose
variance–covariance matrix is consistently estimated. Also, combination rules
should be applied to the estimators in the metric in which their sampling
distributions are closest to the normal distribution. For more information
about the requirements for the statistical validity of MI results, see
Rubin (1996).
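For reference, the combination rules can be written out as follows. The notation is an assumption made for this sketch: $M$ is the number of imputations, $\hat{q}_i$ is the coefficient vector from **e(b)** for imputation $i$, and $\hat{U}_i$ is the corresponding **e(V)** matrix.

```latex
% Pooled point estimate: the average of the M completed-data estimates
\bar{q}_M = \frac{1}{M} \sum_{i=1}^{M} \hat{q}_i

% Within-imputation variance: the average of the M completed-data VCEs
\bar{U} = \frac{1}{M} \sum_{i=1}^{M} \hat{U}_i

% Between-imputation variance: variability of the estimates across imputations
B = \frac{1}{M-1} \sum_{i=1}^{M} (\hat{q}_i - \bar{q}_M)(\hat{q}_i - \bar{q}_M)'

% Total variance of the pooled estimate
T = \bar{U} + \left(1 + \frac{1}{M}\right) B
```

Inference based on $\bar{q}_M$ and $T$ leans on the approximate normality of each $\hat{q}_i$, which is why the estimator's asymptotic behavior and the metric in which it is pooled both matter.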

In the earlier PCA example, inference on the eigenvalues and eigenvectors mainly relies on the assumption that the variables are multivariate normally distributed. In this case, the eigenvalues and eigenvectors can be estimated using maximum likelihood with the estimates being asymptotically (multivariate) normally distributed (Anderson 1963; Jackson 2003). If the analyzed variables are not multivariate normally distributed, the MI results above would not be statistically valid.

Anderson, T. W. 1963. Asymptotic theory for principal component analysis.
*Annals of Mathematical Statistics* 34: 122–148.

Jackson, J. E. 2003. *A User’s Guide to Principal Components.* New York: Wiley.

Rubin, D. B. 1996. Multiple imputation after 18+ years. *Journal of the
American Statistical Association* 91: 473–489.