Can mi estimate work with community-contributed and unsupported commands?

Title   Using the cmdok option to use mi estimate with commands that are not officially supported
Author Miguel Dorta, StataCorp

The mi estimate prefix is used to analyze multiply imputed data by fitting a model to each of the imputed datasets and pooling individual results using Rubin's combination rules (Rubin 1996). It supports a number of estimation commands, including regress, mvreg, probit, and logit; see [MI] mi estimation for a full list. You can specify the cmdok option to allow mi estimate to work with community-contributed commands or commands that are not officially supported, but you must first verify that certain conditions are met.

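This FAQ is structured as follows:

  1. Specifying the cmdok option with mi estimate
  2. Technical requirements for estimation commands to work with mi estimate, cmdok
  3. Applicability of combination rules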

Specifying the cmdok option with mi estimate

To prefix an unsupported estimation command with mi estimate, specify the cmdok option using the following syntax:

    . mi estimate, cmdok: <estimation_command> ... 

Here is an example using the ivprobit command (probit model with continuous endogenous regressors), which is not officially supported by mi estimate:

. use http://www.stata.com/support/faqs/dta/laborsup_imputed, clear

. mi estimate, cmdok: ivprobit fem_work fem_educ kids
>    (other_inc = male_educ), twostep

Multiple-imputation estimates                   Imputations       =         20
                                                Number of obs     =        500
                                                Average RVI       =     0.0305
                                                Largest FMI       =     0.0883
DF adjustment:   Large sample                   DF:     min       =   2,476.73
                                                        avg       = 113,071.18
                                                        max       = 205,905.89
Model F test:       Equal FMI                   F(   3,35473.6)   =      29.67
Within VCE type:      Twostep                   Prob > F          =     0.0000

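------------------------------------------------------------------------------
             | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
   other_inc |  -.0585911   .0094013    -6.23   0.000    -.0770174   -.0401648
    fem_educ |   .2294515   .0285153     8.05   0.000     .1735619    .2853411
        kids |  -.1843753   .0521693    -3.53   0.000    -.2866752   -.0820754
       _cons |    .350795   .4980673     0.70   0.481    -.6254047    1.326995
------------------------------------------------------------------------------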

Technical requirements for estimation commands to work with mi estimate, cmdok

For mi estimate to apply Rubin's combination rules correctly, an unsupported estimation command must fulfill the following requirements:

  1. Store the command's name in the e(cmd) macro.
  2. Store the estimated coefficients in the e(b) matrix.
  3. Store the full variance–covariance matrix estimate in the e(V) matrix.
  4. Store the residual degrees of freedom in the e(df_r) scalar; if this is not applicable for the particular estimator, an e(df_r) scalar should not be returned at all.

For example, the pca command for principal component analysis is not currently supported by mi estimate. Let us try to prefix it with mi estimate, cmdok.

. webuse mhouses1993s30, clear
(Albuquerque Home Prices Feb15-Apr30, 1993)

. mi estimate, cmdok: pca price tax sqft, comp(2)
matrix e(b) is not set
matrix e(V) is not set
r(301);

As we can see, the cmdok option did not work because the pca command does not store the e(b) and e(V) matrices, which means that requirements 2 and 3 were violated.

After an estimation command is executed, the ereturn list command can be used to see whether the required e() results above are produced. Also, the matrix list command is useful to show more details of the e(b) and e(V) matrices if they are posted.
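As a minimal sketch of such a check, one option is to run the candidate command on a single imputation with mi xeq and list what it stores. This assumes the mhouses1993s30 data from the example above are in memory; the choice of imputation 1 is arbitrary and is made here only for illustration.

```stata
. * Run the command on imputation m=1 only, then list all stored e() results
. mi xeq 1: pca price tax sqft, vce(normal) comp(2) ; ereturn list

. * Same run, but display the contents of e(b) and e(V) in full
. mi xeq 1: pca price tax sqft, vce(normal) comp(2) ; matrix list e(b) ; matrix list e(V)
```

If e(b) or e(V) is absent from the listing, mi estimate, cmdok can be expected to fail in the same way as it did for pca without vce(normal).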

On the other hand, if the vce(normal) option (which assumes that the variables are multivariate normal) is specified with the pca command, the estimated eigenvalues and eigenvectors are stored in e(b) as a coefficient vector, and the corresponding covariance matrix is stored in e(V). Let us see what happens if we now prefix pca, vce(normal) with mi estimate, cmdok.

. mi estimate, cmdok: pca price tax sqft, vce(normal) comp(2)

Multiple-imputation estimates                   Imputations       =         30
Principal components                            Number of obs     =        117
                                                Average RVI       =     0.0092
                                                Largest FMI       =     0.0479
DF adjustment:   Large sample                   DF:     min       =  12,732.29
                                                        avg       =   3.47e+08
Within VCE type: MULTIVARIATE NORMALITY                 max       =   2.62e+09

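------------------------------------------------------------------------------
             | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
Eigenvalues  |
       Comp1 |   2.718188   .3553968     7.65   0.000     2.021624    3.414753
       Comp2 |   .1584574    .019509     8.12   0.000     .1202203    .1966946
-------------+----------------------------------------------------------------
Comp1        |
       price |   .5776828   .0180433    32.02   0.000     .5423186     .613047
         tax |   .5805305   .0170499    34.05   0.000     .5471133    .6139477
        sqft |   .5738171   .0192929    29.74   0.000     .5360037    .6116305
-------------+----------------------------------------------------------------
Comp2        |
       price |  -.5528039   .2298562    -2.40   0.016    -1.003357   -.1022512
         tax |  -.2359819   .3015663    -0.78   0.434    -.8270898    .3551259
        sqft |   .7952124   .0797496     9.97   0.000     .6388974    .9515274
------------------------------------------------------------------------------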

The cmdok option worked properly because the four requirements above were satisfied.

This example is for illustration only. For most estimation commands, researchers are interested in the estimates returned in e(b), and mi estimate, cmdok will then compute what they need. Note, however, that the output from mi estimate does not present all the results of the pca output, because values not stored in e(b) (such as the proportion of variance explained) are ignored in the process. Postestimation results or graphs that rely on values other than those in e(b) are not available for this specific example. Limitations such as these, along with other considerations, are why some estimation commands are not officially supported.

Applicability of combination rules

If the four requirements above are met, mi estimate, cmdok will correctly apply Rubin's combination rules to multiply imputed data. However, mi estimate cannot determine whether the specific estimator has the properties required to ensure the statistical validity of the final MI results. The user is responsible for checking whether the combination rules are applicable to the estimator of interest. In general, combination rules are applicable to estimators that are asymptotically normal and whose variance–covariance matrix is consistently estimated. Also, combination rules should be applied to the estimators in the metric for which their sampling distributions are closest to the normal distribution. For more information about the requirements for the statistical validity of MI results, see Rubin (1996).
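For reference, the combination rules can be written as follows, in the standard form found in the MI literature. The notation is chosen here for illustration and is not taken from this FAQ: $\hat{q}_m$ and $\hat{U}_m$ stand for the e(b) and e(V) results from imputation $m$, and $M$ is the number of imputations.

```latex
\bar{q}_M = \frac{1}{M}\sum_{m=1}^{M}\hat{q}_m, \qquad
\bar{U} = \frac{1}{M}\sum_{m=1}^{M}\hat{U}_m, \qquad
B = \frac{1}{M-1}\sum_{m=1}^{M}
    (\hat{q}_m-\bar{q}_M)(\hat{q}_m-\bar{q}_M)', \qquad
T = \bar{U} + \left(1+\frac{1}{M}\right)B
```

Here $\bar{q}_M$ is the pooled point estimate, $\bar{U}$ the within-imputation variance, $B$ the between-imputation variance, and $T$ the total variance. Only the coefficient vector and its variance estimate enter these formulas, which is why the e(b) and e(V) requirements are essential and why the validity of the pooled inference rests on the approximate normality of the estimator.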

In the earlier PCA example, inference on the eigenvalues and eigenvectors mainly relies on the assumption that the variables are multivariate normally distributed. In this case, the eigenvalues and eigenvectors can be estimated using maximum likelihood with the estimates being asymptotically (multivariate) normally distributed (Anderson 1963; Jackson 2003). If the analyzed variables are not multivariate normally distributed, the MI results above would not be statistically valid.
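One informal way to examine this assumption is to test multivariate normality within an imputed dataset. The sketch below assumes the mhouses1993s30 data are in memory; using imputation 1 and requesting all available test statistics are illustrative choices, and a test on a single imputation is a rough diagnostic rather than a guarantee of validity.

```stata
. * Test multivariate normality of the analysis variables in imputation m=1
. mi xeq 1: mvtest normality price tax sqft, stats(all)
```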

References

Anderson, T. W. 1963. Asymptotic theory for principal component analysis. Annals of Mathematical Statistics 34: 122–148.

Jackson, J. E. 2003. A User’s Guide to Principal Components. New York: Wiley.

Rubin, D. B. 1996. Multiple imputation after 18+ years. Journal of the American Statistical Association 91: 473–489.