
# Re: st: A correlation matrix after multiple imputation

| Field | Value |
|---|---|
| From | weddings@stata.com (Wesley D. Eddings, StataCorp) |
| To | statalist@hsphsun2.harvard.edu |
| Subject | Re: st: A correlation matrix after multiple imputation |
| Date | Mon, 26 Jul 2010 17:14:05 -0500 |

```
On Friday 23 July 2010, Alan Acock <acock@mac.com> asked if there is an easy
way to obtain a multiple-imputation estimate of a correlation matrix in Stata:

> Is there an easy way to obtain the 20 correlation matrices, one for each of
> the 20 imputed datasets and then somehow pulling these?

There is no automatic way of doing this, but with some programming effort,
Alan can use -mi estimate- to obtain an MI estimate of the correlation matrix;
see the code at the end of this post.

However, from a statistical standpoint, it is not clear whether averaging
completed-data sample estimates of the correlation matrix across imputed data
is the best approach to account for missing data when computing a correlation
matrix.

One alternative is to consider reporting an EM estimate of the covariance (or
correlation) matrix adjusted for missing data.  Such an estimate can be
obtained from -mi impute mvn-, because -mi impute mvn- uses the EM algorithm
to get starting values of the parameters for the MCMC procedure.  The EM
estimates of the coefficients and the variance-covariance matrix are saved
after -mi impute mvn- in the -r(Beta_em)- and -r(Sigma_em)- matrices,
respectively.  To obtain EM estimates only, without producing imputations,
specify the -emonly- option with -mi impute mvn-; see the example below.

-- Wes                  -- Yulia
weddings@stata.com      ymarchenko@stata.com

=================== EXAMPLES ================================================

Here is how you can obtain an EM estimate of the correlation matrix accounting
for missing data:

/****************** begin do file ******************/
sysuse auto, clear
set seed 12345
replace mpg = . if runiform()>0.9
mi set wide
mi register imputed mpg weight
mi impute mvn mpg weight, emonly
mat Sigma = r(Sigma_em) /* save EM estimate of the variance-covariance (VC) matrix */
_getcovcorr Sigma, corr shape(full)   /* convert VC to a correlation matrix */
mat C = r(C)
matlist C
/*************** end do file *****************/

Here is how you can obtain an MI estimate of the correlation matrix:

/***** begin MI correlation ******************/
cap program drop ecorr
program ecorr, eclass
    version 11
    syntax [varlist] [if] [in] [aw fw] [, * ]
    /* pass any weights on in bracketed form, e.g. [aweight=x] */
    if (`"`weight'"'!="") {
        local wgt "[`weight'`exp']"
    }
    marksample touse
    correlate `varlist' `if' `in' `wgt', `options'
    tempname b V
    /* stack the lower triangle of r(C) into a row vector */
    mata: st_matrix("`b'", vech(st_matrix("r(C)"))')
    local p = colsof(`b')
    /* zero placeholder VCE; micorr uses only the pooled point estimates */
    mat `V' = J(`p',`p',0)
    local cols: colnames `b'
    mat rownames `V' = `cols'
    eret post `b' `V' `wgt', obs(`=r(N)') esample(`touse')
    eret local cmd ecorr
    eret local title "Lower-diagonal correlation matrix"
    eret local vars "`varlist'"
end

cap program drop micorr
program micorr, rclass
    tempname esthold
    /* hold any current estimation results and restore them on exit */
    _estimates hold `esthold', nullok restore
    /* cmdok lets -mi estimate- run the user-written -ecorr- */
    qui mi estimate, cmdok: ecorr `0'
    tempname C_mi
    /* rebuild the full symmetric matrix from the pooled lower triangle */
    mata: st_matrix("`C_mi'", invvech(st_matrix("e(b_mi)")'))
    mat colnames `C_mi' = `e(vars)'
    mat rownames `C_mi' = `e(vars)'
    di
    di as txt "Multiple-imputation estimate of the correlation matrix"
    di as txt "(obs=" string(e(N_mi),"%9.0g") ")"
    matlist `C_mi'
    return clear
    ret matrix C_mi = `C_mi'
end

sysuse auto, clear
set seed 12345
replace mpg = . if runiform()>0.9
mi set wide
mi register imputed mpg weight
mi impute mvn mpg weight, add(20)
micorr mpg weight
/***** end MI correlation ********************/
```
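
The MI point estimate that -mi estimate- reports is the average of the completed-data estimates, so the matrix displayed by -micorr- can be cross-checked by averaging the correlation matrices by hand. The following is only a sketch and not part of the original post. It assumes the same setup as the MI example (wide style, 20 imputations) and relies on the wide-style convention that imputation m of mpg is stored in the variable _m_mpg. The matrix names Cavg and Cm and the local macro M are made up for this illustration.

```stata
/* Cross-check sketch: average the completed-data correlation matrices by hand */
sysuse auto, clear
set seed 12345
replace mpg = . if runiform()>0.9
mi set wide
mi register imputed mpg weight
mi impute mvn mpg weight, add(20)

local M = 20                         /* must match add(20) above */
matrix Cavg = J(2, 2, 0)             /* running average, starts at zero */
forvalues m = 1/`M' {
    /* _`m'_mpg holds imputation m of mpg; weight has no missing values */
    quietly correlate _`m'_mpg weight
    matrix Cm = r(C)
    matrix Cavg = Cavg + Cm/`M'
}
matrix rownames Cavg = mpg weight
matrix colnames Cavg = mpg weight
matlist Cavg
```

If the assumptions hold, Cavg should match the matrix that -micorr mpg weight- displays for the same seed, up to rounding.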