--- Richard Williams wrote:
> > > I notice that when you use the mim command (available from SSC) you
> > > don't get an R^2 statistic, e.g.
><snip>
> > > Would it be legit to average the R^2 from the separate imputations,
> > > or to otherwise compute it
At 11:37 AM 4/26/2007, Maarten Buis wrote:
> >Donald Rubin answered this question on the impute list. See:
> >http://www.mail-archive.com/[email protected]/msg00102.html
--- Richard Williams wrote:
> Thanks. I am not sure I understand his answer, but thanks. I just
> wish everyone would get in the habit of presenting Stata code when
> answering so I wouldn't have to figure these things out for myself. :)
No Stata code in an answer? How inconsiderate of him!
I read his answer as follows:
Compute the mean of the log(R-square s) and transform that mean back
to the original metric, since the sampling distribution of log(R-square)
is closer to normality than the sampling distribution of R-square itself.
So in Stata code:
*------------- begin example ---------------
set more off
sysuse auto, clear
recode rep78 1/2=3
tempfile temp
ice rep78 price foreign mpg weight using `temp', m(5)
use `temp', clear
mim: reg price foreign rep78
sum _mj, meanonly
local m = r(max)
local lr2 = 0
local r2 = 0
forvalues j = 1/`m' {
qui reg price foreign rep78 if _mj==`j'
local lr2 = `lr2' + ln(`e(r2)')
local r2 = `r2' + `e(r2)'
}
di "r square is according to Rubin " exp(`lr2'/`m')
di "r square is " `r2'/`m'
*--------------- end example ---------------------
(For more on how to use examples I sent to the Statalist, see:
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
The "LRT version of Meng&Rubin" refers to (Meng and Rubin 1992)
> There are other things you don't get when using mim, e.g. model
> chi-squares;
If the model of interest allows the from() option (e.g. -glm-) you
can use the method of Meng and Rubin (1992), like in the example
below. I used the description in (Schafer 1997) page 117. Someday,
when I am not that busy with my dissertation, and after I have
finished some other promises I made, like written a Stata Journal
article for -betafit-, and writing a ``zero one inflated fractional
logit'' command, I will clean up this code and make it into a
program (i.e. it would be very unhealthy to hold your breath till
that happens).
*-------------------- begin example -------------------
set more off
sysuse auto, clear
recode rep78 1/2=3
tempfile temp
ice rep78 price foreign mpg weight using `temp', m(5)
use `temp', clear
mim : glm foreign price rep78, family(binomial) link(logit)
matrix b = e(MIM_Q)
forvalues j = 1/`e(MIM_m)' {
scalar ll`j' = `e(MIM_`j'_ll)'
}
mim: glm foreign , family(binomial) link(logit)
matrix b0 = e(MIM_Q)
qui count if missing(foreign) & _mj==0
local comp = r(N) == 0
scalar dbar = 0
forvalues j = 1/`e(MIM_m)' {
if `comp' {
scalar dbar = dbar + 2*(ll`j' - `e(MIM_1_ll)')
}
else {
scalar dbar = dbar + 2*(ll`j' - `e(MIM_`j'_ll)')
}
}
scalar dbar = dbar/`e(MIM_m)'
local m = `e(MIM_m)'
scalar dtilde = 0
forvalues j = 1/`e(MIM_m)' {
glm foreign price rep78 if _mj==`j', family(binomial) link(logit) from(b) iter(0)
scalar k = e(df_m)
scalar ll = e(ll)
glm foreign if _mj==`j', family(binomial) link(logit) from(b0) iter(0)
scalar k = k - e(df_m)
scalar dtilde = dtilde + 2*(ll-`e(ll)')
}
scalar dtilde = dtilde/`m'
scalar r3 = (`m' + 1)/(k*(`m'-1))*(dbar-dtilde)
scalar D3 = dtilde/(k*(1+r3))
tempname t p
scalar `t' = k*(`m'-1)
if `t' > 4 scalar nu3 = 4 + (`t'-4)*(1+(1-2/`t')/r3)^2
else scalar nu3 = (`t'*(1+1/k)*(1+1/r3)^2)/2
scalar `p' = Ftail(k, nu3, D3)
di "F(" k ", " nu3 ") = " D3
di "Prob > F = "`p'
*---------------------- end example ------------------
(For more on how to use examples I sent to the Statalist, see:
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
> many post-estimation commands don't work. Is there
> research on how often something terrible horrible awful happens if
> you just do something like
>
> reg y x1 x2 x3 [iw = .20]
>
> I'm sure the above is sub-optimal, but is it likely to matter much in practice?
I have three comments on this method (lets call it Richard's Rule):
1) At the very least you want to change that to:
reg y x1 x2 x3 [iw = .20] if _mj>0
since the data will also contain the original data with missing values, and that
data is identified by the value 0 for _mj.
2) The entire aim of Multiple Imputation is to take into account the fact that the
imputed value, aren't real observations but best guesses. This is done by also taking
the variance of estimates between completed datasets into account. If the estimates
in each complete sample are close together than the fact that the imputed values are just
guesses didn't add much uncertainty. If the estimates in each sample are far apart than
the uncertainty should be increased. Richard's Rule will not do that, while Rubin's rule
will.
3) Finally, Stata's excellent simulation tools can help you assess how bad or good
Richard's Rule works compared to Rubin's Rule. The point estimates seem fine, but the
coverage seems to be off. To get a better view of the coverage, you would probably have
to run the simulation longer, but even with 500 simulation it seems pretty clear that with
Richard's Rule you reject a true null hypotheses in more than 5% of the samples, while the
coverage for Rubin's Rule is closer to the nominal 5%.
*-------------- begin example --------------------
set more off
capture program drop gendata sim
program define gendata
drop _all
set obs 500
gen x1 = invnorm(uniform())
gen x2 = .5*x1 + 2*invnorm(uniform())
gen y = x1 + x2 + 2*invnorm(uniform())
replace x1 = . if invlogit(-3 + y) > uniform()
replace x2 = . if invlogit(-3 + y) > uniform()
end
program define sim, rclass
gendata
tempfile temp
tempname df nu t p
ice y x1 x2 using `temp', m(5)
use `temp', clear
micombine regress y x1 x2
matrix `nu' = e(nu)
return scalar mi1 = _b[x1]
scalar `df' =`nu'[1,1]
scalar `t' = (_b[x1]-1)/_se[x1]
return scalar mip1 = 2* ttail(`df', abs(`t'))
return scalar mi2 = _b[x2]
scalar `df' =`nu'[1,2]
scalar `t' = (_b[x2]-1)/_se[x2]
return scalar mip2 = 2* ttail(`df', abs(`t'))
return scalar mi0 = _b[_cons]
scalar `df' =`nu'[1,3]
scalar `t' = _b[_cons]/_se[_cons]
return scalar mip0 = 2* ttail(`df', abs(`t'))
regress y x1 x2 [iw=.2] if _mj>0
return scalar w1 = _b[x1]
return scalar wp1 = 2 * ttail(`e(df_r)', abs((_b[x1]-1)/_se[x1]))
return scalar w2 = _b[x2]
return scalar wp2 = 2 * ttail(`e(df_r)', abs((_b[x2]-1)/_se[x2]))
return scalar w0 = _b[_cons]
return scalar wp0 = 2 * ttail(`e(df_r)', abs(_b[_cons]/_se[_cons]))
end
simulate sim mi1=r(mi1) mi2=r(mi2) mi0=r(mi0) w1=r(w1) w2=r(w2) w0=r(w0) /*
*/ mip1=r(mip1) mip2=r(mip2) mip0=r(mip0) wp1=r(wp1) wp2=r(wp2) wp0=r(wp0), /*
*/ reps(500) dots
twoway kdensity mi1 || kdensity w1, xline(1) /*
*/ legend(label(1 "Rubin's rule") label(2 "Richard's rule")) /*
*/ name(x1, replace)
twoway kdensity mi2 || kdensity w2, xline(1) /*
*/ legend(label(1 "Rubin's rule") label(2 "Richard's rule")) /*
*/ name(x2, replace)
twoway kdensity mi0 || kdensity w0, xline(0) /*
*/ legend(label(1 "Rubin's rule") label(2 "Richard's rule")) /*
*/ name(x0, replace)
forvalues i = 0/2 {
gen misig`i' = mip`i' <.05
gen wsig`i' = wp`i' <.05
}
sum misig* wsig*
*--------------------- end example -----------------------
(For more on how to use examples I sent to the Statalist, see:
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )
(I use -micombine- in this simulation because -mim- resets the seed,
thus messing up the simulation)
Hope this helps,
Maarten
Meng, X.L. and Rubin, D.B. (1992) ``Performing likelihood ratio tests with
multiply-imputed data sets''. Biometrika, 79, pp. 103-111.
Schafer, J.L. (1997) Analysis of Incomplete multivariate data. Boca Raton:
Chapman & Hall/CRC
-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology
Vrije Universiteit Amsterdam
Boelelaan 1081
1081 HV Amsterdam
The Netherlands
visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434
+31 20 5986715
http://home.fsw.vu.nl/m.buis/
-----------------------------------------
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/