Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

RE: st: RE: mim: reg command


From   "Maarten Buis" <M.Buis@fsw.vu.nl>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: mim: reg command
Date   Fri, 27 Apr 2007 12:55:12 +0200

--- Richard Williams wrote:
> > > I notice that when you use the mim command (available from SSC) you
> > > don't get an R^2 statistic, e.g.
><snip>
> > > Would it be legit to average the R^2 from the separate imputations,
> > > or to otherwise compute it

At 11:37 AM 4/26/2007, Maarten Buis wrote:
> >Donald Rubin answered this question on the impute list. See:
> >http://www.mail-archive.com/impute@lists.utsouthwestern.edu/msg00102.html


--- Richard Williams wrote:
> Thanks.  I am not sure I understand his answer, but thanks.  I just
> wish everyone would get in the habit of presenting Stata code when
> answering so I wouldn't have to figure these things out for myself.  :)

No Stata code in an answer? How inconsiderate of him! 

I read his answer as follows:
Compute the mean of the log(R-square s) and transform that mean back
to the original metric, since the sampling distribution of log(R-square)
is closer to normality than the sampling distribution of R-square itself.
So in Stata code:

*------------- begin example ---------------
set more off
sysuse auto, clear
recode rep78 1/2=3
tempfile temp
ice rep78 price foreign mpg weight using `temp', m(5)
use `temp', clear

mim: reg price foreign rep78

sum _mj, meanonly
local m = r(max)

local lr2 = 0
local r2 = 0
forvalues j = 1/`m' {
	qui reg price foreign rep78 if _mj==`j'
	local lr2 = `lr2' + ln(`e(r2)')
	local r2 = `r2' + `e(r2)'
}

di "r square is according to Rubin " exp(`lr2'/`m')
di "r square is " `r2'/`m'
*--------------- end example ---------------------
(For more on how to use examples I sent to the Statalist, see:
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )

The "LRT version of Meng&Rubin" refers to (Meng and Rubin 1992)

> There are other things you don't get when using mim, e.g. model
> chi-squares; 

If the model of interest allows the from() option (e.g. -glm-) you
can use the method of Meng and Rubin (1992), like in the example
below. I used the description in (Schafer 1997) page 117. Someday,
when I am not that busy with my dissertation, and after I have 
finished some other promises I made, like written a Stata Journal 
article for -betafit-, and writing a ``zero one inflated fractional 
logit'' command, I will clean up this code and make it into a 
program (i.e. it would be very unhealthy to hold your breath till
that happens). 

*-------------------- begin example -------------------
set more off
sysuse auto, clear
recode rep78 1/2=3
tempfile temp

ice rep78 price foreign mpg weight using `temp', m(5)
use `temp', clear

mim : glm foreign price rep78, family(binomial) link(logit)
matrix b = e(MIM_Q)

forvalues j = 1/`e(MIM_m)' {
	scalar ll`j' = `e(MIM_`j'_ll)'
}

mim: glm foreign , family(binomial) link(logit)
matrix b0 = e(MIM_Q)

qui count if missing(foreign) & _mj==0
local comp = r(N) == 0

scalar dbar = 0
forvalues j = 1/`e(MIM_m)' {
	if `comp' {
		scalar dbar = dbar + 2*(ll`j' - `e(MIM_1_ll)')
	}
	else {
		scalar dbar = dbar + 2*(ll`j' - `e(MIM_`j'_ll)')
	}
}
scalar dbar = dbar/`e(MIM_m)'

local m = `e(MIM_m)'
scalar dtilde = 0
forvalues j = 1/`e(MIM_m)' {
	glm foreign price rep78 if _mj==`j', family(binomial) link(logit) from(b) iter(0)
	scalar k = e(df_m)
	scalar ll = e(ll)
	glm foreign if _mj==`j', family(binomial) link(logit) from(b0) iter(0)
	scalar k = k - e(df_m)
	scalar dtilde = dtilde + 2*(ll-`e(ll)')
}
scalar dtilde = dtilde/`m'

scalar r3 = (`m' + 1)/(k*(`m'-1))*(dbar-dtilde)

scalar D3 = dtilde/(k*(1+r3))

tempname t p

scalar `t' = k*(`m'-1)

if `t' > 4 scalar nu3 = 4 + (`t'-4)*(1+(1-2/`t')/r3)^2
else scalar nu3 = (`t'*(1+1/k)*(1+1/r3)^2)/2

scalar `p' = Ftail(k, nu3, D3)

di "F(" k ", " nu3 ") = " D3 
di "Prob > F = "`p'
*---------------------- end example ------------------
(For more on how to use examples I sent to the Statalist, see:
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )

> many post-estimation commands don't work.  Is there
> research on how often something terrible horrible awful happens if
> you just do something like
> 
> reg y x1 x2 x3 [iw = .20]
> 
> I'm sure the above is sub-optimal, but is it likely to matter much in practice?

I have three comments on this method (lets call it Richard's Rule):
1) At the very least you want to change that to:

reg y x1 x2 x3 [iw = .20] if _mj>0

since the data will also contain the original data with missing values, and that
data is identified by the value 0 for _mj. 


2) The entire aim of Multiple Imputation is to take into account the fact that the 
imputed value, aren't real observations but best guesses. This is done by also taking
the variance of estimates between completed datasets into account. If the estimates
in each complete sample are close together than the fact that the imputed values are just 
guesses didn't add much uncertainty. If the estimates in each sample are far apart than
the uncertainty should be increased. Richard's Rule will not do that, while Rubin's rule
will.

3) Finally, Stata's excellent simulation tools can help you assess how bad or good  
Richard's Rule works compared to Rubin's Rule. The point estimates seem fine, but the 
coverage seems to be off. To get a better view of the coverage, you would probably have 
to run the simulation longer, but even with 500 simulation it seems pretty clear that with 
Richard's Rule you reject a true null hypotheses in more than 5% of the samples, while the 
coverage for Rubin's Rule is closer to the nominal 5%. 

*-------------- begin example --------------------
set more off
capture program drop gendata sim
program define gendata
	drop _all
	set obs 500
	gen x1 = invnorm(uniform())
	gen x2 = .5*x1 + 2*invnorm(uniform())
	gen y = x1 + x2 + 2*invnorm(uniform())
	replace x1 = . if invlogit(-3 + y) > uniform()
	replace x2 = . if invlogit(-3 + y) > uniform()
end 

program define sim, rclass
	gendata
	tempfile temp
	tempname df nu t p

	ice y x1 x2 using `temp', m(5)
	use `temp', clear
	micombine regress y x1 x2
	matrix `nu' = e(nu)

	return scalar mi1 = _b[x1]
      scalar `df' =`nu'[1,1]
      scalar `t' = (_b[x1]-1)/_se[x1]
      return scalar mip1 = 2* ttail(`df', abs(`t'))

	return scalar mi2 = _b[x2]
      scalar `df' =`nu'[1,2]
      scalar `t' = (_b[x2]-1)/_se[x2]
      return scalar mip2 = 2* ttail(`df', abs(`t'))

	return scalar mi0 = _b[_cons]
      scalar `df' =`nu'[1,3]
      scalar `t' = _b[_cons]/_se[_cons]
      return scalar mip0 = 2* ttail(`df', abs(`t'))


	regress y x1 x2 [iw=.2] if _mj>0
	return scalar w1 = _b[x1]
	return scalar wp1 = 2 * ttail(`e(df_r)', abs((_b[x1]-1)/_se[x1]))
	return scalar w2 = _b[x2]
	return scalar wp2 = 2 * ttail(`e(df_r)', abs((_b[x2]-1)/_se[x2]))
	return scalar w0 = _b[_cons]
	return scalar wp0 = 2 * ttail(`e(df_r)', abs(_b[_cons]/_se[_cons]))
end


simulate sim mi1=r(mi1) mi2=r(mi2) mi0=r(mi0) w1=r(w1) w2=r(w2) w0=r(w0) /*
*/ mip1=r(mip1) mip2=r(mip2) mip0=r(mip0) wp1=r(wp1) wp2=r(wp2) wp0=r(wp0), /*
*/ reps(500) dots

twoway kdensity mi1 || kdensity w1, xline(1) /*
*/ legend(label(1 "Rubin's rule") label(2 "Richard's rule")) /*
*/ name(x1, replace)

twoway kdensity mi2 || kdensity w2, xline(1) /*
*/ legend(label(1 "Rubin's rule") label(2 "Richard's rule")) /*
*/ name(x2, replace)

twoway kdensity mi0 || kdensity w0, xline(0) /*
*/ legend(label(1 "Rubin's rule") label(2 "Richard's rule")) /*
*/ name(x0, replace)

forvalues i = 0/2 {
	gen misig`i' = mip`i' <.05
	gen wsig`i'  = wp`i' <.05
}
sum misig* wsig*

*--------------------- end example -----------------------
(For more on how to use examples I sent to the Statalist, see:
http://home.fsw.vu.nl/m.buis/stata/exampleFAQ.html )

(I use -micombine- in this simulation because -mim- resets the seed,
thus messing up the simulation)

Hope this helps,
Maarten

Meng, X.L. and Rubin, D.B. (1992) ``Performing likelihood ratio tests with 
multiply-imputed data sets''. Biometrika, 79, pp. 103-111.

Schafer, J.L. (1997) Analysis of Incomplete multivariate data. Boca Raton: 
Chapman & Hall/CRC

-----------------------------------------
Maarten L. Buis
Department of Social Research Methodology 
Vrije Universiteit Amsterdam 
Boelelaan 1081 
1081 HV Amsterdam 
The Netherlands

visiting address:
Buitenveldertselaan 3 (Metropolitan), room Z434 

+31 20 5986715

http://home.fsw.vu.nl/m.buis/
-----------------------------------------


*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index