
From  "Rodrigo A. Alfaro" <raalfaroa@gmail.com> 
To  <statalist@hsphsun2.harvard.edu> 
Subject  st: Re: Re: 2sls with multiply-imputed data sets 
Date  Tue, 29 May 2007 11:21:18 -0400 
Thanks Rodrigo!
As regards the Hausman test, the only solution I can think of is to use the regression-based form of the test, where I combine the results of the regression only at the second stage.
Does that make sense, or would it be better to just run 5 separate Hausman tests?
Suppose that the multiply-imputed dataset is stored in 5 separate files: mydata1.dta, mydata2.dta, mydata3.dta, mydata4.dta, mydata5.dta.
The Stata code would be the following:
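* First stage: save the first-stage residuals in each imputed dataset
forvalues i=1(1)5 {
    use mydata`i'.dta, clear
    * regress the endogenous variable on the instruments and exogenous regressors
    regress y2 z1 z2 x1 x2
    predict res if e(sample), resid
    save, replace
}
clear
set memory 500m
* Second stage: stack the 5 datasets and combine the augmented regression
mimstack, m(5) so("id") nomj0 istub(mydata)
* the endogeneity test is the combined t-test on res
mim: regress y1 y2 x1 x2 res, cluster(sampid2)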
[y2 is the endogenous variable and z1, z2 are the excluded instruments]
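For reference, here is a sketch of how the test on res would be combined at the second stage, assuming mim pools the estimates with Rubin's rules (as far as I understand, that is its default). The symbols are my own notation, not taken from the mim documentation: \hat\delta_i is the coefficient on res in imputed dataset i, U_i is its estimated (cluster-robust) variance, and m = 5.

```latex
\begin{align*}
\bar{\delta} &= \frac{1}{m}\sum_{i=1}^{m}\hat{\delta}_i
  && \text{(pooled coefficient on res)} \\
\bar{U} &= \frac{1}{m}\sum_{i=1}^{m} U_i
  && \text{(within-imputation variance)} \\
B &= \frac{1}{m-1}\sum_{i=1}^{m}\bigl(\hat{\delta}_i-\bar{\delta}\bigr)^2
  && \text{(between-imputation variance)} \\
T &= \bar{U} + \Bigl(1+\frac{1}{m}\Bigr)B
  && \text{(total variance)} \\
t &= \frac{\bar{\delta}}{\sqrt{T}}, \qquad
\nu = (m-1)\Bigl(1+\frac{\bar{U}}{(1+1/m)\,B}\Bigr)^{2}
  && \text{(test statistic and degrees of freedom)}
\end{align*}
```

Under the null that y2 is exogenous, t is compared with a t distribution on \nu degrees of freedom, so the 5 second-stage regressions yield a single test rather than 5 separate ones.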
Best
Viola

From "Rodrigo A. Alfaro" <raalfaroa@gmail.com>
To <statalist@hsphsun2.harvard.edu>
Subject st: Re: 2sls with multiply-imputed data sets
Date Fri, 25 May 2007 18:23:15 -0400
We had a similar discussion on the list this week. In that case, the topic was the R2 for Multiple Imputation (MI). Maarten proposed (for the R2 case) to compute the geometric average instead of the arithmetic one, based on Donald Rubin's reply somewhere else. The Hansen J statistic is asymptotically distributed as chi-square, so maybe a similar suggestion applies in your case.
My own suggestion for the R2 was to report the simple average and add a short note with the min/max R2 across your regressions. In your case, I suggest analyzing in more depth the figures for the Hansen J tests and the p-values associated with them. I think it is perfectly OK to have p-values of 0.01, 0.008, etc. (similar magnitude)... and I don't expect to see very different values for the Hansen J test either. If they do differ a lot... then you have problems with the model and/or the method of MI.
All this works if the share of missing values among the total observations is small and if you imputed all the variables (including the variables used in the first step) at once. Finally, MI methods are based on simulation, so in practice I generate more than 5 datasets and play with some combinations of 5 datasets (2nd to 6th, etc.) and with more datasets (8, 10 or 12) to see if the results change.
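A minimal sketch of that comparison of Hansen J statistics across imputed datasets, under several assumptions: the user-written ivreg2 command is installed (ssc install ivreg2), the files are named mydata1.dta to mydata5.dta as in Viola's message, and ivreg2 leaves the J statistic and its p-value in e(j) and e(jp). That is how I recall its stored results, so check with ereturn list. The names hansenj, pvalue and imputation are just illustrative.

```stata
* Collect the Hansen J statistic and its p-value from each imputed dataset
tempname results
tempfile jstats
postfile `results' imputation hansenj pvalue using `jstats'
forvalues i = 1/5 {
    use mydata`i'.dta, clear
    * y2 is endogenous; z1 and z2 are the excluded instruments
    * cluster() makes the overidentification test a Hansen J test
    ivreg2 y1 x1 x2 (y2 = z1 z2), cluster(sampid2)
    * e(j) and e(jp) are assumed names; verify with -ereturn list-
    post `results' (`i') (e(j)) (e(jp))
}
postclose `results'

* Inspect the spread across imputations (mean, min and max)
use `jstats', clear
list, noobs
summarize hansenj pvalue
```

The summarize output gives the average together with the min and max, which is the kind of note suggested above for the R2. To try other combinations of datasets (2nd to 6th, or 8, 10 or 12 of them), only the range in forvalues needs to change.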
R
*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/