Statalist The Stata Listserver


st: Re: Re: 2sls with multiply-imputed data sets

From   "Rodrigo A. Alfaro" <>
To   <>
Subject   st: Re: Re: 2sls with multiply-imputed data sets
Date   Tue, 29 May 2007 11:21:18 -0400

The idea sounds good, but I didn't check the code. R

----- Original Message ----- From: "Viola Angelini" <>
To: <>
Sent: Tuesday, May 29, 2007 3:57 AM
Subject: st: Re: 2sls with multiply-imputed data sets

Thanks Rodrigo!
As regards the Hausman test, the only solution I can think of is to use the regression-based form of the test, combining the results across the imputed datasets only at the second stage.
Does this make sense, or would it be better to simply run 5 separate Hausman tests?
Suppose that the multiply-imputed dataset is stored in 5 separate files: mydata1.dta, mydata2.dta, mydata3.dta, mydata4.dta, mydata5.dta.
The Stata code would be the following:

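* set memory must be issued while no data are in memory
set memory 500m
forvalues i = 1/5 {
    use mydata`i'.dta, clear
    * first stage: endogenous variable on all exogenous variables
    regress y2 z1 z2 x1 x2
    predict res if e(sample), resid
    save, replace
}
* stack the 5 files, then combine the second-stage results
mimstack, m(5) so("id") nomj0 istub(mydata)
mim: regress y1 y2 x1 x2 res, cluster(sampid2)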

[y2 is the endogenous variable and z1, z2 are the excluded instruments]
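
[The alternative raised above, a separate test in each imputed dataset, could be sketched as follows. This is only a sketch: it assumes the first-stage loop has already been run, so that each mydata`i'.dta contains the residual res. The test on the coefficient of res is the regression-based endogeneity test in that dataset.]

forvalues i = 1/5 {
    use mydata`i'.dta, clear
    quietly regress y1 y2 x1 x2 res, cluster(sampid2)
    display "imputation `i'"
    test res
}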



From "Rodrigo A. Alfaro" <>
To <>
Subject st: Re: 2sls with multiply-imputed data sets
Date Fri, 25 May 2007 18:23:15 -0400

We had a similar discussion on the list this week. In that case, the topic was the R2 for Multiple Imputation (MI). Maarten proposed (for the R2 case) computing the geometric average instead of the arithmetic one, based on a reply by Donald Rubin elsewhere. The Hansen J test is asymptotically distributed as chi-square, so maybe a similar suggestion applies in your case.

My own suggestion for the R2 was to report the simple average and add a short note giving the min/max R2 across your regressions. In your case, I suggest looking more closely at the Hansen J statistics and the p-values associated with them. I think it is perfectly OK to have p-values of 0.01, 0.008, etc. (similar magnitude), and I don't expect to see very different values of the Hansen J statistic either. If you do, then you have problems with the model and/or the MI method.
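
[A minimal sketch of how the J statistics and p-values could be listed side by side, one line per imputed dataset. It assumes the user-written ivreg2 command is installed (ssc install ivreg2) and reuses the file and variable names from the message quoted above; e(j) and e(jp) are the overidentification statistic and its p-value as saved by ivreg2.]

forvalues i = 1/5 {
    use mydata`i'.dta, clear
    quietly ivreg2 y1 x1 x2 (y2 = z1 z2), cluster(sampid2)
    display "imputation `i': Hansen J = " %8.3f e(j) "   p = " %6.4f e(jp)
}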

All this works if the share of missing values in the total number of observations is small and if you imputed all the variables (including the variables used in the first step) at once. Finally, MI methods are based on simulation, so in practice I generate more than 5 datasets and try different combinations of 5 datasets (2nd to 6th, etc.), as well as larger numbers of datasets (8, 10 or 12), to see whether the results change.
