
From: "Stas Kolenikov" <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: IV with missing values
Date: Tue, 22 Jul 2008 09:50:22 -0500

I am not sure you will see any efficiency gains from predicting y2 for the rest of the sample and plugging it back into the second stage regression, even if there were a way to get the standard errors right.

With some extraordinary stretch of imagination (such as assuming multivariate normality of everything), you could get a maximum likelihood estimate of the joint covariance matrix of x, y1, y2 and z using, say, the EM algorithm, and then form the estimate of b from that matrix, getting the standard errors by the delta method. This might even work for non-normal data, provided you are able to estimate the variability of that covariance matrix consistently. But as I said, I would imagine the efficiency gains will hardly justify the trouble.

As Maarten suggested, you could run a version of an imputation procedure, imputing the missing values of y2 by a regression on z and x plus an error term with a distribution similar to that of the residuals from this regression.

I would be more convinced by a bootstrap approach where you would take bootstrap samples from the original data, run your regression of y2 on x and z, predict y2 for the remaining observations, and plug this into the second stage regression. (Check first whether a similar procedure on the complete data only produces something resembling the proper standard errors, though.)

If you suspect that y2 is informatively missing (rather than missing at random... I hope you are familiar with those concepts), then things will probably get quite a bit more complicated. There might be some work on missing data with instrumental variables estimators, but the direction modern econometrics tends to lean toward is partial identification, where some extreme counterfactuals are proposed for the missing data, and estimation and inference are aimed at an interval of parameter values rather than a point estimate as in classical statistics.
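To make the bootstrap idea above concrete, here is a minimal sketch in plain Python (standard library only) rather than Stata. Everything in it is an assumption made for illustration: the simulated data, the coefficient values, the helper names `ols`, `two_step`, `bootstrap_se` and `simulate`, and the choice to use first-stage fitted values for every observation in the second stage (as in the two-step described in the original question). It also assumes y2 is missing completely at random. The point is only that both stages are re-run inside each bootstrap replication, so the spread of the replicated estimates of b reflects the first-stage estimation error as well.

```python
import random
import statistics

def ols(X, y):
    """Least-squares coefficients from the normal equations (Gauss-Jordan)."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def two_step(rows):
    """rows are (y1, x, z, y2) tuples; y2 is None where it is missing."""
    obs = [r for r in rows if r[3] is not None]
    # First stage on complete cases: y2 on a constant, x and z
    g = ols([[1.0, x, z] for _, x, z, _ in obs], [y2 for *_, y2 in obs])
    # Second stage on all rows: y1 on a constant, x and fitted y2
    X2 = [[1.0, x, g[0] + g[1] * x + g[2] * z] for _, x, z, _ in rows]
    return ols(X2, [y1 for y1, *_ in rows])[2]  # coefficient on y2

def bootstrap_se(rows, reps=100, seed=1):
    """Resample whole rows with replacement and redo both stages each time."""
    rng = random.Random(seed)
    n = len(rows)
    draws = [two_step([rows[rng.randrange(n)] for _ in range(n)])
             for _ in range(reps)]
    return statistics.stdev(draws)

def simulate(n=1000, b=2.0, seed=0):
    """Hypothetical data: y2 endogenous, about 30% of y2 missing at random."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, z, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        u = 0.8 * e + rng.gauss(0, 1)      # u correlated with e: endogeneity
        y2 = 0.5 * x + 1.0 * z + e
        y1 = 1.0 * x + b * y2 + u
        rows.append((y1, x, z, None if rng.random() < 0.3 else y2))
    return rows

rows = simulate()
b_hat = two_step(rows)
se = bootstrap_se(rows)
print(f"b_hat = {b_hat:.3f}, bootstrap SE = {se:.3f}")
```

As a sanity check in the spirit of the parenthetical remark above, one could run the same bootstrap on the complete cases only and compare the result with the standard errors reported by a regular IV routine before trusting it on the full sample.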
On Tue, Jul 22, 2008 at 7:52 AM, sara borelli <saraborelli77@yahoo.it> wrote:
> Dear All,
>
> I am estimating the following regression:
>
> y1 = ax + by2 + u
> where y2 is endogenous and I am using some variable z as identifying instrument
>
> y1, x, z are observed for the whole sample, but y2 is missing for 30% of observations.
> If I use ivreg, Stata estimates the model only on the non-missing observations. But I need to estimate the model on the whole sample.
> Therefore I explicitly performed the two steps separately, predicting y2 in the first stage for the whole sample and inserting it into the second stage. But I know the standard errors may be biased. Does anyone know a way to estimate this correctly?
>
> Thank you for any help
> Sara Borelli
>

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: Please do not reply to my Gmail address as I don't check it regularly.

**Follow-Ups**: **RE: st: IV with missing values** *From:* "Schaffer, Mark E" <M.E.Schaffer@hw.ac.uk>

**References**: **st: IV with missing values** *From:* sara borelli <saraborelli77@yahoo.it>
