
From: "Schaffer, Mark E" <M.E.Schaffer@hw.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: IV with missing values
Date: Tue, 22 Jul 2008 18:48:15 +0100

Stas, Sara,

I'm not as pessimistic as Stas in principle, though maybe I am in practice.

What Sara wants to do is similar to the "Split Sample Instrumental Variables" (SSIV) estimator proposed by Angrist and Krueger; see http://ideas.repec.org/p/nbr/nberte/0150.html

In this estimator, the sample is split randomly: one half is used to estimate the first-stage parameters, and then these estimated parameters and the other half of the sample are used to estimate the parameters of the main model. This estimator is biased, but A-K also proposed an unbiased version.

Inoue and Solon have a paper on this as well: http://ideas.repec.org/p/nbr/nberte/0311.html. They discuss two-sample IV (TSIV) as a GMM estimator, TSLIML (two-sample LIML), and various other things.

My reason for being pessimistic is that this looks like a lot more work than I suspect Sara was hoping for! Possibly not worth the investment.

--Mark

Prof. Mark Schaffer
Department of Economics
School of Management & Languages
Heriot-Watt University
Edinburgh EH14 4AS
tel +44-131-451-3494 / fax +44-131-451-3296
http://ideas.repec.org/e/psc51.html

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of
> Stas Kolenikov
> Sent: 22 July 2008 15:50
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: IV with missing values
>
> I am not sure you will see any efficiency gains in trying to predict
> y2 for the rest of the sample and plugging it back into the
> second-stage regression, even if there were a way to get the
> standard errors right.
>
> With some extraordinary stretch of imagination (such as
> assuming multivariate normality of everything), you could get
> a maximum likelihood estimate of the joint covariance matrix
> of x, y1, y2 and z using, say, the EM algorithm, and then form the
> estimate of b from that matrix, getting the standard errors
> by the delta method.
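Concretely, once an estimated covariance matrix of (x, y1, y2, z) is in hand (for example from EM under the normality assumption above; the EM step itself is not shown), the just-identified IV estimates follow from the two moment conditions E[xu] = 0 and E[zu] = 0 with u = y1 - a x - b y2. Assuming all variables are centered, so the model needs no intercept, and writing the hat-sigmas for the estimated covariances:

$$
\begin{pmatrix}\hat a\\ \hat b\end{pmatrix}
=
\begin{pmatrix}\hat\sigma_{xx} & \hat\sigma_{x y_2}\\ \hat\sigma_{z x} & \hat\sigma_{z y_2}\end{pmatrix}^{-1}
\begin{pmatrix}\hat\sigma_{x y_1}\\ \hat\sigma_{z y_1}\end{pmatrix},
\qquad
\hat b = \frac{\hat\sigma_{xx}\,\hat\sigma_{z y_1} - \hat\sigma_{z x}\,\hat\sigma_{x y_1}}{\hat\sigma_{xx}\,\hat\sigma_{z y_2} - \hat\sigma_{z x}\,\hat\sigma_{x y_2}}
$$

Writing $\hat b = g(\hat\sigma)$ for this map from the six covariances to $\hat b$, the delta-method variance is $\widehat{\operatorname{Var}}(\hat b) \approx \nabla g(\hat\sigma)^{\top}\,\hat V\,\nabla g(\hat\sigma)$, where $\hat V$ is the estimated covariance matrix of those six covariance estimates. Getting a consistent $\hat V$ is the hard part when the data are not normal.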
> This might even work for non-normal data
> provided you are able to estimate the variability of that
> covariance matrix consistently. But as I said, I would
> imagine the efficiency gains will hardly justify the trouble.
>
> As Maarten suggested, you could run a version of an imputation
> procedure, imputing the missing values of y2 by a regression
> on z and x plus an error with a distribution similar to
> that of the residuals from this regression. I would be more
> convinced by a bootstrap approach where you would take
> bootstrap samples from the original data, run your regression
> of y2 on x and z, predict y2 for the remaining observations,
> and plug this into the second-stage regression. (Check, though,
> whether a similar procedure applied to the complete data only
> produces something resembling the proper standard errors.)
>
> If you suspect that y2 is informatively missing (rather than
> missing at random... I hope you are familiar with those
> concepts), then things will probably get quite a bit more
> complicated. There might be some work on missing data with
> instrumental variables estimators, but the direction
> modern econometrics tends to lean toward is partial
> identification, where some extreme counterfactuals are
> proposed for the missing data, and estimation and inference
> are aimed at an interval of parameters rather than a point
> estimate as in classical statistics.
>
> On Tue, Jul 22, 2008 at 7:52 AM, sara borelli
> <saraborelli77@yahoo.it> wrote:
> > Dear All,
> >
> > I am estimating the following regression:
> >
> > y1 = a x + b y2 + u
> >
> > where y2 is endogenous and I am using some variable z as the
> > identifying instrument.
> >
> > y1, x and z are observed for the whole sample, but y2 is
> > missing for 30% of observations.
> > If I use ivreg, Stata estimates the model only on the
> > non-missing observations. But I need to estimate the model on
> > the whole sample.
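To make Sara's setup concrete, here is a minimal sketch in Python rather than Stata, on simulated data. It assumes hypothetical coefficients (a = 1, b = 2), a shared shock that makes y2 endogenous, and y2 missing completely at random for about 30% of rows. It computes the just-identified IV estimate on complete cases only, which is the sample that listwise deletion in ivreg leaves. The helper names (`solve`, `iv_just_identified`, `simulate`) are illustrative, not part of any library.

```python
import random


def solve(A, b):
    """Solve the small square system A @ beta = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (M[r][n] - s) / M[r][r]
    return beta


def iv_just_identified(y, X, W):
    """IV with as many instruments as regressors: solve (W'X) beta = W'y."""
    k = len(X[0])
    WX = [[0.0] * k for _ in range(k)]
    Wy = [0.0] * k
    for yi, xi, wi in zip(y, X, W):
        for r in range(k):
            Wy[r] += wi[r] * yi
            for c in range(k):
                WX[r][c] += wi[r] * xi[c]
    return solve(WX, Wy)


def simulate(n, a=1.0, b=2.0, share_missing=0.3, seed=1):
    """Hypothetical rows (y1, x, y2 or None, z); y2 is endogenous and missing completely at random."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, z, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        y2 = 0.5 * x + z + e + rng.gauss(0, 1)  # e enters both y2 and u, so y2 is endogenous
        u = e + rng.gauss(0, 1)
        y1 = a * x + b * y2 + u
        missing = rng.random() < share_missing
        rows.append((y1, x, None if missing else y2, z))
    return rows


data = simulate(5000)
cc = [r for r in data if r[2] is not None]  # listwise deletion: the rows ivreg would keep
const, a_hat, b_hat = iv_just_identified(
    [r[0] for r in cc],
    [[1.0, r[1], r[2]] for r in cc],  # regressors: constant, x, y2
    [[1.0, r[1], r[3]] for r in cc],  # instruments: constant, x, z
)
print(len(cc), round(a_hat, 3), round(b_hat, 3))
```

Because y2 is missing completely at random by construction, the complete-case estimates should land close to the simulated a and b; dropping rows costs precision, not consistency. That is the situation in which Stas doubts that the extra rows are worth much.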
> >
> > Therefore I explicitly performed the two steps separately,
> > predicting y2 in the first stage for the whole sample and
> > inserting it into the second stage. But I know the standard
> > errors may be biased. Does anyone know a way to estimate this
> > correctly?
> >
> > Thank you for any help
> > Sara Borelli
> >
>
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name
> Small print: Please do not reply to my Gmail address as I
> don't check it regularly.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
>

--
Heriot-Watt University is a Scottish charity
registered under charity number SC000278.
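Sara's two-step procedure and the bootstrap Stas describes can be sketched together. This is a minimal Python sketch on hypothetical simulated data (y2 missing completely at random, a = 1, b = 2), not Stata code: the first stage is fitted on complete cases, the fitted y2 is used for every observation in the second stage, and whole observations are resampled so that both stages are re-estimated in every replication. The function names, sample size and number of replications are illustrative assumptions.

```python
import random
import statistics


def solve(A, b):
    """Solve the small square system A @ beta = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (M[r][n] - s) / M[r][r]
    return beta


def ols(y, X):
    """OLS coefficients from the normal equations X'X beta = X'y."""
    k = len(X[0])
    XtX = [[0.0] * k for _ in range(k)]
    Xty = [0.0] * k
    for yi, xi in zip(y, X):
        for r in range(k):
            Xty[r] += xi[r] * yi
            for c in range(k):
                XtX[r][c] += xi[r] * xi[c]
    return solve(XtX, Xty)


def two_step(rows):
    """First stage on complete cases, fitted y2 for every row, then OLS of y1 on (1, x, fitted y2)."""
    cc = [r for r in rows if r[2] is not None]
    pi = ols([r[2] for r in cc], [[1.0, r[1], r[3]] for r in cc])
    y2_hat = [pi[0] + pi[1] * r[1] + pi[2] * r[3] for r in rows]
    return ols([r[0] for r in rows], [[1.0, r[1], h] for r, h in zip(rows, y2_hat)])


def simulate(n, a=1.0, b=2.0, share_missing=0.3, seed=1):
    """Hypothetical rows (y1, x, y2 or None, z); y2 is endogenous and missing completely at random."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x, z, e = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        y2 = 0.5 * x + z + e + rng.gauss(0, 1)
        y1 = a * x + b * y2 + e + rng.gauss(0, 1)
        rows.append((y1, x, None if rng.random() < share_missing else y2, z))
    return rows


def bootstrap_se(rows, reps=100, seed=2):
    """Resample whole rows, rerun BOTH stages each time, return the SD of the b estimates."""
    rng = random.Random(seed)
    draws = [two_step(rng.choices(rows, k=len(rows)))[2] for _ in range(reps)]
    return statistics.stdev(draws)


data = simulate(1000)
b_hat = two_step(data)[2]
se_b = bootstrap_se(data)
print(round(b_hat, 3), round(se_b, 3))
```

The design choice is to resample the whole procedure: the bootstrap spread then includes first-stage estimation error, which the second-stage OLS standard errors ignore (their residuals are also built from fitted rather than actual y2). Following Stas's caveat, it is worth running the same bootstrap on complete cases only and comparing it with the ivreg standard errors before relying on it.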

**References**:
- **st: IV with missing values**, *From:* sara borelli <saraborelli77@yahoo.it>
- **Re: st: IV with missing values**, *From:* "Stas Kolenikov" <skolenik@gmail.com>

