
RE: st: IV with missing values

From	"Schaffer, Mark E" <[email protected]>
To	<[email protected]>
Subject	RE: st: IV with missing values
Date	Tue, 22 Jul 2008 18:48:15 +0100

Stas, Sara,

I'm not as pessimistic as Stas in principle, though maybe I am in
practice.

What Sara wants to do is similar to the "Split Sample Instrumental
Variables" (SSIV) estimator proposed by Angrist and Krueger; see

http://ideas.repec.org/p/nbr/nberte/0150.html

In this estimator, the sample is split randomly, one half is used to
estimate the first-stage parameters, and then these estimated
parameters and the other half of the sample are used to get the
parameters of the main model.  This estimator is biased (toward zero),
but A-K also proposed an unbiased version.
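
To make the mechanics concrete, here is a rough sketch of the basic
split-sample idea in Python/NumPy on simulated data. It is not A-K's
code, and it does not implement their unbiased version. The
data-generating process, the coefficient values, and the helper names
(ols, split_sample_iv) are all made up for illustration.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def split_sample_iv(y1, x, z, y2, first, second):
    """First stage on rows `first`, second stage on rows `second`.

    Returns the second-stage coefficients (constant, a, b).
    """
    W = np.column_stack([np.ones_like(x), x, z])  # constant, exogenous x, instrument z
    pi = ols(W[first], y2[first])                 # first stage: y2 on 1, x, z
    y2_hat = W @ pi                               # fitted y2 for every row
    X2 = np.column_stack([np.ones_like(x), x, y2_hat])
    return ols(X2[second], y1[second])            # second stage on the other rows

# Simulated data (assumed DGP): true a = 1, true b = 2
rng = np.random.default_rng(0)
n = 20_000
x = rng.normal(size=n)
z = rng.normal(size=n)
e = rng.normal(size=n)
u = 0.8 * e + rng.normal(size=n)   # u is correlated with e, so y2 is endogenous
y2 = 0.5 * x + 1.0 * z + e
y1 = 1.0 * x + 2.0 * y2 + u

# Random split into two halves
perm = rng.permutation(n)
first, second = perm[: n // 2], perm[n // 2 :]

b_ssiv = split_sample_iv(y1, x, z, y2, first, second)[2]
b_ols = ols(np.column_stack([np.ones(n), x, y2]), y1)[2]
print("split-sample IV b:", b_ssiv)
print("naive OLS b:      ", b_ols)
```

With a strong instrument, as in this made-up example, the split-sample
estimate should land close to the true b while OLS does not. The
second-stage OLS standard errors are not valid as they stand, which is
the problem Sara raised.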

Inoue and Solon have a paper on this as well:

http://ideas.repec.org/p/nbr/nberte/0311.html

and they discuss two-sample IV (TSIV) as a GMM estimator, TSLIML
(two-sample LIML), and various other things.

My reason for being pessimistic is that this looks like a lot more work
than I suspect Sara was hoping for!  Possibly not worth the investment.
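
For the standard errors, the bootstrap that Stas suggests in the
message quoted below could be sketched roughly as follows in
Python/NumPy. This is only an illustration under strong assumptions:
the data are simulated, y2 is made missing completely at random, the
fitted y2 is used for every row in the second stage (as in Sara's
two-step procedure), and the helper names (ols, two_step_b,
bootstrap_se) are invented for the example.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients of y on the columns of X."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def two_step_b(y1, x, z, y2):
    """Coefficient on y2 from the two-step procedure.

    First stage uses only rows where y2 is observed (not NaN);
    second stage uses all rows with the fitted y2.
    """
    obs = ~np.isnan(y2)
    W = np.column_stack([np.ones_like(x), x, z])
    pi = ols(W[obs], y2[obs])
    X2 = np.column_stack([np.ones_like(x), x, W @ pi])
    return ols(X2, y1)[2]

def bootstrap_se(y1, x, z, y2, reps=200, seed=0):
    """Resample whole rows with replacement and redo BOTH steps each time."""
    rng = np.random.default_rng(seed)
    n = len(y1)
    draws = np.empty(reps)
    for r in range(reps):
        i = rng.integers(0, n, size=n)   # row indices for one bootstrap sample
        draws[r] = two_step_b(y1[i], x[i], z[i], y2[i])
    return draws.std(ddof=1)

# Simulated data (assumed DGP): true a = 1, true b = 2
rng = np.random.default_rng(1)
n = 5_000
x = rng.normal(size=n)
z = rng.normal(size=n)
e = rng.normal(size=n)
u = 0.8 * e + rng.normal(size=n)   # makes y2 endogenous
y2 = 0.5 * x + 1.0 * z + e
y1 = 1.0 * x + 2.0 * y2 + u
y2_obs = np.where(rng.random(n) < 0.3, np.nan, y2)   # about 30% missing at random

b_hat = two_step_b(y1, x, z, y2_obs)
se = bootstrap_se(y1, x, z, y2_obs)
print("two-step b:", b_hat, " bootstrap s.e.:", se)
```

As Stas notes, it would be sensible to run the same procedure on the
complete cases only and compare the result with the usual IV standard
errors before trusting it.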

--Mark

Prof. Mark Schaffer
Department of Economics
School of Management & Languages
Heriot-Watt University
Edinburgh EH14 4AS
tel +44-131-451-3494 / fax +44-131-451-3296
http://ideas.repec.org/e/psc51.html
 

> -----Original Message-----
> From: [email protected] 
> [mailto:[email protected]] On Behalf Of 
> Stas Kolenikov
> Sent: 22 July 2008 15:50
> To: [email protected]
> Subject: Re: st: IV with missing values
> 
> I am not sure you will see any efficiency gains in trying to predict
> y2 for the rest of the sample and plugging it back into the 
> second stage regression, even if there were a way to get the 
> standard errors right.
> With some extraordinary stretch of imagination (such as 
> assuming multivariate normality of everything), you could get 
> a maximum likelihood estimate of the joint covariance matrix 
> of x, y1, y2 and z using, say, the EM algorithm, and then form the 
> estimate of b from that matrix, getting the standard errors 
> by the delta method. This might even work for non-normal data 
> provided you are able to estimate variability in that 
> covariance matrix consistently. But as I said, I would 
> imagine the efficiency gains will hardly justify the trouble.
> 
> As Maarten suggested, you could run a version of imputation 
> procedure imputing the missing values of y2 by a regression 
> on z and x plus the error with the distribution similar to 
> that of the residuals from this regression. I would be more 
> convinced by a bootstrap approach where you would take 
> bootstrap samples from the original data, run your regression 
> of y2 on x and z, predict y2 for the remaining observations, 
> and plug this into the second stage regression. (Check if a 
> similar procedure on complete data only will produce 
> something resembling the proper standard errors though.)
> 
> If you suspect that y2 is informatively missing (rather than 
> missing at random... I hope you are familiar with those 
> concepts), then things will probably get quite a bit more 
> complicated. There might be some work on missing data with 
> instrumental variables estimators, but the direction that 
> modern econometrics tends to lean toward is partial 
> identification, where some extreme counterfactuals are 
> proposed for the missing data, and estimation and inference 
> are aimed at an interval of parameters rather than a point 
> estimate as in classical statistics.
> 
> On Tue, Jul 22, 2008 at 7:52 AM, sara borelli 
> <[email protected]> wrote:
> > Dear All,
> >
> > I am estimating the following regression:
> >
> > y1 = ax + by2 + u
> >
> > where y2 is endogenous and I am using some variable z as the
> > identifying instrument.
> >
> > y1, x, z are observed for the whole sample, but y2 is missing for
> > 30% of observations.
> > If I use ivreg, Stata estimates the model only on the non-missing
> > observations. But I need to estimate the model on the whole sample.
> > Therefore I explicitly performed the two steps separately,
> > predicting y2 in the first stage for the whole sample and inserting
> > it into the second stage. But I know the standard errors may be
> > biased. Does anyone know a way to estimate this correctly?
> >
> > Thank you for any help
> > Sara Borelli
> >
> 
> 
> --
> Stas Kolenikov, also found at http://stas.kolenikov.name 
> Small print: Please do not reply to my Gmail address as I 
> don't check it regularly.
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
> 


-- 
Heriot-Watt University is a Scottish charity
registered under charity number SC000278.

