Hi Stata Users,

I have a question about Wooldridge's Procedure 18.1, which concerns IV estimation when the endogenous regressor is binary.

Suppose I want to estimate the following equation:

Y = a0 + a1*X1 + a2*X2 + error

where X1 is an indicator (binary) variable that is also endogenous. Assume that we have exactly one instrument, Z, for X1.

X2 is an exogenous variable. The parameter of interest is a1.

Now, for more efficient estimation, the following two-step method is often suggested:

Step 1: Estimate a probit of the binary endogenous regressor on all exogenous variables and the instrument to obtain fitted probabilities.

Probit: P(X1 = 1 | Z, X2) = Phi(c0 + c1*Z + c2*X2)

This gives the fitted probabilities of X1: X1_hat1.

Step 2: Use the fitted probabilities from Step 1 (X1_hat1) together with all exogenous regressors as instruments to obtain a more efficient estimate of the coefficient on the binary endogenous regressor. So in this step I use the standard 2SLS procedure:
First stage: X1 = b0 + b1*X1_hat1 + b2*X2 + error

This gives us the predicted value X1_hat2.

Second stage: Y = a0 + a1*X1_hat2 + a2*X2 + error
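To make the procedure concrete, here is a minimal Python sketch (NumPy only) on hypothetical simulated data. The data-generating process, the coefficient values and the helper names (`probit`, `ols`, `norm_cdf`) are assumptions for illustration, not Wooldridge's code. It computes both statistics the question below asks about: the z-statistic on Z in the Step 1 probit and the t-statistic on b1 in the Step 2 first stage (with heteroskedasticity-robust HC0 standard errors, since X1 is binary). In Stata the same steps would roughly be `probit x1 z x2`, `predict x1hat1, pr` and `ivregress 2sls y x2 (x1 = x1hat1)`.

```python
import math
import numpy as np

# Vectorised standard normal CDF and PDF built from the standard library's erf
_erf = np.vectorize(math.erf)

def norm_cdf(v):
    return 0.5 * (1.0 + _erf(v / math.sqrt(2.0)))

def norm_pdf(v):
    return np.exp(-0.5 * v ** 2) / math.sqrt(2.0 * math.pi)

def probit(y, X, max_iter=100, tol=1e-10):
    """Probit MLE by Fisher scoring; returns (coefficients, standard errors)."""
    beta = np.zeros(X.shape[1])

    def score_and_info(b):
        xb = X @ b
        p = np.clip(norm_cdf(xb), 1e-12, 1 - 1e-12)
        f = norm_pdf(xb)
        score = X.T @ (f * (y - p) / (p * (1 - p)))
        w = f ** 2 / (p * (1 - p))              # expected-information weights
        info = X.T @ (X * w[:, None])
        return score, info

    for _ in range(max_iter):
        score, info = score_and_info(beta)
        step = np.linalg.solve(info, score)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    _, info = score_and_info(beta)              # information at the final estimate
    return beta, np.sqrt(np.diag(np.linalg.inv(info)))

def ols(y, X):
    """OLS coefficients, residuals and heteroskedasticity-robust (HC0) SEs."""
    XtX_inv = np.linalg.inv(X.T @ X)
    coef = XtX_inv @ X.T @ y
    resid = y - X @ coef
    meat = X.T @ (X * resid[:, None] ** 2)
    se = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
    return coef, resid, se

# Hypothetical simulated data: X1 is endogenous because v and u are correlated
rng = np.random.default_rng(0)
n = 5000
z, x2, u, e = rng.standard_normal((4, n))
v = 0.5 * u + math.sqrt(0.75) * e               # Var(v) = 1, so the probit is correctly specified
x1 = (0.2 + 0.8 * z + 0.5 * x2 + v > 0).astype(float)
y = 1.0 + 2.0 * x1 - 1.0 * x2 + u               # true a1 = 2
const = np.ones(n)

# Step 1: probit of X1 on (1, Z, X2), then fitted probabilities X1_hat1
Xp = np.column_stack([const, z, x2])
c_hat, c_se = probit(x1, Xp)
x1_hat1 = norm_cdf(Xp @ c_hat)
z_stat_probit = c_hat[1] / c_se[1]              # significance of Z in the probit

# Step 2, first stage: OLS of X1 on (1, X1_hat1, X2) gives X1_hat2
W = np.column_stack([const, x1_hat1, x2])
b_hat, _, b_se = ols(x1, W)
t_stat_b1 = b_hat[1] / b_se[1]                  # test of b1 = 0
x1_hat2 = W @ b_hat

# Step 2, second stage: OLS of Y on (1, X1_hat2, X2); only the point estimate is used
a_hat, _, _ = ols(y, np.column_stack([const, x1_hat2, x2]))
a1_hat = a_hat[1]

print(f"probit z-stat on Z:       {z_stat_probit:.2f}")
print(f"first-stage t-stat on b1: {t_stat_b1:.2f}")
print(f"2SLS estimate of a1:      {a1_hat:.3f}")
```

Only the point estimate of a1 is taken from the manual second stage: the OLS standard errors from that regression are not valid 2SLS standard errors, which is one reason to use a packaged 2SLS routine such as `ivregress` in practice.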

My question concerns the instrument Z. To argue for its validity (strictly, its relevance, which is what a significance test addresses), should I consider Z's statistical significance in the probit model of Step 1?


Or, since Step 1 is simply an extra step to obtain fitted probabilities to be used as instruments, should we look only at the statistical significance of the fitted values (X1_hat1) in the first stage of the 2SLS estimation in Step 2, i.e. test whether b1 = 0?

Wooldridge claims that we can ignore the properties of the Step 1 estimation and focus only on Step 2. However, he does not explain why. I would really appreciate any comments or suggestions you may have on this issue.



