# st: Instrumental Variables - Consistency with Nonlinear Endogenous Variable

From: Misha Spisok
To: statalist@hsphsun2.harvard.edu
Subject: st: Instrumental Variables - Consistency with Nonlinear Endogenous Variable
Date: Tue, 29 Dec 2009 00:26:04 -0800

```
Hello, Statalist!

In Microeconometrics Using Stata, by Cameron and Trivedi, Exercise 11
of Chapter 6 (page 204) has a simulation exercise which I will give
below, followed by my "solution," then my question.  Apologies in
advance for the length of this message.  The basic question has to do
with the consistency of the IV estimator when the endogenous variable
enters the structural equation nonlinearly.

"When an endogenous variable enters the regression nonlinearly, the
obvious IV estimator is inconsistent and a modification is needed.
Specifically, suppose y1 = b*y2^2 + u, and the first-stage equation
for y2 is y2 = p*z + v, where the zero-mean errors u and v are
correlated.  Here the endogenous regressor appears in the structural
equation as y2^2 rather than y2.  The IV estimator is b_hat_IV = (sum
z_i * y2_i^2)^(-1)*(sum z_i * y1_i).  This can be implemented by a
regular IV regression of y on y2^2 with the instrument z: regress y2^2
on z and then regress y1 on the first-stage prediction y2^2_hat.  If
instead we regress y2 on z at the first stage, giving y2_hat, and then
regress y1 on (y2_hat)^2, an inconsistent estimate is obtained.
Generate a simulation sample to demonstrate these points.  Consider
whether this example can be generalized to other nonlinear models
where the nonlinearity is in regressors only, so that y1 = g(y2)'beta
+ u, where g(y2) is a nonlinear function of y2 [y2 being a vector of
variables]." (Microeconometrics Using Stata, Cameron and Trivedi)

Here is my approach:

clear
set seed 10101
quietly set obs 10000

* Exogenous variables
generate double z = 5*rnormal(0) /* instrument */
generate double x = 5*rnormal(0) /* exogenous regressor */

* Errors u and v with correlation -0.5
matrix C = (1, -0.5 \ -0.5, 1) /* correlation structure */
corr2data u v, corr(C) /* correlated errors */

* y2 is endogenous because v is correlated with the error u
generate double y2 = 3*z + v
generate y2sq = y2^2
generate double y1 = 5 + 2*y2sq + x + u

* Approach 1 (deemed consistent): first stage for y2^2
reg y2sq z x
predict y2sq_hat, xb
reg y1 y2sq_hat x, robust

* Approach 2 (deemed inconsistent): first stage for y2, then square
reg y2 z x
predict y2_hat, xb
generate y2_hat_sq = y2_hat^2
reg y1 y2_hat_sq x, robust

The coefficient estimates on both y2sq_hat and y2_hat_sq are near the
actual coefficient of 2.  However, the standard error for the estimate
deemed inconsistent (on y2_hat_sq) is remarkably small, yielding a
t-value of 452, while the standard error for the estimate deemed
consistent (on y2sq_hat) is slightly larger than the parameter
estimate, yielding a t-value of less than one.

My generated data do not make it clear that one estimate is
inconsistent while the other is consistent.  Moreover, the one deemed
inconsistent is not only close to the actual coefficient, but its
standard error is quite small.  What am I doing wrong?

Thank you for your attention and help.

Misha
```
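
The estimator quoted in the exercise can be read as a method-of-moments estimator for the simple model y1 = b*y2^2 + u. The following is a short sketch of that derivation, not part of the original post. It assumes that z is uncorrelated with u and, for the last line, that v has mean zero and is independent of z; neither assumption is stated explicitly in the exercise text.

```latex
\begin{align*}
E[z\,u] = 0 \;\Longrightarrow\; E[z\,y_1] &= b\,E[z\,y_2^2]
  \;\Longrightarrow\; b = \frac{E[z\,y_1]}{E[z\,y_2^2]}, \\
\hat b_{IV} &= \Big(\sum_i z_i\,y_{2i}^2\Big)^{-1} \sum_i z_i\,y_{1i}, \\
E[z\,y_2^2] &= E\big[z\,(p z + v)^2\big]
  = p^2 E[z^3] + 2p\,E[z^2 v] + E[z\,v^2]
  = p^2 E[z^3] + E[z]\,E[v^2].
\end{align*}
```

The ratio is defined only when E[z*y2^2] is nonzero. In the simulated design in the post, z = 5*rnormal(0) is symmetric around zero, so E[z] = E[z^3] = 0 and, under the assumptions above, the population denominator is zero. This may be related to the large standard error reported for y2sq_hat, though that reading is offered as a possibility rather than a verified diagnosis.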