# st: Std. -ivregress- vs. "by hand" using -reg- with nonlinear endogenous variable

| Field | Value |
|---|---|
| From | Misha Spisok <[email protected]> |
| To | [email protected] |
| Subject | st: Std. -ivregress- vs. "by hand" using -reg- with nonlinear endogenous variable |
| Date | Wed, 6 Jan 2010 12:11:24 -0800 |

```
Hello, Statalist!

[There have been no replies to a previous message posted nine days ago.]

My questions, in short, are the following:
1. How is it "obvious" from my generated data and approach that one
estimator is consistent while the other is inconsistent?
2.  Why do the standard errors differ when doing IV "by hand" (i.e.,
using -reg- for each stage) versus using -ivregress-?

These questions are motivated by an exercise in Microeconometrics
Using Stata, by Cameron and Trivedi, Exercise 11 of Chapter 6 (page
204);  I give the exercise below, and follow it with my "solution" and
my questions.  Apologies in advance for the length of this message.

"When an endogenous variable enters the regression nonlinearly, the
obvious IV estimator is inconsistent and a modification is needed.
Specifically, suppose y1 = b*y2^2 + u, and the first-stage equation
for y2 is y2 = p*z + v, where the zero-mean errors u and v are
correlated.  Here the endogenous regressor appears in the structural
equation as y2^2 rather than y2.  The IV estimator  is b_hat_IV = (sum
z_i * y2_i^2)^(-1)*(sum z_i * y1_i).  This can be implemented by a
regular IV regression of y1 on y2^2 with the instrument z: regress
y2^2 on z and then regress y1 on the first-stage prediction y2^2_hat.
If instead we regress y2 on z at the first stage, giving y2_hat, and
then regress y1 on (y2_hat)^2, an inconsistent estimate is obtained.
Generate a simulation sample to demonstrate these points.  Consider
whether this example can be generalized to other nonlinear models
where the nonlinearity is in regressors only, so that y1 = g(y2)'beta
+ u, where g(y2) is a nonlinear function of y2 [y2 being a vector of
variables]." (Microeconometrics Using Stata, Cameron and Trivedi)

Here is my approach:

clear
set seed 10101
/*      Generate Data      */
quietly set obs 10000
generate double z = 5*rnormal(0) /* instrument */
generate double x = 5*rnormal(0)
matrix C = (1, -0.5 \ -0.5, 1) /* correlation structure */
corr2data u v, corr(C) /* correlated errors */
/* y2 is endogenous because v is correlated with the structural error u */
generate double y2 = 3*z + v
generate y2sq = y2^2
generate double y1 = 5 + 2*y2sq + x + u

/*      Consistent Estimation      */
/*      First Stage Regression      */
reg y2sq z x
predict y2sq_hat, xb
/*      Second Stage Regression      */
reg y1 y2sq_hat x, robust
/*      Above with -ivregress-      */
ivregress 2sls y1 x (y2sq = z), vce(robust) first

/*      Inconsistent Estimation      */
/*      First Stage Regression      */
reg y2 z x
predict y2_hat, xb
generate y2_hat_sq = y2_hat^2
/*      Second Stage Regression      */
reg y1 y2_hat_sq x, robust

The coefficient estimates on both y2sq_hat and y2_hat_sq are near the
true coefficient of 2.  However, the standard error for the estimate
deemed inconsistent is remarkably small, yielding a t-value of 452.
For the estimate deemed consistent, the standard error from doing it
"by hand" is slightly larger than the parameter estimate, yielding a
t-value of less than one, while the standard error from -ivregress- is
quite small, yielding a t-value of 1370.

My generated data do not make it clear that one estimate is
inconsistent while the other is consistent.  Moreover, the one deemed
inconsistent is not only close to the actual coefficient, but its
standard error is quite small.  What am I doing wrong?

Also, for the consistent estimation done "by hand," why do the
standard errors differ so greatly from those from -ivregress-?

Thank you for your attention and help.

Misha
```
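
One possible reading of the first question, offered as a sketch rather than a definitive answer: the posted design includes a constant and draws z symmetrically around zero. If v is independent of z, then E[y2^2 | z] = (3z)^2 + Var(v), so squaring the first-stage prediction misses only a constant that the intercept can absorb, and Cov(z, y2^2) is zero in the population, which leaves z a weak instrument for y2^2. The Stata sketch below instead follows the exercise's no-constant model y1 = b*y2^2 + u. The parameter values (b = 2, p = 1, z with mean 2, corr(u, v) = -0.5) and the scalar names `b_iv` and `b_forbidden` are assumptions chosen for illustration; they do not come from the original message or from the book.

```stata
* Sketch only: design values below are assumptions, not the poster's setup.
clear
set seed 10101
set obs 10000

* Instrument with nonzero mean, so that z is correlated with y2^2.
generate double z = 2 + rnormal()

* Correlated errors with unit variances and corr(u, v) = -0.5.
generate double v = rnormal()
generate double u = -0.5*v + sqrt(0.75)*rnormal()

* First stage with p = 1 and structural equation with b = 2, no constant.
generate double y2   = z + v
generate double y2sq = y2^2
generate double y1   = 2*y2sq + u

* Consistent: instrument y2^2 directly with z.
ivregress 2sls y1 (y2sq = z), noconstant
scalar b_iv = _b[y2sq]

* Inconsistent: square the first-stage prediction of y2.
quietly regress y2 z, noconstant
predict double y2_hat, xb
generate double y2_hat_sq = y2_hat^2
regress y1 y2_hat_sq, noconstant
scalar b_forbidden = _b[y2_hat_sq]

display "IV: " b_iv "   squared first-stage prediction: " b_forbidden
```

Under these assumptions the second estimator converges to b*(1 + Var(v)*E[z^2]/(p^2*E[z^4])) = 2*(1 + 5/43), roughly 2.23, rather than 2, while the -ivregress- estimate should stay close to 2.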

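On the second question, a commonly cited reason for the gap is that the second-stage -reg- computes its residuals from the fitted values y2sq_hat, whereas 2SLS residuals use the actual regressor y2sq. The sketch below reuses the data-generating commands from the message and rebuilds the default (non-robust) -ivregress 2sls- standard error by rescaling the second-stage one. It assumes that -ivregress- without the small option divides the residual sum of squares by N. The names `b_hand`, `se_ols`, `s2_ols`, `s2_iv`, `se_hand`, and `e_iv_sq` are hypothetical, and the vce(robust) case used in the message is not reproduced here, although its residuals would likewise have to be built from y2sq.

```stata
* Sketch only: matches the default VCE of -ivregress 2sls-, not vce(robust).
clear
set seed 10101
quietly set obs 10000
generate double z = 5*rnormal(0)
generate double x = 5*rnormal(0)
matrix C = (1, -0.5 \ -0.5, 1)
corr2data u v, corr(C)
generate double y2 = 3*z + v
generate double y2sq = y2^2
generate double y1 = 5 + 2*y2sq + x + u

* Two stages by hand.
quietly regress y2sq z x
predict double y2sq_hat, xb
quietly regress y1 y2sq_hat x
scalar b_hand = _b[y2sq_hat]
scalar se_ols = _se[y2sq_hat]
scalar s2_ols = e(rmse)^2

* 2SLS residuals: second-stage coefficients applied to the actual y2sq.
generate double e_iv_sq = (y1 - _b[_cons] - _b[y2sq_hat]*y2sq - _b[x]*x)^2
quietly summarize e_iv_sq
scalar s2_iv = r(mean)

* Replace the second-stage error variance with the 2SLS one.
scalar se_hand = se_ols*sqrt(s2_iv/s2_ols)

* Compare with -ivregress- using its default VCE.
quietly ivregress 2sls y1 x (y2sq = z)
display "coefficient: by hand " b_hand "   -ivregress- " _b[y2sq]
display "std. error:  by hand " se_hand "   -ivregress- " _se[y2sq]
```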
• Follow-Ups: