# st: RE: Std. -ivregress- vs. "by hand" using -reg- with nonlinear endogenous variable

From: DE SOUZA Eric <[email protected]>
To: "'[email protected]'" <[email protected]>
Subject: st: RE: Std. -ivregress- vs. "by hand" using -reg- with nonlinear endogenous variable
Date: Wed, 6 Jan 2010 21:24:42 +0100

```
Question 2: using OLS for the second stage does not take into account that you are using estimated variables instead of the original endogenous variables. For this reason, the correct standard errors are not those calculated by OLS.

Question 1: See slide 53 onwards in the presentation by Christopher Baum:
http://www.stata.com/meeting/13uk/baumUKSUG2007.pdf

Eric de Souza

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Misha Spisok
Sent: 06 January 2010 21:11
To: [email protected]
Subject: st: Std. -ivregress- vs. "by hand" using -reg- with nonlinear endogenous variable

Hello, Statalist!

[There have been no replies to a previous message posted nine days ago.]

My questions, in short, are the following:
1. How is it "obvious" from my generated data and approach that one estimator is consistent while the other is inconsistent?
2.  Why do the standard errors differ when doing IV "by hand" (i.e., using -reg- for each stage) versus using -ivregress-?

These questions are motivated by an exercise in Microeconometrics Using Stata, by Cameron and Trivedi, Exercise 11 of Chapter 6 (page 204);  I give the exercise below, and follow it with my "solution" and my questions.  Apologies in advance for the length of this message.

"When an endogenous variable enters the regression nonlinearly, the obvious IV estimator is inconsistent and a modification is needed. Specifically, suppose y1 = b*y2^2 + u, and the first-stage equation for y2 is y2 = p*z + v, where the zero-mean errors u and v are correlated. Here the endogenous regressor appears in the structural equation as y2^2 rather than y2. The IV estimator is b_hat_IV = (sum z_i * y2_i^2)^(-1)*(sum z_i * y1_i). This can be implemented by a regular IV regression of y1 on y2^2 with the instrument z: regress y2^2 on z and then regress y1 on the first-stage prediction y2^2_hat. If instead we regress y2 on z at the first stage, giving y2_hat, and then regress y1 on (y2_hat)^2, an inconsistent estimate is obtained. Generate a simulation sample to demonstrate these points. Consider whether this example can be generalized to other nonlinear models where the nonlinearity is in regressors only, so that y1 = g(y2)'beta + u, where g(y2) is a nonlinear function of y2 [y2 being a vector of variables]." (Microeconometrics Using Stata, Cameron and Trivedi)

Here is my approach:

clear
set seed 10101
/*      Generate Data      */
quietly set obs 10000
generate double z = 5*rnormal(0) /* instrument */
generate double x = 5*rnormal(0)
matrix C = (1, -0.5 \ -0.5, 1) /* correlation structure */
corr2data u v, corr(C) /* correlated errors */
generate double y2 = 3*z + v /* endogenous variable due to correlation with error, u */
generate y2sq = y2^2
generate double y1 = 5 + 2*y2sq + x + u

/*      Consistent Estimation      */
/*      First Stage Regression      */
reg y2sq z x
predict y2sq_hat, xb
/*      Second Stage Regression      */
reg y1 y2sq_hat x, robust
/*      Above with -ivregress-      */
ivregress 2sls y1 x (y2sq = z), vce(robust) first

/*      Inconsistent Estimation      */
/*      First Stage Regression      */
reg y2 z x
predict y2_hat, xb
generate y2_hat_sq = y2_hat^2
/*      Second Stage Regression      */
reg y1 y2_hat_sq x, robust

The coefficient estimates on both y2sq_hat and y2_hat_sq are near the actual coefficient of 2. However, the standard error for the estimate deemed inconsistent is remarkably small, yielding a t-value of 452. For the estimate deemed consistent, the standard error from doing it "by hand" is slightly larger than the parameter estimate, yielding a t-value of less than one, while the standard error from -ivregress- is quite small, yielding a t-value of 1370.

My generated data do not make it clear that one estimate is inconsistent while the other is consistent.  Moreover, the one deemed inconsistent is not only close to the actual coefficient, but its standard error is quite small.  What am I doing wrong?

Also, for the consistent estimation done "by hand," why do the standard errors differ so greatly from those from -ivregress-?

Thank you for your attention and help.

Misha
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

```
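
As a sketch of the point made under Question 2 (not part of the original thread): with conventional, non-robust standard errors, the second-stage OLS standard error can be rescaled by hand by recomputing the residuals with the actual y2sq in place of its first-stage prediction. The variable names y2sq_fit, u_iv, and u_iv_sq are hypothetical, and the code assumes the poster's generated data are still in memory. The `small` option is used so that -ivregress- applies the same degrees-of-freedom adjustment as -regress-.

```stata
* Sketch only: assumes y1, y2sq, x, and z from the poster's code are in memory.
* y2sq_fit, u_iv, and u_iv_sq are hypothetical names introduced here.
quietly regress y2sq z x
predict double y2sq_fit, xb

* Second stage by OLS: same coefficients as 2SLS, but the wrong residuals
quietly regress y1 y2sq_fit x
scalar se_ols  = _se[y2sq_fit]
scalar rss_ols = e(rss)

* 2SLS residuals use the actual y2sq, not its first-stage prediction
generate double u_iv    = y1 - _b[_cons] - _b[y2sq_fit]*y2sq - _b[x]*x
generate double u_iv_sq = u_iv^2
quietly summarize u_iv_sq

* Rescale the OLS standard error by the ratio of residual sums of squares
display "corrected SE = " se_ols*sqrt(r(sum)/rss_ols)

* The conventional 2SLS standard error should agree with the line above
ivregress 2sls y1 x (y2sq = z), small
```

The posted runs used robust standard errors. The robust variance also needs the residuals computed from the actual regressor, but it is not a simple rescaling of the OLS result, so the sketch above covers only the conventional case.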

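
On Question 1, a short derivation may help explain why the posted simulation does not separate the two estimators. It is a sketch, not part of the original thread, and it assumes that z is independent of (u, v), that all three have mean zero, and that v has constant variance $\sigma_v^2$, which is what the data-generating code appears to imply. Here b and p are the structural and first-stage coefficients from the quoted exercise.

$$
\begin{aligned}
E[y_2^2 \mid z] &= p^2 z^2 + \sigma_v^2, \\
y_1 &= b\,\sigma_v^2 + b\,(p z)^2 + \bigl\{\, b\,(y_2^2 - E[y_2^2 \mid z]) + u \,\bigr\}, \\
\operatorname{plim}\,\hat b_{\text{no constant}} &= b\left[\, 1 + \frac{\sigma_v^2\,E[z^2]}{p^2\,E[z^4]} \,\right], \\
\operatorname{Cov}(z,\, y_2^2) &= p^2 E[z^3] + 2p\,E[z^2]\,E[v] + E[z]\,E[v^2] = p^2 E[z^3].
\end{aligned}
$$

The term in braces has mean zero given z. When the second stage includes a constant, the squared first-stage prediction therefore differs from $E[y_2^2 \mid z]$ only by the constant $\sigma_v^2$, which the intercept absorbs, and under these assumptions the slope on (y2_hat)^2 still converges to b. The third line gives the limit when y1 is regressed on (y2_hat)^2 alone without a constant, as in the exercise's y1 = b*y2^2 + u: the estimate is inconsistent, although with z scaled by 5 and p = 3 the bias factor is only about 1.0015. The last line shows that Cov(z, y2^2) is zero when z is symmetric about zero, as a normal z is, so z is a very weak instrument for y2^2 in this design. An asymmetric z, or adding z^2 as an instrument, would be one way to avoid that.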