Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: RE: st: Tests of overidentifying restrictions with -ivregress-

 From Alfonso S <[email protected]> To "[email protected]" <[email protected]> Subject Re: RE: st: Tests of overidentifying restrictions with -ivregress- Date Mon, 14 Oct 2013 08:11:25 -0700 (PDT)

```
Roberto,

Sorry it took me so long to get back to you; I misplaced this conversation in my email and couldn't find it. Mark's explanation is very interesting and technically plausible. I wanted to clarify what I meant by including all your exogenous variables as IVs. Even though Stata does use all exogenous variables in the first stage to predict the endogenous variables, the second-stage results differ depending on whether a variable enters as an included regressor or only as an excluded instrument. To illustrate, I have used mus06data.dta, the dataset used in Chapter 6 of Cameron and Trivedi's Microeconometrics Using Stata (1st edition), which you can find in the Stata Press books section. I first run the 2SLS model with ssiratio as the only excluded instrument and multlc as an included exogenous regressor. These are the results:

------ begin code ------
. ivregress 2sls ldrugexp (hi_empunion = ssiratio) multlc $x2list, vce(robust) first

First-stage regressions
-----------------------

                                                  Number of obs   =      10089
                                                  F(   7,  10081) =     113.96
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.0794
                                                  Adj R-squared   =     0.0788
                                                  Root MSE        =     0.4664

------------------------------------------------------------------------------
             |               Robust
 hi_empunion |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      multlc |   .1209113    .020779     5.82   0.000     .0801804    .1616422
      totchr |   .0132573   .0036603     3.62   0.000     .0060825    .0204322
         age |  -.0080531   .0007123   -11.31   0.000    -.0094493   -.0066569
      female |  -.0727472   .0096209    -7.56   0.000    -.0916061   -.0538883
      blhisp |     -.0679   .0122426    -5.55   0.000    -.0918979   -.0439021
        linc |   .0444476   .0065546     6.78   0.000     .0315993    .0572959
    ssiratio |  -.1823381   .0232885    -7.83   0.000    -.2279882   -.1366879
       _cons |   .9834068   .0586275    16.77   0.000     .8684852    1.098328
------------------------------------------------------------------------------

Instrumental variables (2SLS) regression               Number of obs =   10089
                                                       Wald chi2(7)  = 2019.35
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.0706
                                                       Root MSE      =   1.313

------------------------------------------------------------------------------
             |               Robust
    ldrugexp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 hi_empunion |  -.8691073   .2328782    -3.73   0.000     -1.32554   -.4126743
      multlc |  -.0709315    .068005    -1.04   0.297     -.204219    .0623559
      totchr |   .4496251   .0102185    44.00   0.000     .4295973    .4696529
         age |  -.0133115   .0029609    -4.50   0.000    -.0191149   -.0075082
      female |  -.0187261     .03289    -0.57   0.569    -.0831894    .0457372
      blhisp |  -.2125767   .0402796    -5.28   0.000    -.2915232   -.1336302
        linc |   .0879383    .022285     3.95   0.000     .0442606     .131616
       _cons |   6.784596   .2686385    25.26   0.000     6.258074    7.311118
------------------------------------------------------------------------------
Instrumented:  hi_empunion
Instruments:   multlc totchr age female blhisp linc ssiratio
------ end code -------

I then moved multlc inside the parentheses, so that it is an excluded instrument for hi_empunion. The first-stage results are the same, since the first stage uses all the exogenous variables to predict hi_empunion in both cases. The second-stage results are not, because multlc is no longer an explanatory variable in that stage, merely an instrument. These are the results:

------ begin code --------
. ivregress 2sls ldrugexp (hi_empunion = ssiratio multlc) $x2list, vce(robust) first

First-stage regressions
-----------------------

                                                  Number of obs   =      10089
                                                  F(   7,  10081) =     113.96
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.0794
                                                  Adj R-squared   =     0.0788
                                                  Root MSE        =     0.4664

------------------------------------------------------------------------------
             |               Robust
 hi_empunion |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      totchr |   .0132573   .0036603     3.62   0.000     .0060825    .0204322
         age |  -.0080531   .0007123   -11.31   0.000    -.0094493   -.0066569
      female |  -.0727472   .0096209    -7.56   0.000    -.0916061   -.0538883
      blhisp |     -.0679   .0122426    -5.55   0.000    -.0918979   -.0439021
        linc |   .0444476   .0065546     6.78   0.000     .0315993    .0572959
    ssiratio |  -.1823381   .0232885    -7.83   0.000    -.2279882   -.1366879
      multlc |   .1209113    .020779     5.82   0.000     .0801804    .1616422
       _cons |   .9834068   .0586275    16.77   0.000     .8684852    1.098328
------------------------------------------------------------------------------

Instrumental variables (2SLS) regression               Number of obs =   10089
                                                       Wald chi2(6)  = 1955.36
                                                       Prob > chi2   =  0.0000
                                                       R-squared     =  0.0414
                                                       Root MSE      =  1.3335

------------------------------------------------------------------------------
             |               Robust
    ldrugexp |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
 hi_empunion |  -.9899269   .2045907    -4.84   0.000    -1.390917   -.5889365
      totchr |   .4512051   .0103088    43.77   0.000     .4310001      .47141
         age |  -.0141384      .0029    -4.88   0.000    -.0198223   -.0084546
      female |  -.0278398   .0321743    -0.87   0.387    -.0909002    .0352207
      blhisp |  -.2237087   .0395848    -5.65   0.000    -.3012934   -.1461239
        linc |   .0942748   .0218841     4.31   0.000     .0513828    .1371668
       _cons |   6.875188   .2578855    26.66   0.000     6.369741    7.380634
------------------------------------------------------------------------------
Instrumented:  hi_empunion
Instruments:   totchr age female blhisp linc ssiratio multlc
------ end code -------

Now, you were explaining how it doesn't make any sense for your exo4 to be endogenous. Can you please explain a little more about your dependent variable and exo4?

Thanks and I apologize again for the tardiness.

Alfonso

On Monday, October 14, 2013 9:20 AM, "Schaffer, Mark E" <[email protected]> wrote:

Roberto,

You write:

> The error term
> of the second equation should be equal to the error term of the first one,
> minus the effect of exo4; if the instruments were not correlated with the
> first error, how can they be correlated with the second one?

It's possible.  Let me give you a mechanical example.  "Mechanical" means it's an illustration, and there's no economic meaning intended.

Call the error term in the original model e1.  The variable exo4 is not in the model, so it's "inside" e1.

Call the error term in the augmented model e2.  The variable exo4 is one of the regressors in the augmented model, so it's not "inside" e2.

Say that the "true" coefficient on exo4 in the augmented model is 1, just to make the exposition easier.  So a feature of the DGP is that e1 = e2 + exo4.

Say that Z is a valid instrument for the original model, so E(Z*e1)=0.

Is it possible that Z is not a valid instrument for the new model?  That is, is it possible that E(Z*e2) is not zero?  I think so.

The way I've set it up, the orthogonality condition for the original model means E(Z*e1) = E(Z*e2) + E(Z*exo4) = 0.  E(Z*e2) can be nonzero: as long as E(Z*exo4) has the same magnitude and the opposite sign, the orthogonality condition for the original model, E(Z*e1)=0, will be satisfied, but the orthogonality condition for the augmented model, E(Z*e2)=0, will fail.  Note that this requires E(Z*exo4) to be nonzero, i.e., the instruments must be correlated with exo4.

Writing this on the fly so caveat emptor, but I think I got that right....

--Mark

> -----Original Message-----
> From: [email protected] [mailto:owner-
> [email protected]] On Behalf Of Roberto Pannico
> Sent: 14 October 2013 11:30
> To: [email protected]
> Subject: Re: RE: st: Tests of overidentifying restrictions with -ivregress-
>
> Dear Mark,
> thank you very much for your help and your useful explanation. Actually, I
> have good reasons to think that exo4 is endogenous to the model
> because of an omitted variable. What I don't understand is why the
> endogeneity of exo4 should make my instrumental variables invalid.
> Let me try to explain myself better.
> My model is the following:
>
> ivregress 2sls dep (endo endoXexo = instrument1 instrument2
> instrument1#exo instrument2#exo) exo exo1 exo2 exo3, first
>
> where dep is the dependent variable, endo is the endogenous regressor,
> exo is an exogenous regressor that I want to interact with the endogenous
> one, and exo1, exo2, exo3 are other exogenous regressors.
> After running this model, I type -estat overid- and I obtain this result:
>
>
> Tests of overidentifying restrictions:
>
>   Sargan (score) chi2(2) =  .311939  (p = 0.8556)
>   Basmann chi2(2)        =  .310601  (p = 0.8562)
>
> As far as I understand, this test means that my instruments are valid because
> they are not correlated with the error term (and therefore they are not
> correlated with the omitted variables that are included in it). Now I want to
> add another exogenous variable to my main regression, so I write:
>
> ivregress 2sls dep (endo endoXexo = instrument1 instrument2
> instrument1#exo instrument2#exo) exo exo1 exo2 exo3 exo4, first
>
> where exo4 is the new variable that I add to the model. The effect of this
> new factor on the dependent variable is statistically significant, and it also
> considerably reduces the effect of endo, meaning that its effect was
> included in the error term of the previous regression. However, when I type
> -estat overid- again, the result is the following:
>
>  Tests of overidentifying restrictions:
>
>   Sargan (score) chi2(2) =  14.1205  (p = 0.0009)
>   Basmann chi2(2)        =  14.0913  (p = 0.0009)
>
> So in this case my instruments are not valid anymore: they are correlated
> with the error term. I understand that exo4 can be endogenous to the model
> and for this reason correlated with the error term, but why should this also
> cause the instruments to be correlated with the error term? The error term
> of the second equation should be equal to the error term of the first one,
> minus the effect of exo4; if the instruments were not correlated with the
> first error, how can they be correlated with the second one?
>
> I apologize if I am missing a very obvious point...
> Thank you very much for your help
> Roberto
>
>
>
> Roberto Pannico
> PhD Candidate
> Department of Political Science
> Universitat Autònoma de Barcelona (UAB)
> Edifici B, 08193 Bellaterra, Barcelona, Spain
> Office: B3b/119.1
> Tel. (+34) 93 581 49 73
> [email protected]
>
>
> ----- Original message -----
> From: "Schaffer, Mark E" <[email protected]>
> Date: Thursday, October 10, 2013 12:02 pm
> Subject: RE: st: Tests of overidentifying restrictions with -ivregress-
>
> > Roberto,
> >
> > > -----Original Message-----
> > > From: [email protected] [owner-
> > > [email protected]] On Behalf Of Roberto Pannico
> > > Sent: 09 October 2013 17:15
> > > To: [email protected]
> > > Cc: [email protected]
> > > Subject: Re: st: Tests of overidentifying restrictions with -ivregress-
> > >
> > > Hi Alfonso,
> > > thank you very much for your answer.
> > > Actually, I have done an endogeneity test of exo4 and this is the
> > > result:
> > >
> > > Tests of endogeneity
> > >   Ho: variables are exogenous
> > >
> > >   Durbin (score) chi2(1)          =  13.8016  (p = 0.0002)
> > >   Wu-Hausman F(1,5731)            =  13.7747  (p = 0.0002)
> > >
> > > So it seems that, technically, the variable is endogenous. The "problem"
> > > is that theoretically this is impossible: exo4 is the amount of money that
> > > a country receives from the European Union, while the dependent variable
> > > of the model is the level of support that a citizen gives to the European
> > > Union. And given that the amount of money that a country receives is not
> > > determined by taking into account the level of support of its citizens
> > > (but the opposite is true), theoretically the regressor cannot be
> > > endogenous.
> >
> > I am afraid this is a fundamental misunderstanding of what
> > "endogeneity" and "exogeneity" mean in the context of econometrics
> > and Sargan/Hansen/Durbin/Wu/Hausman tests.
> >
> > You have in mind "determined within the system" vs. "determined
> > outside the system", or something like that.  These are perfectly
> > legitimate definitions of endogenous and exogenous.  But that's not
> > what these tests are testing.
> >
> > In econometrics, "exogenous" means E(Xu)=0.  (You can make it a
> > conditional expectation, you can distinguish between strong and weak
> > exogeneity, etc., it doesn't affect the main point.)  It's easy to
> > think of examples where X is a regressor that is "exogenous" in the
> > way you are using the term ("determined outside the system") but
> > endogenous in the sense that E(Xu) ≠ 0.
> >
> > Here's an example.  We have a dataset of farms.  X is weather.
> > It's easy to see that weather is exogenous in the sense that you are
> > using the term - it's determined outside the system, like exo4 in your
> > example.  But it's also easy to see that it can be endogenous in an
> > econometric sense, i.e., E(Xu) is not zero.  The orthogonality
> > condition E(Xu)=0 would fail if there are omitted variables in u which
> > are correlated with weather (like, I don't know, soil quality - I
> > confess I know very little about practical farming - it's just an
> > example).  This makes weather "endogenous"
> > in the econometric sense, even though for most practical purposes
> > (climate change, cloud seeding et al. aside) it's exogenous in a
> > modelling or system sense.
> >
> > Note that whether or not a regressor is econometrically exogenous
> > depends on the specification of the model (or, if you prefer, what's
> > in u because it's not in the model).  You may be able to come up with
> > a different specification of your model where you have good reasons to
> > think that exo4 is exogenous in the econometric sense.
> >
> > HTH,
> > Mark
> >
> >
> > > Concerning your second question, when I write
> > >
> > > ivregress 2sls dep (endo endoXexo = instrument1 instrument2
> > > instrument1#exo instrument2#exo) exo exo1 exo2 exo3 exo4, first
> > >
> > > the command -ivregress- automatically uses all the exogenous regressors
> > > of the model as instrumental variables.
> > > Finally, I am not sure I understand your last question. Why should I use
> > > the instruments as explanatory variables in the main model? In any case,
> > > Stata does not allow me to do it. When I write the following model:
> > >
> > > ivregress 2sls dep (endo endoXexo = instrument1 instrument2
> > > instrument1#exo instrument2#exo) exo exo1 exo2 exo3 exo4 instrument1
> > > instrument2 instrument1#exo instrument2#exo, first
> > >
> > > Stata gives the following error message:
> > >
> > > equation not identified; must have at least as many instruments not in
> > > the regression as there are instrumented variables
> > >
> > > Any other suggestion?
> > > Thank you again for your help
> > > Roberto
> > >
> > > ----- Original message -----
> > > From: Alfonso S <[email protected]>
> > > Date: Wednesday, October 9, 2013 3:45 pm
> > > Subject: Re: st: Tests of overidentifying restrictions with -ivregress-
> > >
> > > > Hi Roberto,
> > > >
> > > > my first thought is that exo4 may not be exogenous. Have you done a
> > > > test of endogeneity? My second question is: why don't you use all the
> > > > exogenous variables you have as instruments, and also use the
> > > > instruments as explanatory variables?
> > > >
> > > > Best,
> > > >
> > > > Alfonso Sanchez-Penalver
> > > >
> > > >
> > > >
> > > > On Wednesday, October 9, 2013 7:47 AM, Roberto Pannico
> > > > <[email protected]> wrote:
> > > > Dear all,
> > > > I need your help interpreting some postestimation results of my
> > > > instrumental variables model. I am using Stata 12.0 and the command
> > > > -ivregress-. The syntax is the following:
> > > >
> > > > ivregress 2sls dep (endo endoXexo = instrument1 instrument2
> > > > instrument1#exo instrument2#exo) exo exo1 exo2 exo3, first
> > > >
> > > > where dep is the dependent variable, endo is the endogenous regressor,
> > > > exo is an exogenous regressor that I want to interact with the
> > > > endogenous one, and exo1, exo2, exo3 are other exogenous regressors.
> > > > After running this model I type -estat overid- and I obtain this
> > > > result:
> > > >
> > > > Tests of overidentifying restrictions:
> > > >
> > > >   Sargan (score) chi2(2) =  .311939  (p = 0.8556)
> > > >   Basmann chi2(2)        =  .310601  (p = 0.8562)
> > > >
> > > > This should mean that my instruments are not correlated with the error
> > > > of the main regression and therefore they are valid. Now I want to
> > > > add another exogenous regressor to the main regression, so I write:
> > > >
> > > > ivregress 2sls dep (endo endoXexo = instrument1 instrument2
> > > > instrument1#exo instrument2#exo) exo exo1 exo2 exo3 exo4, first
> > > >
> > > > where exo4 is the new variable that I add to the model. The effect of
> > > > this new factor on the dependent variable is statistically
> > > > significant, and it also considerably reduces the effect of endo.
> > > > However, when I type -estat overid- again, the result is the
> > > > following:
> > > >
> > > > Tests of overidentifying restrictions:
> > > >
> > > >   Sargan (score) chi2(2) =  14.1205  (p = 0.0009)
> > > >   Basmann chi2(2)        =  14.0913  (p = 0.0009)
> > > >
> > > > This means that my instruments are not valid anymore. How is this
> > > > possible? The error term of the first model should also incorporate
> > > > the effect of exo4. As far as I am aware, if my instruments are not
> > > > correlated with it (the error term), they cannot be correlated with
> > > > the error term of the second model. I don't know how to interpret
> > > > these results.
> > > > Any idea or suggestion?
> > > > Thank you very much for your help
> > > > Roberto
> > > >
> > > > *
> > > > *   For searches and help try:
> > > > *  http://www.stata.com/help.cgi?search
> > > > *  http://www.stata.com/support/faqs/resources/statalist-faq/
> > > > *  http://www.ats.ucla.edu/stat/stata/
> >
> >
> > -----
> > Sunday Times Scottish University of the Year 2011-2013 Top in the UK
> > for student experience Fourth university in the UK and top in Scotland
> > (National Student Survey 2012)
> >
> >
> > We invite research leaders and ambitious early career researchers to
> > join us in leading and driving research in key inter-disciplinary
> > themes.
> > Please see www.hw.ac.uk/researchleaders for further information and
> > how to apply.
> >
> > Heriot-Watt University is a Scottish charity registered under charity
> > number SC000278.

```
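Mark's mechanical example can be checked numerically. The sketch below is a hypothetical simulation, not something posted in the thread: it assumes a single standard-normal instrument Z, sets the coefficient on exo4 to 1 as in his exposition (so e1 = e2 + exo4), and builds e2 and exo4 so that Cov(Z, e2) = c and Cov(Z, exo4) = -c. The names `mechanical_example`, `c`, and `seed` are illustrative only.

```python
import random


def mechanical_example(n=200_000, c=0.5, seed=1):
    """Simulate the mechanical DGP with e1 = e2 + exo4 (coefficient on exo4 set to 1).

    Hypothetical construction: Cov(Z, e2) = c and Cov(Z, exo4) = -c, so
    E(Z*e1) = 0 (Z is valid for the original model) while E(Z*e2) = c
    (Z is invalid for the augmented model). Returns the two sample means.
    """
    rng = random.Random(seed)
    sum_z_e1 = 0.0
    sum_z_e2 = 0.0
    for _ in range(n):
        z = rng.gauss(0.0, 1.0)
        e2 = c * z + rng.gauss(0.0, 1.0)     # error of the augmented model
        exo4 = -c * z + rng.gauss(0.0, 1.0)  # omitted from the original model
        e1 = e2 + exo4                       # exo4 sits "inside" e1
        sum_z_e1 += z * e1
        sum_z_e2 += z * e2
    return sum_z_e1 / n, sum_z_e2 / n


if __name__ == "__main__":
    m1, m2 = mechanical_example()
    print(f"mean(Z*e1) = {m1:.3f}  (near 0: valid in the original model)")
    print(f"mean(Z*e2) = {m2:.3f}  (near c: invalid in the augmented model)")
```

The sample mean of Z*e1 should land near zero and that of Z*e2 near c, which is exactly the situation where an overidentification test passes for the original model and fails once exo4 is added.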
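Mark's farm example (weather determined outside the system, yet econometrically endogenous because an omitted variable such as soil quality sits in u) can be sketched the same way. This is a hypothetical data-generating process, not real farm data: it assumes crop yield = 1.0*weather + 1.0*soil + noise and that weather and soil quality have correlation `rho`. All names and parameter values are assumptions for illustration.

```python
import random


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def simulate_farms(n=100_000, rho=0.6, seed=2):
    """Hypothetical DGP: crop = 1.0*weather + 1.0*soil + noise, Corr(weather, soil) = rho.

    Soil quality is omitted from the estimated model, so it ends up in the
    error u. Returns the OLS slope of crop on weather and the sample mean of
    weather*u (an estimate of E(Xu)).
    """
    rng = random.Random(seed)
    weather, crop = [], []
    for _ in range(n):
        w = rng.gauss(0.0, 1.0)
        s = rho * w + (1 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)  # unit variance, Corr(w, s) = rho
        weather.append(w)
        crop.append(1.0 * w + 1.0 * s + rng.gauss(0.0, 1.0))
    u = [c - 1.0 * w for c, w in zip(crop, weather)]  # error of the short model: soil + noise
    mean_wu = sum(w * e for w, e in zip(weather, u)) / n
    return ols_slope(weather, crop), mean_wu


if __name__ == "__main__":
    slope, mean_wu = simulate_farms()
    print(f"OLS slope on weather = {slope:.2f} (true effect is 1.0)")
    print(f"mean(weather*u)      = {mean_wu:.2f} (nonzero, so E(Xu) != 0)")
```

Under these assumptions the OLS slope should be close to 1 + rho rather than 1, and the sample analogue of E(Xu) close to rho, even though nothing in the farm system determines the weather.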
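Alfonso's comparison (multlc as an included regressor versus multlc only as an excluded instrument) can also be reproduced in miniature. The sketch below is a hypothetical pure-Python illustration, not the mus06data example: `x` stands in for hi_empunion, `z` for ssiratio, and `w` for multlc, and it assumes `w` has a direct effect on `y`. It runs textbook 2SLS twice with the same instrument set; the helper names (`tsls`, `solve`, `compare`) are illustrative, not Stata internals.

```python
import random


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def solve(a, b):
    """Solve a @ x = b for x (b has one or more columns) by Gauss-Jordan elimination."""
    n = len(a)
    m = [ra[:] + rb[:] for ra, rb in zip(a, b)]
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(m[r][i]))  # partial pivoting
        m[i], m[p] = m[p], m[i]
        piv = m[i][i]
        m[i] = [v / piv for v in m[i]]
        for r in range(n):
            if r != i:
                f = m[r][i]
                m[r] = [v - f * t for v, t in zip(m[r], m[i])]
    return [row[n:] for row in m]


def tsls(y, X, Z):
    """Textbook 2SLS. Returns (beta, pi), where pi holds the first-stage coefficients."""
    Zt = transpose(Z)
    pi = solve(matmul(Zt, Z), matmul(Zt, X))  # regress every column of X on Z
    Xhat = matmul(Z, pi)                      # first-stage fitted values
    Xht = transpose(Xhat)
    beta = solve(matmul(Xht, X), matmul(Xht, [[v] for v in y]))
    return [b[0] for b in beta], pi


def simulate(n=2000, seed=3):
    """Hypothetical DGP: w has a direct effect on y and also predicts x."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        z, w, v = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        x = 0.5 * z + 0.5 * w + v            # endogenous regressor
        e = 0.8 * v + 0.6 * rng.gauss(0, 1)  # shares v with x, so x is endogenous
        y = 1.0 + 2.0 * x + 1.0 * w + e
        rows.append((y, x, z, w))
    return rows


def compare(rows):
    y = [r[0] for r in rows]
    Z = [[1.0, r[2], r[3]] for r in rows]     # same instrument set in both models
    X_in = [[1.0, r[1], r[3]] for r in rows]  # w as an included regressor
    X_ex = [[1.0, r[1]] for r in rows]        # w only as an excluded instrument
    beta_in, pi_in = tsls(y, X_in, Z)
    beta_ex, pi_ex = tsls(y, X_ex, Z)
    first_in = [row[1] for row in pi_in]      # first-stage coefficients for x
    first_ex = [row[1] for row in pi_ex]
    return beta_in, beta_ex, first_in, first_ex


if __name__ == "__main__":
    b_in, b_ex, f_in, f_ex = compare(simulate())
    print("first stage for x, w included:", [round(v, 4) for v in f_in])
    print("first stage for x, w excluded:", [round(v, 4) for v in f_ex])
    print("coefficient on x, w included:", round(b_in[1], 3))
    print("coefficient on x, w excluded:", round(b_ex[1], 3))
```

As in Alfonso's output, the first-stage coefficients for x are identical in both runs, while the second-stage coefficient on x is not. Because w enters the outcome equation in this assumed DGP, the second run leaves w in the error term while still using it as an instrument, so its coefficient on x should drift toward 3 instead of 2.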