# st: ivreg2 - weak instruments?

| Field | Value |
| --- | --- |
| From | Phil |
| To | statalist@hsphsun2.harvard.edu |
| Subject | st: ivreg2 - weak instruments? |
| Date | Thu, 29 Apr 2010 13:22:38 +0200 |

```
Dear Statalist,

I estimate two model specifications in which I suspect a RHS variable
to be endogenous because of reverse causality. I test this assumption
with instrumental variable estimation, using two excluded instruments
for the potentially endogenous variable. The two specifications differ
in that in model 1 the potentially endogenous variable enters the
equation only once, while in model 2 it also enters as a squared term
and in an interaction term.

Model 1 looks as follows:
Y = b0 + b1*x1 + b2*x2 + u
where x1 is the potentially endogenous variable, x2 is an exogenous
variable, and z are the excluded instruments for x1.
I estimate this model with ivreg2 and tests (F-test, partial
R-squared, Kleibergen-Paap, Hansen’s J, endog option) indicate that
the instruments are valid and necessary (see below for output).

Model 2 is the following:
Y = b0 + b1*x1 + b2*x2 + b3*x1^2 + b4*x1*x2 + u
In this model there are three potentially endogenous regressors, because
x1 also enters as a squared term and in an interaction term (x1, x1^2,
x1*x2). I constructed the additional instruments accordingly, i.e. by
squaring the original instruments and interacting them with x2. I found
that including all six of these instruments causes the
overidentification test to reject. However, when I include only four of
the instruments, the overidentification test does not reject. In this
case Shea’s partial R-squared for the three potentially endogenous
regressors is about 0.30, 0.70 and 0.99 (see output below). What
confuses me is that the underidentification test (Kleibergen-Paap) fails
dramatically in the second model, with p-values of 1.0, while the other
instrument tests look fine. Moreover, these instruments worked well in
model 1 with only one endogenous variable. How should I interpret these
results? Is the Kleibergen-Paap test valid with multiple endogenous
variables, and does this mean that the instruments are weak in model 2?

I would appreciate any help in this matter.
Best
Phil

Model 1
. ivreg2 fas3 sizei sizesqi skilli4 sizei_skill4i invcj tcj2 skill4sqi_tcj2 tci2 dist
> argentina australia austria belgium brazil bulgaria canada chile chinamainland
> czechrepublic denmark finland france germany greece hongkong hungary ireland italy
> korea malaysia mexico netherlands newzealand norway philippines poland portugal
> romania russia singapore spain sweden switzerland taiwan thailand turkey usa
> argentina2 australia2 austria2 belgium2 brazil2 bulgaria2 canada2 chile2
> chinamainland2 czechrepublic2 denmark2 finland2 france2 germany2 greece2 hongkong2
> hungary2 ireland2 italy2 korea2 malaysia2 mexico2 netherlands2 newzealand2 norway2
> philippines2 poland2 portugal2 romania2 russia2 singapore2 spain2 sweden2
> switzerland2 taiwan2 thailand2 turkey2 usa2 _2001 _2002 _2003 _2004 _2005
> (sumgdp = inst sumlat2), robust first endogtest(sumgdp)

First-stage regressions
-----------------------
First-stage regression of sumgdp:
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs =     4366
F( 92,  4273) = 57315.23
Prob > F      =   0.0000
Total (centered) SS     =  3.93926e+16                Centered R2   =   0.9996
Total (uncentered) SS   =  5.88246e+16                Uncentered R2 =   0.9997
Residual SS             =  1.75151e+13                Root MSE      =    64024

------------------------------------------------------------------------------
|               Robust
sumgdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
sizei |   96125.36   25162.36     3.82   0.000     46794.06    145456.7
sizesqi |  -66612.46   10634.54    -6.26   0.000    -87461.68   -45763.23
skilli4 |   14291.49   7066.856     2.02   0.043      436.784     28146.2
sizei_ski~4i |    13911.5    15459.9     0.90   0.368    -16397.94    44220.95
invcj |   -27860.7   5149.074    -5.41   0.000    -37955.56   -17765.84
tcj2 |  -26531.13   2912.961    -9.11   0.000    -32242.04   -20820.21
skill4sqi_~2 |  -119.5441   137.8151    -0.87   0.386    -389.7333    150.6452
tci2 |  -37091.49    2328.54   -15.93   0.000    -41656.64   -32526.35
dist |   6.129196    1.03451     5.92   0.000     4.101018    8.157373
….
inst |   .8927053   .0091796    97.25   0.000     .8747086     .910702
sumlat2 |      16156   939.7222    17.19   0.000     14313.66    17998.34
_cons |  -885762.3   86184.74   -10.28   0.000     -1054729   -716795.4
------------------------------------------------------------------------------
Included instruments: sizei sizesqi skilli4 sizei_skill4i invcj tcj2
------------------------------------------------------------------------------
Partial R-squared of excluded instruments:   0.9370
Test of excluded instruments:
F(  2,  4273) = 10904.41
Prob > F      =   0.0000

Summary results for first-stage regressions
-------------------------------------------

Variable    | Shea Partial R2 |   Partial R2    |  F(  2,  4273)    P-value
sumgdp      |     0.9370      |     0.9370      |    10904.41       0.0000

NB: first-stage F-stat heteroskedasticity-robust

Underidentification tests
Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
Ha: matrix has rank=K1 (identified)
Kleibergen-Paap rk LM statistic             Chi-sq(2)=1019.26  P-val=0.0000
Kleibergen-Paap rk Wald statistic           Chi-sq(2)=22283.48 P-val=0.0000

Weak identification test
Ho: equation is weakly identified
Kleibergen-Paap Wald rk F statistic             10904.41
See main output for Cragg-Donald weak id test critical values

Weak-instrument-robust inference
Tests of joint significance of endogenous regressors B1 in main equation
Ho: B1=0 and overidentifying restrictions are valid
Anderson-Rubin Wald test     F(2,4273)=21.41     P-val=0.0000
Anderson-Rubin Wald test     Chi-sq(2)=43.74     P-val=0.0000
Stock-Wright LM S statistic  Chi-sq(2)=39.38     P-val=0.0000

NB: Underidentification, weak identification and weak-identification-robust
test statistics heteroskedasticity-robust

Number of observations               N  =       4366
Number of regressors                 K  =         92
Number of instruments                L  =         93
Number of excluded instruments       L1 =          2

IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs =     4366
F( 91,  4274) =    13.81
Prob > F      =   0.0000
Total (centered) SS     =  2.40966e+12                Centered R2   =   0.4117
Total (uncentered) SS   =  2.64194e+12                Uncentered R2 =   0.4635
Residual SS             =  1.41751e+12                Root MSE      =    18019

------------------------------------------------------------------------------
|               Robust
fas3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
sumgdp |   .0149349   .0023039     6.48   0.000     .0104194    .0194505
sizei |   69406.44   7477.315     9.28   0.000     54751.17    84061.71
sizesqi |  -35124.18   3799.468    -9.24   0.000       -42571   -27677.36
skilli4 |   23967.22   2744.369     8.73   0.000     18588.36    29346.09
sizei_ski~4i |  -26711.56    3870.27    -6.90   0.000    -34297.16   -19125.97
invcj |   -2308.65   1534.638    -1.50   0.132    -5316.484    699.1849
tcj2 |   1914.868   618.6698     3.10   0.002     702.2977    3127.439
skill4sqi_~2 |  -306.3677   46.95296    -6.52   0.000    -398.3938   -214.3416
tci2 |  -1620.602    611.842    -2.65   0.008     -2819.79   -421.4134
dist |  -2.040303   .2484136    -8.21   0.000    -2.527184   -1.553421
….
_cons |   -21100.4    8803.58    -2.40   0.017     -38355.1   -3845.697
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):           1019.262
Chi-sq(2) P-val =    0.0000
------------------------------------------------------------------------------
Weak identification test (Kleibergen-Paap rk Wald F statistic):        1.1e+04
Stock-Yogo weak ID test critical values: 10% maximal IV size             19.93
                                         15% maximal IV size             11.59
                                         20% maximal IV size              8.75
                                         25% maximal IV size              7.25
Source: Stock-Yogo (2005).  Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.014
Chi-sq(1) P-val =    0.9046
-endog- option:
Endogeneity test of endogenous regressors:                               7.962
Chi-sq(1) P-val =    0.0048
Regressors tested:    sumgdp
------------------------------------------------------------------------------
------------------------------------------------------------------------------

Model 2

. ivreg2 fas3 skdiff4 invcj tcj2 tcj2xskdiffsq4 tci2 dist
> argentina australia austria belgium brazil bulgaria canada chile chinamainland
> czechrepublic denmark finland france germany greece hongkong hungary ireland italy
> korea malaysia mexico netherlands newzealand norway philippines poland portugal
> romania russia singapore spain sweden switzerland taiwan thailand turkey usa
> argentina2 australia2 austria2 belgium2 brazil2 bulgaria2 canada2 chile2
> chinamainland2 czechrepublic2 denmark2 finland2 france2 germany2 greece2 hongkong2
> hungary2 ireland2 italy2 korea2 malaysia2 mexico2 netherlands2 newzealand2 norway2
> philippines2 poland2 portugal2 romania2 russia2 singapore2 spain2 sweden2
> switzerland2 taiwan2 thailand2 turkey2 usa2 _2001 _2002 _2003 _2004 _2005
> (sumgdp gdpdiffsq gdpdiffxskdiff4 = sumlat2 latdiffsq2 instdiffsq instdiffxskdiff4),
> endog(sumgdp gdpdiffsq gdpdiffxskdiff4) first robust

First-stage regressions
-----------------------
First-stage regression of sumgdp:
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

Number of obs =     4366
F( 91,  4274) =   713.56
Prob > F      =   0.0000
Total (centered) SS     =  3.93926e+16                Centered R2   =   0.9957
Total (uncentered) SS   =  5.88246e+16                Uncentered R2 =   0.9971
Residual SS             =  1.67917e+14                Root MSE      =   198212
------------------------------------------------------------------------------
|               Robust
sumgdp |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
skdiff4 |   168531.2   55878.43     3.02   0.003     58980.48      278082
invcj |  -127174.4   16629.73    -7.65   0.000    -159777.3   -94571.53
tcj2 |  -116866.4   8854.973   -13.20   0.000    -134226.7   -99506.05
tcj2xskdif~4 |    71164.2   11107.05     6.41   0.000     49388.61    92939.78
tci2 |  -170123.4   6870.852   -24.76   0.000    -183593.8   -156652.9
dist |   28.42898    2.64817    10.74   0.000     23.23719    33.62077
argentina |   401804.7   42340.53     9.49   0.000     318795.3    484814.1
….
sumlat2 |   73606.66   2152.339    34.20   0.000     69386.96    77826.36
latdiffsq2 |   71.83413   17.64345     4.07   0.000      37.2438    106.4245
instdiffsq |   2.30e-08   1.14e-09    20.09   0.000     2.08e-08    2.52e-08
instdiffxs~4 |    .003527   .0080782     0.44   0.662    -.0123104    .0193645
_cons |   -4154066   232882.3   -17.84   0.000     -4610636    -3697496
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Partial R-squared of excluded instruments:   0.3986
Test of excluded instruments:
F(  4,  4274) =   498.38
Prob > F      =   0.0000

First-stage regression of gdpdiffsq:
OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

Number of obs =     4366
F( 91,  4274) = 25105.65
Prob > F      =   0.0000
Total (centered) SS     =  3.90010e+30                Centered R2   =   0.9987
Total (uncentered) SS   =  4.41189e+30                Uncentered R2 =   0.9989
Residual SS             =  5.06592e+27                Root MSE      =  1.1e+12

------------------------------------------------------------------------------
|               Robust
gdpdiffsq |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
skdiff4 |  -1.13e+11   2.27e+11    -0.50   0.618    -5.58e+11    3.32e+11
invcj |   1.92e+11   8.28e+10     2.32   0.020     3.01e+10    3.55e+11
tcj2 |  -3.89e+10   4.74e+10    -0.82   0.411    -1.32e+11    5.39e+10
tcj2xskdif~4 |   1.72e+11   6.46e+10     2.66   0.008     4.54e+10    2.99e+11
tci2 |   8.78e+10   2.97e+10     2.95   0.003     2.95e+10    1.46e+11
dist |   3.34e+07   1.75e+07     1.90   0.057    -983252.2    6.78e+07
…..
sumlat2 |   4.69e+09   1.12e+10     0.42   0.675    -1.73e+10    2.66e+10
latdiffsq2 |  -6.41e+08   1.46e+08    -4.38   0.000    -9.27e+08   -3.54e+08
instdiffsq |   .9843647   .0128443    76.64   0.000     .9591831    1.009546
instdiffxs~4 |  -183051.9   109839.8    -1.67   0.096      -398395    32291.22
_cons |  -5.67e+11   1.22e+12    -0.46   0.643    -2.97e+12    1.83e+12
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Partial R-squared of excluded instruments:   0.9246
Test of excluded instruments:
F(  4,  4274) =  1629.39
Prob > F      =   0.0000

First-stage regression of gdpdiffxskdiff4:

OLS estimation
--------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity

Number of obs =     4366
F( 91,  4274) =  8252.74
Prob > F      =   0.0000
Total (centered) SS     =  2.08810e+15                Centered R2   =   0.9985
Total (uncentered) SS   =  2.12589e+15                Uncentered R2 =   0.9986
Residual SS             =  3.05358e+12                Root MSE      =    26729

------------------------------------------------------------------------------
|               Robust
gdpdiffxsk~4 |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
skdiff4 |   11840.13   6163.584     1.92   0.055    -243.6901    23923.96
invcj |  -5915.838   1844.225    -3.21   0.001    -9531.476   -2300.201
tcj2 |   7376.204   1419.659     5.20   0.000     4592.936    10159.47
tcj2xskdif~4 |  -23616.51   2753.272    -8.58   0.000    -29014.35   -18218.66
tci2 |   2605.138   796.9941     3.27   0.001     1042.616     4167.66
dist |   .1873208   .4076901     0.46   0.646    -.6119634     .986605
….
sumlat2 |    78.7583   249.6652     0.32   0.752    -410.7151    568.2317
latdiffsq2 |  -2.176758   3.766132    -0.58   0.563    -9.560333    5.206816
instdiffsq |  -3.19e-10   2.61e-10    -1.22   0.222    -8.31e-10    1.93e-10
instdiffxs~4 |   1.151003    .002057   559.55   0.000      1.14697    1.155036
_cons |  -8679.519    27732.8    -0.31   0.754     -63050.2    45691.16
------------------------------------------------------------------------------
------------------------------------------------------------------------------
Partial R-squared of excluded instruments:   0.9982
Test of excluded instruments:
F(  3,  4274) =  1.1e+05
Prob > F      =   0.0000

Summary results for first-stage regressions
-------------------------------------------

Variable    | Shea Partial R2 |   Partial R2    |  F(  4,  4274)    P-value
sumgdp      |     0.2974      |     0.3986      |      498.38       0.0000
gdpdiffsq   |     0.6958      |     0.9246      |     1629.39       0.0000
gdpdiffxskdi|     0.9874      |     0.9982      |     1.1e+05       0.0000

NB: first-stage F-stat heteroskedasticity-robust

Underidentification tests
Ho: matrix of reduced form coefficients has rank=K1-1 (underidentified)
Ha: matrix has rank=K1 (identified)
Kleibergen-Paap rk LM statistic             Chi-sq(2)=0.00     P-val=1.0000
Kleibergen-Paap rk Wald statistic           Chi-sq(2)=0.00     P-val=1.0000

Weak identification test
Ho: equation is weakly identified
Kleibergen-Paap Wald rk F statistic                 0.00
See main output for Cragg-Donald weak id test critical values

Weak-instrument-robust inference
Tests of joint significance of endogenous regressors B1 in main equation
Ho: B1=0 and overidentifying restrictions are valid
Anderson-Rubin Wald test     F(3,4274)=59.02     P-val=0.0000
Anderson-Rubin Wald test     Chi-sq(3)=180.88    P-val=0.0000
Stock-Wright LM S statistic  Chi-sq(3)=138.06    P-val=0.0000

NB: Underidentification, weak identification and weak-identification-robust
test statistics heteroskedasticity-robust

Number of observations               N  =       4366
Number of regressors                 K  =         91
Number of instruments                L  =         92
Number of excluded instruments       L1 =          3

IV (2SLS) estimation
--------------------
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity
Number of obs =     4366
F( 90,  4275) =    14.89
Prob > F      =   0.0000
Total (centered) SS     =  2.40966e+12                Centered R2   =   0.5528
Total (uncentered) SS   =  2.64194e+12                Uncentered R2 =   0.5921
Residual SS             =  1.07757e+12                Root MSE      =    15710

------------------------------------------------------------------------------
|               Robust
fas3 |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
sumgdp |   .0036165   .0025838     1.40   0.162    -.0014477    .0086807
gdpdiffsq |  -1.73e-09   2.07e-10    -8.37   0.000    -2.14e-09   -1.33e-09
gdpdiffxsk~4 |  -.0112809    .000798   -14.14   0.000     -.012845   -.0097168
skdiff4 |   4674.125    3803.89     1.23   0.219    -2781.363    12129.61
invcj |  -2049.643   1190.903    -1.72   0.085    -4383.769    284.4835
tcj2 |  -1977.099   593.6901    -3.33   0.001     -3140.71    -813.488
tcj2xskdif~4 |  -8263.003   978.4968    -8.44   0.000    -10180.82   -6345.184
tci2 |  -3647.576   639.0047    -5.71   0.000    -4900.002    -2395.15
dist |  -1.728639   .2666684    -6.48   0.000      -2.2513   -1.205979
…..
_cons |   42783.49   9530.638     4.49   0.000     24103.78     61463.2
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic):              0.000
Chi-sq(2) P-val =    1.0000
------------------------------------------------------------------------------
Weak identification test (Kleibergen-Paap rk Wald F statistic):          0.000
Stock-Yogo weak ID test critical values:                       <not available>
------------------------------------------------------------------------------
Hansen J statistic (overidentification test of all instruments):         0.269
Chi-sq(1) P-val =    0.6039
-endog- option:
Endogeneity test of endogenous regressors:                             125.171
Chi-sq(3) P-val =    0.0000
Regressors tested:    sumgdp gdpdiffsq gdpdiffxskdiff4
------------------------------------------------------------------------------
------------------------------------------------------------------------------

```
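The model 2 setup described in the post (squaring the instruments and interacting them with x2 to match the squared and interaction terms of x1) can be sketched roughly as follows. This is only an illustration on simulated data, not the poster's dataset: the variable names (`y`, `x1`, `x2`, `z1`, `z2` and the generated terms) and the coefficients of the data-generating process are hypothetical. It assumes the user-written `ivreg2` and `ranktest` packages are installed (for example from SSC).

```stata
* Hypothetical sketch on simulated data; all names and coefficients are made up.
clear
set seed 12345
set obs 1000

gen z1 = rnormal()                      // excluded instrument 1
gen z2 = rnormal()                      // excluded instrument 2
gen x2 = rnormal()                      // exogenous regressor
gen e  = rnormal()                      // structural error

* x1 depends on e, so it is endogenous in the equation for y
gen x1 = 0.8*z1 + 0.5*z2 + 0.3*x2 + 0.5*e + rnormal()
gen y  = 1 + 0.5*x1 + 0.2*x2 - 0.1*x1^2 + 0.3*x1*x2 + e

* Additional endogenous terms built from x1
gen x1sq  = x1^2
gen x1_x2 = x1*x2

* Instruments built the same way: squares and interactions with x2
gen z1sq  = z1^2
gen z2sq  = z2^2
gen z1_x2 = z1*x2
gen z2_x2 = z2*x2

* Three endogenous regressors, six excluded instruments
ivreg2 y x2 (x1 x1sq x1_x2 = z1 z2 z1sq z2sq z1_x2 z2_x2), ///
    robust first endog(x1 x1sq x1_x2)
```

With six excluded instruments and three endogenous regressors, the Hansen J test in this sketch has 6 - 3 = 3 degrees of freedom. With the four excluded instruments and three endogenous regressors used in model 2 of the post, it has 4 - 3 = 1, which matches the Chi-sq(1) reported in the output.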