Stata News, Vol 39 No 3

In the spotlight: Weak instruments and wacky confidence intervals

Stata's instrumental-variables regression command, ivregress, is widely used for fitting linear models with endogeneity. estat weakrobust, a new postestimation command in StataNow™ for ivregress, lets users perform tests and construct confidence intervals that are robust to weak instruments.

Weak instruments present a challenge for inference. The approximate normality of the instrumental-variables estimates exploited in conventional inference is inherited in part from the first-stage estimates—that is, from the relationship between the endogenous variables and the instruments.

When this relationship is weak, however, the instrumental-variables estimates depend nonlinearly on the first-stage estimates. Thus, the normality of the latter does not translate to the normality of the former. Standard t tests and associated confidence intervals become misleading (see Andrews, Stock, and Sun [2019]).

To get valid inference, you need to do something different. In this spotlight, I'll show you how to get robust tests and confidence intervals when you have weak instruments. I'll also show you an example of a case where these confidence intervals get weird.

Robust tests

One well-established way to get around the weak-instruments inference problem is the test of Anderson and Rubin (1949). The Anderson–Rubin test statistic has a distribution that does not depend on the first-stage estimates, making it robust to arbitrarily weak instruments.
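To make the idea concrete, here is a minimal sketch of an Anderson–Rubin-style test computed by hand, using Stata's laborsup example dataset with other_inc as the endogenous regressor and male_educ as the excluded instrument. The null value b0 and the working variable y0 are names chosen for illustration. This is the textbook F form of the test, so the statistic it reports need not match the scaling used by estat weakrobust.

```stata
* Sketch only: Anderson-Rubin-style test of H0: coefficient on other_inc = b0
webuse laborsup, clear

* Hypothetical null value chosen for illustration
scalar b0 = 0

* Subtract the hypothesized effect of the endogenous regressor from the outcome
generate double y0 = fem_inc - scalar(b0)*other_inc

* Regress on the exogenous regressors plus the excluded instrument
regress y0 fem_educ kids male_educ

* Under H0, the excluded instrument should have no explanatory power for y0
test male_educ
```

Because the regression never uses the first-stage estimates, the validity of the test does not hinge on how strongly male_educ predicts other_inc.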

The related conditional likelihood-ratio (CLR) test of Moreira (2003), which is appropriate when the model is overidentified, has similar properties: conditional on a known statistic, the distribution of the test statistic does not depend on the first-stage estimates.

Both tests can be inverted to produce confidence intervals. We can request Anderson–Rubin and CLR confidence intervals using estat weakrobust, ci after ivregress. For example, we can type

. webuse laborsup

. ivregress 2sls fem_inc fem_educ kids (other_inc = male_educ)

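Instrumental-variables 2SLS regression            Number of obs   =        500
                                                  Wald chi2(3)    =     105.23
                                                  Prob > chi2     =     0.0000
                                                  R-squared       =     0.2814
                                                  Root MSE        =     10.759

------------------------------------------------------------------------------
     fem_inc | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   other_inc |   -.374891    .064153    -5.84   0.000    -.5006286   -.2491535
    fem_educ |   1.274646   .1831334     6.96   0.000     .9157108    1.633581
        kids |  -1.717837   .3564194    -4.82   0.000    -2.416406   -1.019268
       _cons |   24.76533   3.714901     6.67   0.000     17.48425     32.0464
------------------------------------------------------------------------------
Endogenous: other_inc
Exogenous:  fem_educ kids male_educ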

. estat weakrobust, ci

Confidence interval robust to weak instruments

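--------------------------------------------------
             |               Anderson–Rubin
             | Coefficient    [95% conf. interval]
-------------+------------------------------------
   other_inc |   -.374891    -.5065501   -.2500736
--------------------------------------------------
Note: Anderson–Rubin CI reported by default
      because model is just identified.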

When instruments are strong, as in this case, robust tests are valid but conservative, typically returning confidence intervals that are slightly wider than conventional intervals.

When instruments are weak, however, robust tests can deliver results that are noticeably different from conventional tests, even a little wacky. (This is not always the case: example 6 in [R] ivregress postestimation shows weak instruments leading to an ordinary but wider confidence interval.)
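Before turning to robust inference, it can help to look at the first stage directly. The sketch below refits the labor-supply model and then runs estat firststage, an existing ivregress postestimation command that reports first-stage summary statistics, including an F statistic for the excluded instruments. How small that statistic must be before the instruments count as "weak" is a judgment call that this sketch does not settle.

```stata
* Sketch only: inspect first-stage strength for the labor-supply model
webuse laborsup, clear

* Same just-identified model as in the example above
ivregress 2sls fem_inc fem_educ kids (other_inc = male_educ)

* First-stage summary statistics for the endogenous regressor
estat firststage
```

The same command can be run after any ivregress fit, which makes it a natural first check before deciding whether to rely on estat weakrobust.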

Nonstandard intervals

Unlike conventional confidence intervals, confidence intervals produced by inverting Anderson–Rubin and CLR tests are not guaranteed to be finite intervals. Confidence intervals may cover the whole real line, or they may take the form of a union of several intervals. A confidence interval can even be empty if your model is overidentified and you use an Anderson–Rubin confidence interval (but if your model is overidentified, we recommend using a CLR confidence interval).
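A short derivation shows where these shapes come from in the simplest setting. It is a sketch under stated assumptions, not a description of what estat weakrobust computes internally: one endogenous regressor, homoskedastic errors, and notation of my own choosing. Here y and Y are the outcome and the endogenous regressor after partialling out the exogenous regressors, P_Z is the projection onto the (partialled-out) instruments, M_Z = I - P_Z, k is the number of instruments, p is the number of exogenous regressors including the constant, and c is the critical value.

```latex
\[
\mathrm{AR}(\beta_0)
  = \frac{(y - Y\beta_0)' P_Z (y - Y\beta_0)/k}
         {(y - Y\beta_0)' M_Z (y - Y\beta_0)/(n - k - p)}
\]
% Inverting the test: collect all null values that are not rejected.
\[
\{\beta_0 : \mathrm{AR}(\beta_0) \le c\}
  = \{\beta_0 : a\beta_0^2 + b\beta_0 + d \le 0\},
\qquad
A = P_Z - \kappa M_Z, \quad \kappa = \frac{c\,k}{n - k - p},
\]
\[
a = Y'AY, \qquad b = -2\,Y'Ay, \qquad d = y'Ay .
\]
% The shape of the set depends on the sign of a and of the discriminant.
\[
\{\beta_0 : \mathrm{AR}(\beta_0) \le c\} =
\begin{cases}
[r_1, r_2] & a > 0,\ b^2 - 4ad \ge 0 \\
\varnothing & a > 0,\ b^2 - 4ad < 0 \\
(-\infty, r_1] \cup [r_2, \infty) & a < 0,\ b^2 - 4ad > 0 \\
(-\infty, \infty) & a < 0,\ b^2 - 4ad < 0
\end{cases}
\]
% r_1 \le r_2 are the real roots of a\beta_0^2 + b\beta_0 + d = 0.
```

The sign of a is the key: a < 0 exactly when the first-stage F statistic for the excluded instruments, (Y'P_Z Y/k)/(Y'M_Z Y/(n - k - p)), falls below c. Unbounded sets therefore appear precisely when the instruments are too weak to be significant in the first stage. In a just-identified model, the statistic equals 0 at the instrumental-variables estimate, so the set cannot be empty.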

We can see an example of a CLR confidence interval that takes an irregular form by modeling gas mileage in the trusty Stata 1978 automobile dataset. We want to include the price of a car in the regression as an explanatory variable, but we believe it is endogenous, so we include indicators for the repair record of the car as instruments.

. sysuse auto
(1978 automobile data)

. ivregress 2sls mpg weight length foreign displacement gear_ratio (price = i.rep78), vce(robust)

Instrumental-variables 2SLS regression            Number of obs   =         69
                                                  Wald chi2(6)    =      14.97
                                                  Prob > chi2     =     0.0205
                                                  R-squared       =          .
                                                  Root MSE        =     11.028

------------------------------------------------------------------------------
         mpg | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       price |   .0055594   .0055978     0.99   0.321    -.0054122    .0165309
      weight |  -.0291645    .026713    -1.09   0.275    -.0815211     .023192
      length |   .3741789   .5311569     0.70   0.481    -.6668696    1.415227
     foreign |  -24.52327   22.40424    -1.09   0.274    -68.43477    19.38823
displacement |  -.0410225   .0723021    -0.57   0.570    -.1827321    .1006871
  gear_ratio |   6.350259   6.084446     1.04   0.297    -5.575036    18.27555
       _cons |   1.635244   49.37558     0.03   0.974    -95.13912    98.40961
------------------------------------------------------------------------------
Endogenous: price
Exogenous:  weight length foreign displacement gear_ratio 2.rep78 3.rep78
            4.rep78 5.rep78

We see that conventional inference does not provide evidence that the coefficient on price is different from 0. We suspect, however, that our instruments are weak, so we perform a test that is robust to weak instruments:

. estat weakrobust

Test robust to weak instruments
Model VCE: Robust

 ( 1)  price = 0

Cond. likelihood-ratio (CLR) test =   8.29
                       Prob > CLR = 0.0232

Notes: CLR test reported by default because
       model is overidentified.
       p-value computed by simulation
       (25,000 replications).

Surprisingly, once weak instruments have been accounted for, we find statistical evidence that price is relevant in this regression: the CLR test rejects the hypothesis that the coefficient on price is 0 at the 5% level, with a p-value of 0.0232. Why is this the case? Is the CLR confidence interval narrower than the standard confidence interval? As it turns out, no:

. estat weakrobust, ci rseed(2024)

Searching for CI bounds:
Iteration 0:  Grid points = 500
Iteration 1:  Grid points = 1,000

(CI computed using 1,000 grid points on [-.050419, .061538])

Confidence interval robust to weak instruments
Model VCE: Robust

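----------------------------------------------------
             |                       CLR
    Interval | Coefficient    [95% conf. interval]
-------------+--------------------------------------
price        |
           1 |   .0055594         -inf   -.0054904
           2 |                .0007917        +inf
----------------------------------------------------
Notes: CLR CI reported by default because model is
       overidentified.
       Computed using simulation (25,000
       replications).
       CI is a union of disjoint intervals.
       CI contains infinite values.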

(Here I have set a random seed for reproducibility. Simulation is used to compute critical values for the CLR test when the model VCE is robust.)

In fact, the confidence interval for the coefficient on price is very wide and actually unbounded: (-∞, -0.00549) ∪ (0.000792, ∞). The CLR test cannot statistically rule out either a positive or a negative effect. It does, however, provide evidence against price having zero effect.

Because the CLR test does not impose a normal approximation on the estimated coefficient on price, it is robust to weak instruments. For the same reason, it can produce an unusual result like this one.

Notice that estat weakrobust performed inference using CLR by default, as opposed to Anderson–Rubin. This is because the model fit by ivregress is overidentified (it has more instruments than endogenous variables). We can request the Anderson–Rubin confidence interval for this model if we are interested in it. It is similar but wider:

. estat weakrobust, ci ar

Searching for CI bounds:
Iteration 0:  Grid points = 500
Iteration 1:  Grid points = 1,000

(CI computed using 1,000 grid points on [-.050419, .061538])

Confidence interval robust to weak instruments
Model VCE: Robust

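----------------------------------------------------
             |                 Anderson–Rubin
    Interval | Coefficient    [95% conf. interval]
-------------+--------------------------------------
price        |
           1 |   .0055594         -inf   -.0027981
           2 |                .0002308        +inf
----------------------------------------------------
Notes: CI is a union of disjoint intervals.
       CI contains infinite values.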

We expect the Anderson–Rubin confidence interval to be wider than the CLR confidence interval because the Anderson–Rubin test is a less powerful test when the model is overidentified. When a model is just identified, the two tests are equivalent.
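Test inversion itself can be illustrated with a coarse grid search. The sketch below is not the algorithm estat weakrobust uses. It assumes a textbook Anderson–Rubin-style test in F form with a robust VCE, a hypothetical grid of 21 null values from -0.010 to 0.010, and a working variable named y0. Null values whose p-value is at least 0.05 belong to the 95% confidence set. Because the scaling of the statistic may differ from the one used by estat weakrobust, the boundaries need not line up exactly with the output above.

```stata
* Sketch only: trace out an Anderson-Rubin-style confidence set on a coarse grid
sysuse auto, clear

* Hypothetical working variable: mpg minus the hypothesized price effect
generate double y0 = .

* Hypothetical grid of null values: -0.010 to 0.010 in steps of 0.001
forvalues i = -10/10 {
    local b0 = `i'/1000
    quietly replace y0 = mpg - (`b0')*price
    * Exogenous regressors plus the excluded instruments (repair-record indicators)
    quietly regress y0 weight length foreign displacement gear_ratio i.rep78, vce(robust)
    * Joint test that the excluded instruments have no explanatory power
    quietly testparm i.rep78
    display %7.3f `b0' "   p-value = " %6.4f r(p)
}
```

A finer or wider grid gives a sharper picture of where the set begins and ends, at the cost of more regressions.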

Bottom line

If you are using ivregress and suspect you have weak instruments, you can use estat weakrobust to get valid tests and confidence intervals for the coefficient on the endogenous variable. If instruments really are weak, you may run into unusual confidence intervals: they may contain infinite values or take the form of a union of intervals. In these cases, the Stata output of estat weakrobust will contain a hyperlink to a help page that explains how to interpret these intervals.