IV quantile regression

Instrumental-variables quantile regression


Highlights

- Quantile instrumental-variables (IV) estimators
  - Inverse quantile regression (IQR)
  - Smoothed estimating equations
- IQR estimator
  - Confidence intervals robust to weak instruments
  - Graphical convergence diagnostics
- Simultaneous estimation over quantiles
- Visualization of effects over quantiles
- Specialized tests
  - Endogenous variable not affecting the outcome
  - Equality of endogenous effects across quantiles
  - Effects greater than zero across quantiles
  - Suspected endogenous variable being exogenous

When we want to study the effects of covariates on different quantiles of the outcome, we use quantile regression. But what if we suspect that a covariate is endogenous? The ivqregress command models quantiles of the outcome and, at the same time, controls for problems that arise from endogeneity.


Overview

When we use linear regression, we model the mean of the outcome. Yet, sometimes, we would like to study features of the outcome distribution other than the mean. For example, a policymaker may want to learn how participation in a 401(k) retirement plan would affect the lower-level, median, and upper-level conditional quantiles of net wealth.

ivqregress estimates parameters at quantiles of the outcome distribution and accounts for endogeneity problems that arise for reasons such as self-selection, omission of a relevant variable, or measurement error. For example, participation in the 401(k) program may be endogenous because the people who do and do not participate may have different saving preferences, which will affect net wealth growth.

When quantile regression matters

Suppose we have a simple model $E(y|x) = \beta_0 + x\beta_1$, where $y$ is the outcome variable and $x$ is a covariate that takes values in $\{0, 1, 2, 3, 4, 5, 6\}$. By definition, $\beta_1$ fully characterizes the effect of a one-unit increase in $x$ on the conditional mean of $y$; that is, $\beta_1 = E(y|x=a+1) - E(y|x=a)$. Below, we consider two scenarios for the data-generating process.

1. Location shifted only. The probability density function of the outcome conditional on $x=a+1$, $f(y|x=a+1)$, is only location shifted relative to $f(y|x=a)$. In this case, $\beta_1$ summarizes the effect of $x$ not only on the conditional mean but also on each conditional quantile of $y$. This case is illustrated in the left panel of Figure 1.

2. Location shifted and rescaled. The probability density function of the outcome conditional on $x=a+1$, $f(y|x=a+1)$, is both location shifted and rescaled relative to $f(y|x=a)$. In this case, $\beta_1$ summarizes the effect of $x$ only on the conditional mean but not on the conditional quantiles of $y$. This case is illustrated in the right panel of Figure 1.
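
The two scenarios can be made concrete with a worked example. As an assumption that is not in the original text, suppose $y = \beta_0 + x\beta_1 + \sigma(x)\varepsilon$, where $\varepsilon$ is independent of $x$ with $E(\varepsilon) = 0$ and $\tau$th quantile $q_\varepsilon(\tau)$, and where $\sigma(x) > 0$. The symbols $\sigma(x)$, $\gamma$, and $q_\varepsilon(\tau)$ are introduced here only for illustration. Writing $Q_\tau(y|x)$ for the $\tau$th conditional quantile of $y$:

$$
\begin{aligned}
\text{Scenario 1: } \sigma(x) = 1 \quad &\Rightarrow \quad Q_\tau(y|x) = \beta_0 + q_\varepsilon(\tau) + x\beta_1 \\
\text{Scenario 2: } \sigma(x) = 1 + \gamma x \quad &\Rightarrow \quad Q_\tau(y|x) = \beta_0 + q_\varepsilon(\tau) + x\{\beta_1 + \gamma\, q_\varepsilon(\tau)\}
\end{aligned}
$$

In both scenarios, $E(y|x) = \beta_0 + x\beta_1$. In scenario 1, the slope of every conditional quantile is $\beta_1$. In scenario 2, the slope is $\beta_1 + \gamma\, q_\varepsilon(\tau)$, which changes with $\tau$ unless $\gamma = 0$.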

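Figure 1. Conditional densities of $y$ at each value of $x$: location shifted only (left panel); location shifted and rescaled (right panel).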

In the left panel, we see that the conditional densities all have the same shape; only the location has been shifted. In this case, $\beta_1$ captures the shift in the conditional mean and in every conditional quantile of the outcome. As a result, running a linear regression provides as much information about $\beta_1$ as a quantile regression.

In contrast, in the right panel, the conditional density for each level of $x$ has a different location and a different shape. Thus, $\beta_1$ summarizes only the shift in the conditional mean, which generally differs from the shifts in the conditional quantiles. Quantile regression becomes necessary to learn about the effects of $x$ on the conditional quantiles of the outcome.
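
A small simulation can make the contrast concrete. The sketch below is not part of the original example: the variable names (y_shift, y_scale), the sample size, the seed, and the parameter values are all assumptions chosen for illustration. It generates data under both scenarios with $\beta_1 = 2$ and compares the slope from regress with the slopes from qreg at the 0.1 and 0.9 quantiles.

```stata
* Simulated illustration; all names and parameter values are hypothetical
clear
set seed 12345
set obs 7000
generate x = mod(_n, 7)
* x cycles through 0, 1, ..., 6
generate e = rnormal()
* e is standard normal and independent of x

* Scenario 1: location shifted only
generate y_shift = 1 + 2*x + e
* Scenario 2: location shifted and rescaled (scale grows with x)
generate y_scale = 1 + 2*x + (1 + 0.5*x)*e

* Scenario 1: mean and quantile slopes should all be near 2
regress y_shift x
qreg y_shift x, quantile(.1)
qreg y_shift x, quantile(.9)

* Scenario 2: mean slope near 2, quantile slopes differ by quantile
regress y_scale x
qreg y_scale x, quantile(.1)
qreg y_scale x, quantile(.9)
```

With standard normal errors and the scale $1 + 0.5x$ assumed here, the population slope at quantile $\tau$ in the second scenario is $2 + 0.5\,\Phi^{-1}(\tau)$, about 1.36 at $\tau = 0.1$ and 2.64 at $\tau = 0.9$. The qreg estimates for y_scale should land near those values, while all three slopes for y_shift should be near 2.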

Let's see it work

We want to estimate the effect of 401(k) participation (p401k) on different conditional quantiles of net financial assets (assets). We use data reported by Chernozhukov and Hansen (2004). These data are from a sample of households in the 1990 Survey of Income and Program Participation (SIPP). For the head of household, we have data on income (income), age (age), number of people in the family (familysize), marital status (married), participation in an IRA (ira), receipt of pension benefits (pension), home ownership (ownhome), and years of education (educ).

We suspect 401(k) participation is endogenous because it may depend on unobserved factors such as saving preference that also impact financial assets. We will use 401(k) eligibility (e401k) as an instrument for 401(k) participation.

We use the IQR estimator (ivqregress iqr) to estimate the effect of 401(k) participation on the conditional median (the default) of the net financial assets.

. webuse assets2
(Excerpt from Chernozhukov and Hansen (2004))

. ivqregress iqr assets (i.p401k = i.e401k) income age familysize
  i.married i.ira i.pension i.ownhome educ

Initial grid:
Quantile = 0.50: .........10.........20.........30

Adaptive grid:
Quantile = 0.50: .........10.........20.........30

IV median regression                                     Number of obs =   9,913
Estimator: Inverse quantile regression                   Wald chi2(9)  = 1289.75
                                                         Prob > chi2   =  0.0000



                               Robust                                           
      assets     Coefficient  std. err.      z    P>|z|     [95% conf. interval]

       p401k                                                                    
        Yes        5313.397   573.2818     9.27   0.000     4189.786    6437.009
      income       .1577512   .0124889    12.63   0.000     .1332735    .1822289
         age       99.96526   8.561923    11.68   0.000      83.1842    116.7463
  familysize      -197.8251   54.36773    -3.64   0.000    -304.3838   -91.26627
                                                                                
     married                                                                    
    Married       -1359.124   227.3366    -5.98   0.000    -1804.696   -913.5528
                                                                                
         ira                                                                    
        Yes        22629.61   1022.706    22.13   0.000     20625.15    24634.08
                                                                                
     pension                                                                    
Receives ..       -693.8347   210.6176    -3.29   0.001    -1106.638   -281.0317
                                                                                
     ownhome                                                                    
        Yes       -30.29657   154.7265    -0.20   0.845     -333.555    272.9618
        educ      -96.43983   32.09465    -3.00   0.003    -159.3442   -33.53547
       _cons      -4998.673   570.1315    -8.77   0.000     -6116.11   -3881.236
              
Endogenous: 1.p401k
Exogenous:  income age familysize 1.married 1.ira 1.pension 1.ownhome educ
            1.e401k

The coefficient for p401k is 5,313. This means participation in a 401(k) would increase the conditional median of net financial assets by $5,313, conditional on the other covariates, relative to a scenario where no one participates.

After ivqregress iqr, we can use estat dualci to obtain the dual confidence interval (CI), which is robust to weak instruments, for the coefficient on the endogenous variable.

. estat dualci

Dual confidence interval                                    Number of obs = 9,913

                               Robust                               Dual         
      assets     Coefficient  std. err.      z    P>|z|     [95% conf. interval]

       p401k                                                                    
        Yes        5313.397   573.2818     9.27   0.000     3683.916    7304.986

The dual CI is usually wider than the regular CI; it provides more robust inference if the instruments are weak. Here the dual 95% CI is [3684, 7305], which is wider than the regular 95% CI [4190, 6437].
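
The comparison of the two intervals is simple arithmetic on the reported numbers. The sketch below is an illustration rather than part of the original example: the scalar names (b_p401k, se_p401k, zcrit, w_wald, w_dual) are hypothetical, and the values are copied by hand from the output above. It rebuilds the regular normal-based interval from the coefficient and standard error and then compares the widths.

```stata
* Values copied from the ivqregress iqr and estat dualci output above
scalar b_p401k  = 5313.397
scalar se_p401k = 573.2818
scalar zcrit    = invnormal(0.975)

* Normal-based 95% CI: estimate +/- critical value * std. err.
display "Regular 95% CI: [" %9.3f (b_p401k - zcrit*se_p401k) ", " %9.3f (b_p401k + zcrit*se_p401k) "]"

* Widths of the two reported intervals and their ratio
scalar w_wald = 6437.009 - 4189.786
scalar w_dual = 7304.986 - 3683.916
display "Regular width: " %9.3f w_wald "   dual width: " %9.3f w_dual
display "Width ratio (dual/regular): " %5.2f (w_dual/w_wald)
```

The first line reproduces the regular interval up to rounding of the displayed inputs. The dual interval is about 1.6 times as wide, and it is not symmetric around the point estimate: it extends roughly 1,629 below and 1,992 above 5,313.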

We have estimated the 401(k) participation (p401k) treatment effect on the conditional median of net financial assets (assets). However, from the policy designer's point of view, we may be more interested in estimating the treatment effect of p401k on other conditional quantiles of assets.

This time, we specify ivqregress smooth to use the smoothed estimating equations estimator to fit the model at different quantiles. In particular, we specify the quantile(10(10)90) option to fit the IVQR model at the 10th, 20th, ..., 90th quantiles.

. ivqregress smooth assets (i.p401k = i.e401k) income age familysize      
  i.married i.ira i.pension i.ownhome educ, quantile(10(10)90)

Fitting smoothed IV quantile regression:

Quantile = .1:
Step 1: Bandwidth = 1327.0069    GMM criterion Q(b) = 9.224e-11
Step 2: Bandwidth = 1311.3131    GMM criterion Q(b) = 1.995e-10

(output omitted)

IV quantile regression                                    Number of obs =   9,913
Estimator: Smoothed estimating equations                  Wald chi2(81) = 4932.84
                                                          Prob > chi2   =  0.0000



                               Robust                                           
      assets     Coefficient  std. err.      z    P>|z|     [95% conf. interval]

q10                                                                             
       p401k                                                                    
        Yes        3191.667   486.2193     6.56   0.000     2238.695    4144.639
      income       .0318585   .0123707     2.58   0.010     .0076124    .0561046
         age       128.9268   15.42632     8.36   0.000     98.69178    159.1618
  familysize      -329.8374   125.4774    -2.63   0.009    -575.7687   -83.90615
                                                                                
     married                                                                    
    Married       -1480.013   386.4611    -3.83   0.000    -2237.463   -722.5635
                                                                                
         ira                                                                    
        Yes        7914.049   342.9506    23.08   0.000     7241.878     8586.22
                                                                                
     pension                                                                    
Receives ..       -5.356704   334.9869    -0.02   0.987     -661.919    651.2056
                                                                                
     ownhome                                                                    
        Yes        1043.279    308.722     3.38   0.001     438.1945    1648.363
        educ      -289.8807   53.06713    -5.46   0.000    -393.8904   -185.8711
       _cons      -7631.313   1214.725    -6.28   0.000    -10012.13   -5250.496

(output omitted)



q90                                                                             
       p401k                                                                    
        Yes        15525.23   3035.965     5.11   0.000     9574.848    21475.61
      income       .8311508   .0574108    14.48   0.000     .7186277    .9436738
         age       486.9876   51.61654     9.43   0.000      385.821    588.1541
  familysize      -586.2617   193.5936    -3.03   0.002    -965.6983   -206.8252
                                                                                
     married                                                                    
    Married       -3877.165   781.2296    -4.96   0.000    -5408.347   -2345.983
                                                                                
         ira                                                                    
        Yes        67888.86   4902.106    13.85   0.000     58280.91    77496.81
                                                                                
     pension                                                                    
Receives ..       -4829.506   898.9147    -5.37   0.000    -6591.346   -3067.665
                                                                                
     ownhome                                                                    
        Yes        715.6272   722.8727     0.99   0.322    -701.1773    2132.432
        educ        14.5293   110.8781     0.13   0.896    -202.7878    231.8464
       _cons      -19953.21   2326.698    -8.58   0.000    -24513.45   -15392.96

Endogenous: 1.p401k
Exogenous:  income age familysize 1.married 1.ira 1.pension 1.ownhome educ
            1.e401k

The results show the estimated effect of 401(k) participation on each conditional quantile of assets. The coefficients are interpreted as before, except that we are now looking at different conditional quantiles. For example, for quantile q90, the estimated coefficient on p401k is 15,525. Thus, 401(k) participation would increase the 0.9 conditional quantile of net financial assets by $15,525.
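
The quantile(10(10)90) option uses Stata's numlist shorthand. As a quick check that is not part of the original example, the numlist command expands the same specification and leaves the result in r(numlist); the local macro name nq is an assumption used only here.

```stata
* Expand the numlist that was passed to quantile()
numlist "10(10)90"
display "`r(numlist)'"

* Count how many quantiles were requested
local nq : word count `r(numlist)'
display "Number of quantiles: `nq'"
```

The list expands to the nine values 10 through 90. Nine quantiles with nine slope coefficients each is consistent with the 81 degrees of freedom of the Wald test in the header of the ivqregress smooth output.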

In addition to looking at the numerical estimates in the coefficient table, we can use estat coefplot to visualize the treatment effect of p401k from the lower to the upper quantiles.

. estat coefplot

Figure 2

The dots in the plot show the point estimates of the treatment effect of p401k on different conditional quantiles of assets, and the gray band shows the 95% pointwise CI. The treatment effect trends upward. At the lower quantiles, such as the 10th, 20th, 30th, and 40th, the treatment effect is relatively flat; it increases in the upper quantiles. The red line shows the two-stage least-squares estimate, which can be used as a benchmark.
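
The two-stage least-squares benchmark can also be fit directly for comparison with the coefficient table. This is a minimal sketch, assuming the same assets2 data, instrument, and covariates as above. The vce(robust) option is an assumption made here for comparability with the robust standard errors reported by ivqregress; the original does not say how the benchmark in the plot is computed.

```stata
* Same data and specification as the ivqregress examples above
webuse assets2, clear

* 2SLS: effect of p401k on the conditional mean of assets,
* with e401k as the instrument
ivregress 2sls assets (i.p401k = i.e401k) income age familysize ///
    i.married i.ira i.pension i.ownhome educ, vce(robust)
```

Unlike the quantile estimates, this yields a single coefficient on p401k that describes the effect on the conditional mean, which is why it appears as a horizontal line in the plot.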

We can use estat endogeffects to test the following hypotheses regarding the endogenous covariate:

No effect: 401(k) participation does not affect net financial assets at any of the estimated quantiles.
Constant effect: The treatment effect of 401(k) participation is constant across the estimated quantiles.
Dominance: The effect of 401(k) participation is unambiguously positive at all the estimated quantiles; that is, the coefficient values are strictly positive.
Exogeneity: 401(k) participation is exogenous.

We use estat endogeffects to show the Kolmogorov–Smirnov statistic and the 95% critical value for each hypothesis. We can reject the null hypothesis if the test statistic is greater than the critical value; otherwise, we cannot reject the null hypothesis. We specify the rseed() option to make the results reproducible because the critical values are generated from a bootstrap sample.

. estat endogeffects, rseed(12345671)

Tests for endogenous effects           Replications = 100


Null hypothesis        KS statistic    95% critical value

No effect                    11.507                 2.593
Constant effect               5.351                 2.391
Dominance                     0.000                 2.556
Exogeneity                    4.195                 2.526

Note: If the KS statistic < critical value, there is
      insufficient evidence to reject the null 
      hypothesis. (KS = Kolmogorov–Smirnov)

We find that 401(k) participation has some effect, that the treatment effect is not constant across quantiles, and that 401(k) participation is endogenous. We cannot reject the dominance hypothesis, which indicates that 401(k) participation is unambiguously beneficial for all the estimated quantiles of assets.

The test results are consistent with the coefficient plot produced by estat coefplot, where we saw that the treatment effects are positive (dominance and no-effect hypotheses) and trend upward (constant-effect hypothesis).
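
The decision rule can be applied mechanically to the reported statistics. The sketch below is illustrative only: the matrix name T, its row names, and the loop are hypothetical, and the numbers are copied by hand from the estat endogeffects output above rather than read from stored results.

```stata
* Column 1: KS statistic; column 2: 95% critical value
matrix T = (11.507, 2.593 \ 5.351, 2.391 \ 0.000, 2.556 \ 4.195, 2.526)
matrix rownames T = no_effect constant_effect dominance exogeneity

* Reject the null when the KS statistic exceeds the critical value
local names : rownames T
forvalues i = 1/4 {
    local h : word `i' of `names'
    if T[`i',1] > T[`i',2] {
        display "`h': reject the null"
    }
    else {
        display "`h': cannot reject the null"
    }
}
```

Only the dominance row has a statistic below its critical value, which matches the conclusions stated above.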

Reference

Chernozhukov, V., and C. Hansen. 2004. The effects of 401(k) participation on the wealth distribution: An instrumental quantile regression analysis. Review of Economics and Statistics 86: 735–751.

Tell me more

Read more about instrumental-variables quantile regression in the Stata Base Reference Manual; see [R] ivqregress.


