In the spotlight: Metrobus shock drives two million more passengers to the subway
If an unusually high number of people commute by metrobus on a particular day, what is the effect on the number of people who commute by subway that day? How would this impact evolve over the following days? We can use impulse–response functions (IRFs) based on either vector autoregressions or local projections to characterize these patterns. But we couldn't make causal interpretations—how a change in one transportation method causes changes in another—because public transportation use is endogenous:
- Commuters not using the metrobus today might be using the subway instead (switching).
- Commuters may use the metrobus to get to a subway station (using multiple systems).
The ivlpirf command estimates IRFs through a local-projection method while allowing for inclusion of instrumental variables to account for the endogeneity of the impulse variable. This results in an IRF that can be interpreted as a structural causal effect. For more details, see Jordà and Taylor (2025).
Public transportation system setup
We have data on daily changes in passenger counts for four public transportation systems: bus, trolley, metrobus, and subway.
. use https://www.stata.com/exdata/ptransport.dta
(Public transport data)
. describe
Contains data from https://www.stata.com/exdata/ptransport.dta
Observations: 1,302 Public transport data
Variables: 8 21 Jan 2026 12:44
| Variable Storage Display Value | ||
| name type format label Variable label | ||
| date float %td d_bus float %9.0g Bus passengers (daily change, millions) d_trolley float %9.0g Trolley passengers (daily change, millions) d_metrobus float %9.0g Metrobus passengers (daily change, millions) d_subway float %9.0g Subway passengers (daily change, millions) protest float %9.0g protest Protest on main road closed_mbus float %9.0g # of unavailable metrobus stations today closed_sbwy float %9.0g # of unavailable subway stations today | ||
Before we can use time-series tools in Stata, we need to tsset the dataset. We will use a business calendar because we do not have data for weekends (we care only about days when people commute to work, school, etc.). See [D] datetime business calendars to learn more about business calendars.
. bcal create mycal, from(date) replace
(output omitted)
. bcal load mycal
(output omitted)
. generate bcaldate = bofd("mycal", date)
. format %tbmycal bcaldate
We can now declare our data as time series:
. tsset bcaldate
Time variable: bcaldate, 05jan2015 to 31dec2019
Delta: 1 day
We begin by using the tsline command to look at the time-series line plots for metrobus and subway use:
. tsline d_metrobus, tlabel(90(260)1130)
. tsline d_subway, tlabel(90(260)1130)
Both the metrobus and the subway series fluctuate around 0 and appear stationary, as expected, because they reflect daily changes in passenger numbers.
Instrumental-variables local-projection IRFs
We are interested in how subway use responds during a number of steps (days) to a shock to metrobus use. To investigate this empirically, let’s use the ivlpirf command to estimate the structural IRF of d_subway responses to d_metrobus shocks. Simple correlations between the two would not address questions of causality, because subway use could affect metrobus use, metrobus use could affect subway use, or changes in both could be driven by a third factor. To properly account for the endogeneity of d_metrobus, we need an instrument that is correlated with d_metrobus but uncorrelated with the disturbance terms in the local-projection equations. We believe the closed_mbus variable is a suitable instrument because metrobus station closures for repairs should directly affect only the metrobus service and are not caused by other variables of the transportation network.
Accordingly, we specify d_subway as a dependent variable, d_metrobus as the endogenous impulse variable, and closed_mbus as the instrumental variable. The exog(protest) option includes the protest variable as an exogenous control in the model, meaning it is not influenced by subway or metrobus use or by any other features of the transportation network. The step(8) option sets the maximum step (horizon) to 8 days. The nonendogirf option suppresses the IRF for the response of the endogenous d_metrobus variable to its own shocks. The vce(hac nwest 20) option requests the heteroskedasticity- and-autocorrelation-consistent (HAC) variance–covariance matrix estimator with 20 lags (we believe there is no autocorrelation beyond the 20th lag). The nolog option suppresses the iteration log.
. ivlpirf d_subway, endogenous(d_metrobus = closed_mbus) exog(protest)
step(8) noendogirf vce(hac nwest 20) nolog
Final GMM criterion Q(b) = 7.43e-34
note: model is exactly identified.
Instrumental-variables local-projection impulse responses
Sample: 07jan2015 thru 19dec2019 Number of obs = 1,292
HAC kernel: Newey–West with 20 lags
| IRF HAC | ||
| coefficient std. err. z P>|z| [95% conf. interval] | ||
| d_subway | ||
| --. | .6810457 .04318 15.77 0.000 .5964145 .7656769 | |
| F1. | .4510037 .0881676 5.12 0.000 .2781985 .6238089 | |
| F2. | .3108047 .0738554 4.21 0.000 .1660508 .4555587 | |
| F3. | .2026476 .0848311 2.39 0.017 .0363816 .3689135 | |
| F4. | .0865929 .1036552 0.84 0.403 -.1165675 .2897532 | |
| F5. | .110765 .0778456 1.42 0.155 -.0418096 .2633397 | |
| F6. | .0709419 .0900847 0.79 0.431 -.1056209 .2475046 | |
| F7. | .027358 .0824351 0.33 0.740 -.1342118 .1889279 | |
| F8. | .0565121 .1190456 0.47 0.635 -.176813 .2898372 | |
Based on the p-values, we have evidence that the first four IRF coefficients are nonzero. The best way to understand this result is to look at the graph by using the irf suite of commands. We specify sirf (structural IRF) in the irf graph command because the IRF we estimated is structural; ivlpirf uses the exogenous variation of the instrumental variable to recover the responses to a structural shock.
. irf set myirf, replace
(file myirf.irf created)
(file myirf.irf now active)
. irf create ivlp, replace
irfname ivlp not found in myirf.irf
(file myirf.irf updated)
. irf graph sirf, irf(ivlp) impulse(d_metrobus) response(d_subway) xlabel(0(1)8)
ylabel(-.25(.25)1) legend(size(small)) xtitle(Days) ytitle(Daily change in subway passengers)
ylabel(0 "0" .68 "680,000")
We can see that the structural response of subway use to a shock to metrobus use starts on the same day, with an increase of around 680,000 commuters. Then more commuters will join the subway over the next few days, and the shock will dissipate around day 4.
We can consider two interpretations of this positive response.
- For commuters who ride the metrobus to get to subway stations: if the positive shock is additional metrobuses being incorporated into the service, then even more commuters may adopt this strategy of riding the metrobus to the subway.
- For commuters who typically ride only the metrobus: if the positive shock is the metrobus being extra crowded, commuters may add some stretches of subway to their commute to avoid the crowds.
We can also estimate cumulative responses by specifying the cumulative option in the ivlpirf command. Cumulative responses are calculated by summing simple responses. Because d_subway is the change in subway passengers, the cumulative structural IRF coefficients represent how subway passenger levels would respond over time to a shock in d_metrobus.
. ivlpirf d_subway, endogenous(d_metrobus = closed_mbus) exog(protest)
step(8) noendogirf vce(hac nwest 20) nolog cumulative
Final GMM criterion Q(b) = 1.99e-32
note: model is exactly identified.
Instrumental-variables local-projection impulse responses
Sample: 07jan2015 thru 19dec2019 Number of obs = 1,292
HAC kernel: Newey–West with 20 lags
| CIRF HAC | ||
| coefficient std. err. z P>|z| [95% conf. interval] | ||
| d_subway | ||
| --. | .6810457 .04318 15.77 0.000 .5964145 .7656769 | |
| F1. | 1.132049 .1029469 11.00 0.000 .9302771 1.333822 | |
| F2. | 1.442854 .1452044 9.94 0.000 1.158259 1.72745 | |
| F3. | 1.645502 .1753518 9.38 0.000 1.301819 1.989185 | |
| 44. | 1.732095 .2471499 7.01 0.000 1.24769 2.2165 | |
| F5. | 1.84286 .2808441 6.56 0.000 1.292415 2.393304 | |
| F6. | 1.913802 .3367932 5.68 0.000 1.253699 2.573904 | |
| F7. | 1.94116 .3799721 5.11 0.000 1.196428 2.685891 | |
| F8. | 1.997672 .4333879 4.61 0.000 1.148247 2.847096 | |
Up to horizon 8, all the cumulative structural IRF coefficients appear to be nonzero. Let's look at the results using the corresponding graph. In this case, we need to specify csirf (cumulative structural IRF) in the irf graph command:
. irf create civlp, replace
irfname civlp not found in myirf.irf
(file myirf.irf updated)
. irf graph csirf, irf(civlp) impulse(d_metrobus) response(d_subway) xlabel(0(1)8) ylabel(0(.5)3)
legend(size(small)) xtitle(Days) ytitle(Daily subway passengers (millions))
We observe that the cumulative response of the number of subway passengers when there is a shock to metrobus use is always positive and increasing. The upward movement gradually decelerates with each step until it stabilizes at a level of around two million commuters.
Conclusion
The ivlpirf command allows us to estimate IRFs by local projection while accounting for endogeneity of the impulse variable with instrumental variables. This allows causal interpretation of the IRF. In the example above, we found that positive shocks to metrobus use result in a same-day positive response in subway use, as well as up to a four-day increase in subway use.
Reference
Jordà, Ò., and A. M. Taylor. 2025. Local projections. Journal of Economic Literature 63: 59–110. https://doi.org/10.1257/jel.20241521.
— Miguel Dorta
Senior Statistician
— Alvaro Fuentes Higuera
Senior Econometrician