Home  /  Stata News  /  Vol 41 No 1  /  In the spotlight: Metrobus shock drives two million more passengers to the subway
The Stata News

«Back to main page

In the spotlight: Metrobus shock drives two million more passengers to the subway

If an unusually high number of people commute by metrobus on a particular day, what is the effect on the number of people who commute by subway that day? How would this impact evolve over the following days? We can use impulse–response functions (IRFs) based on either vector autoregressions or local projections to characterize these patterns. But we couldn't make causal interpretations—how a change in one transportation method causes changes in another—because public transportation use is endogenous:

  1. Commuters not using the metrobus today might be using the subway instead (switching).
  2. Commuters may use the metrobus to get to a subway station (using multiple systems).

The ivlpirf command estimates IRFs through a local-projection method while allowing for inclusion of instrumental variables to account for the endogeneity of the impulse variable. This results in an IRF that can be interpreted as a structural causal effect. For more details, see Jordà and Taylor (2025).

Public transportation system setup

We have data on daily changes in passenger counts for four public transportation systems: bus, trolley, metrobus, and subway.

. use https://www.stata.com/exdata/ptransport.dta
(Public transport data)

. describe

Contains data from https://www.stata.com/exdata/ptransport.dta
 Observations:         1,302                  Public transport data 
    Variables:             8                  21 Jan 2026 12:44

Variable Storage Display Value
name type format label Variable label
date float %td d_bus float %9.0g Bus passengers (daily change, millions) d_trolley float %9.0g Trolley passengers (daily change, millions) d_metrobus float %9.0g Metrobus passengers (daily change, millions) d_subway float %9.0g Subway passengers (daily change, millions) protest float %9.0g protest Protest on main road closed_mbus float %9.0g # of unavailable metrobus stations today closed_sbwy float %9.0g # of unavailable subway stations today
Sorted by:

Before we can use time-series tools in Stata, we need to tsset the dataset. We will use a business calendar because we do not have data for weekends (we care only about days when people commute to work, school, etc.). See [D] datetime business calendars to learn more about business calendars.

. bcal create mycal, from(date) replace
(output omitted)

. bcal load mycal
(output omitted)

. generate bcaldate = bofd("mycal", date)

. format %tbmycal bcaldate

We can now declare our data as time series:

. tsset bcaldate

Time variable: bcaldate, 05jan2015 to 31dec2019
        Delta: 1 day

We begin by using the tsline command to look at the time-series line plots for metrobus and subway use:

. tsline d_metrobus, tlabel(90(260)1130)
irf1.svg
. tsline d_subway, tlabel(90(260)1130)
irf2.svg

Both the metrobus and the subway series fluctuate around 0 and appear stationary, as expected, because they reflect daily changes in passenger numbers.

Instrumental-variables local-projection IRFs

We are interested in how subway use responds during a number of steps (days) to a shock to metrobus use. To investigate this empirically, let’s use the ivlpirf command to estimate the structural IRF of d_subway responses to d_metrobus shocks. Simple correlations between the two would not address questions of causality, because subway use could affect metrobus use, metrobus use could affect subway use, or changes in both could be driven by a third factor. To properly account for the endogeneity of d_metrobus, we need an instrument that is correlated with d_metrobus but uncorrelated with the disturbance terms in the local-projection equations. We believe the closed_mbus variable is a suitable instrument because metrobus station closures for repairs should directly affect only the metrobus service and are not caused by other variables of the transportation network.

Accordingly, we specify d_subway as a dependent variable, d_metrobus as the endogenous impulse variable, and closed_mbus as the instrumental variable. The exog(protest) option includes the protest variable as an exogenous control in the model, meaning it is not influenced by subway or metrobus use or by any other features of the transportation network. The step(8) option sets the maximum step (horizon) to 8 days. The nonendogirf option suppresses the IRF for the response of the endogenous d_metrobus variable to its own shocks. The vce(hac nwest 20) option requests the heteroskedasticity- and-autocorrelation-consistent (HAC) variance–covariance matrix estimator with 20 lags (we believe there is no autocorrelation beyond the 20th lag). The nolog option suppresses the iteration log.

. ivlpirf d_subway, endogenous(d_metrobus = closed_mbus) exog(protest) 
     step(8) noendogirf vce(hac nwest 20) nolog


Final GMM criterion Q(b) = 7.43e-34

note: model is exactly identified.

Instrumental-variables local-projection impulse responses

Sample: 07jan2015 thru 19dec2019                         Number of obs = 1,292
HAC kernel: Newey–West with 20 lags 

IRF HAC
coefficient std. err. z P>|z| [95% conf. interval]
d_subway
--. .6810457 .04318 15.77 0.000 .5964145 .7656769
F1. .4510037 .0881676 5.12 0.000 .2781985 .6238089
F2. .3108047 .0738554 4.21 0.000 .1660508 .4555587
F3. .2026476 .0848311 2.39 0.017 .0363816 .3689135
F4. .0865929 .1036552 0.84 0.403 -.1165675 .2897532
F5. .110765 .0778456 1.42 0.155 -.0418096 .2633397
F6. .0709419 .0900847 0.79 0.431 -.1056209 .2475046
F7. .027358 .0824351 0.33 0.740 -.1342118 .1889279
F8. .0565121 .1190456 0.47 0.635 -.176813 .2898372
Note: Structural impulse–response functions are reported. Impulse: d_metrobus Response: d_subway Instrument: closed_mbus Controls: L.d_subway L2.d_subway L.d_metrobus L2.d_metrobus protest

Based on the p-values, we have evidence that the first four IRF coefficients are nonzero. The best way to understand this result is to look at the graph by using the irf suite of commands. We specify sirf (structural IRF) in the irf graph command because the IRF we estimated is structural; ivlpirf uses the exogenous variation of the instrumental variable to recover the responses to a structural shock.

. irf set myirf, replace
(file myirf.irf created)
(file myirf.irf now active)

. irf create ivlp, replace
irfname ivlp not found in myirf.irf
(file myirf.irf updated)

. irf graph sirf, irf(ivlp) impulse(d_metrobus) response(d_subway) xlabel(0(1)8) 
     ylabel(-.25(.25)1) legend(size(small)) xtitle(Days) ytitle(Daily change in subway passengers) 
     ylabel(0 "0" .68 "680,000")
irf3.svg

We can see that the structural response of subway use to a shock to metrobus use starts on the same day, with an increase of around 680,000 commuters. Then more commuters will join the subway over the next few days, and the shock will dissipate around day 4.

We can consider two interpretations of this positive response.

  1. For commuters who ride the metrobus to get to subway stations: if the positive shock is additional metrobuses being incorporated into the service, then even more commuters may adopt this strategy of riding the metrobus to the subway.
  2. For commuters who typically ride only the metrobus: if the positive shock is the metrobus being extra crowded, commuters may add some stretches of subway to their commute to avoid the crowds.

We can also estimate cumulative responses by specifying the cumulative option in the ivlpirf command. Cumulative responses are calculated by summing simple responses. Because d_subway is the change in subway passengers, the cumulative structural IRF coefficients represent how subway passenger levels would respond over time to a shock in d_metrobus.

. ivlpirf d_subway, endogenous(d_metrobus = closed_mbus) exog(protest)
     step(8) noendogirf vce(hac nwest 20) nolog cumulative


Final GMM criterion Q(b) = 1.99e-32

note: model is exactly identified.

Instrumental-variables local-projection impulse responses

Sample: 07jan2015 thru 19dec2019                         Number of obs = 1,292
HAC kernel: Newey–West with 20 lags

CIRF HAC
coefficient std. err. z P>|z| [95% conf. interval]
d_subway
--. .6810457 .04318 15.77 0.000 .5964145 .7656769
F1. 1.132049 .1029469 11.00 0.000 .9302771 1.333822
F2. 1.442854 .1452044 9.94 0.000 1.158259 1.72745
F3. 1.645502 .1753518 9.38 0.000 1.301819 1.989185
44. 1.732095 .2471499 7.01 0.000 1.24769 2.2165
F5. 1.84286 .2808441 6.56 0.000 1.292415 2.393304
F6. 1.913802 .3367932 5.68 0.000 1.253699 2.573904
F7. 1.94116 .3799721 5.11 0.000 1.196428 2.685891
F8. 1.997672 .4333879 4.61 0.000 1.148247 2.847096
Note: Cumulative structural impulse–response functions are reported. Impulse: d_metrobus Response: d_subway Instrument: closed_mbus Controls: L.d_subway L2.d_subway L.d_metrobus L2.d_metrobus protest

Up to horizon 8, all the cumulative structural IRF coefficients appear to be nonzero. Let's look at the results using the corresponding graph. In this case, we need to specify csirf (cumulative structural IRF) in the irf graph command:

. irf create civlp, replace
irfname civlp not found in myirf.irf
(file myirf.irf updated)

. irf graph csirf, irf(civlp) impulse(d_metrobus) response(d_subway) xlabel(0(1)8) ylabel(0(.5)3) 
     legend(size(small)) xtitle(Days) ytitle(Daily subway passengers (millions)) 
irf4.svg

We observe that the cumulative response of the number of subway passengers when there is a shock to metrobus use is always positive and increasing. The upward movement gradually decelerates with each step until it stabilizes at a level of around two million commuters.

Conclusion

The ivlpirf command allows us to estimate IRFs by local projection while accounting for endogeneity of the impulse variable with instrumental variables. This allows causal interpretation of the IRF. In the example above, we found that positive shocks to metrobus use result in a same-day positive response in subway use, as well as up to a four-day increase in subway use.

Reference

Jordà, Ò., and A. M. Taylor. 2025. Local projections. Journal of Economic Literature 63: 59–110. https://doi.org/10.1257/jel.20241521.

— Miguel Dorta
Senior Statistician

— Alvaro Fuentes Higuera
Senior Econometrician

«Back to main page