Threshold regression

Highlights

  • Coefficients differ above and below thresholds

    • Estimate one or more thresholds

    • Select the number of thresholds or let threshold choose an optimal number

  • Determine optimal number of thresholds based on

    • BIC

    • AIC

    • HQIC (Hannan–Quinn information criterion)

  • Dynamic and one-step-ahead predictions for time series

  • Forecasts

Thresholds delineate one state from another. There is one effect (one set of coefficients) up to the threshold and another effect (another set of coefficients) beyond it.

Stata's threshold command fits threshold models.

Threshold models are often applied to time-series data. The threshold can be a time. For example, if you think investment strategies changed as of some unknown date, you can fit a model to obtain an estimate of the date and obtain estimates of the different coefficients before and after it.

Or the threshold can be in terms of another variable. For example, beyond a certain level of inflation, central banks increase interest rates. You can fit a model to obtain an estimate of the threshold and the coefficients on either side of it.
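To make the second case concrete, here is a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the variable names t, z, x, and y, the seed, the sample size, and the true threshold at z = 6 are not from any real dataset. It uses only the threshvar() and regionvars() options shown on this page.

```stata
* Simulated illustration; all names and values below are made up.
clear
set seed 2017
set obs 300
generate t = _n
tsset t                          // threshold expects tsset time-series data

generate z = runiform(0, 10)     // hypothetical threshold variable
generate x = rnormal()           // hypothetical regressor

* True threshold at z = 6: intercept and slope both change beyond it.
generate y = cond(z <= 6, 1 + 0.5*x, 3 + 2*x) + rnormal(0, 0.5)

* Estimate the threshold in z and the coefficients on either side of it.
threshold y, threshvar(z) regionvars(x)
```

If the sketch behaves as intended, the reported threshold should land near 6, with a separate intercept and slope reported for each region.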

Let's see it work

The mayor of a fictional city wants to reduce air pollution caused by the buses the city runs. The city has old buses and new buses, and the old ones pollute more. The old buses are being replaced with new ones, but that will take a while. In the meantime, the mayor wonders whether pollution could be reduced by running old buses at the times of day when they pollute least.

She has tasked her advisors with finding out. Her advisors model pollutant concentration as a function of the number of old buses, new buses, and cars on the road. They allow the effect of these numbers to vary over time of day. They fit a threshold model. They type

. threshold pollution, threshvar(hour) regionvars(oldbus newbus car)

This command fits a model of pollution on regionvars(), which are oldbus, newbus, and car.
Variables oldbus, newbus, and car contain the counts of the vehicles on the road, and variable
pollution contains the measured pollution.

threshvar(hour) is the important part of what they typed. It instructs threshold to find the hour of
the day when the coefficients on the regionvars() change.

The data, by the way, are hourly and were collected over the month of January.
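threshold works on time-series data, so a dataset like this one would be declared with tsset before the model is fit. The page does not show that step. A minimal sketch of what it might look like, assuming the timestamps are stored in a %tc variable hypothetically named datetime and that hour is derived from it:

```stata
* Hypothetical setup; the variable name datetime is an assumption.
tsset datetime, delta(1 hour)    // declare hourly time-series data
generate hour = hh(datetime)     // hour of day, 0 through 23
```

In the advisors' dataset, hour may already exist. The point is only that the threshold variable is an ordinary variable in the data.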

The result of fitting the model is

. threshold pollution, threshvar(hour) regionvars(oldbus newbus car)

Threshold regression

Full sample: 01jan2017 00:00:00 - 31jan2017 23:00:00
                                                 Number of obs    =        744
                                                 AIC              = -1169.1616
Number of thresholds = 1                         BIC              = -1132.2652
Threshold variable: hour                         HQIC             = -1154.9393

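---------------------------------
Order     Threshold           SSR
---------------------------------
    1       12.0000      151.2724
---------------------------------

------------------------------------------------------------------------------
   pollution | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Region1      |
      oldbus |   .0704029   .0093162     7.56   0.000     .0521434    .0886624
      newbus |   .0601371   .0086037     6.99   0.000     .0432741    .0770001
         car |   .1000345   .0093666    10.68   0.000     .0816763    .1183927
       _cons |   6.995896   .1024878    68.26   0.000     6.795023    7.196768
-------------+----------------------------------------------------------------
Region2      |
      oldbus |   .2399615    .010146    23.65   0.000     .2200758    .2598473
      newbus |   .1446087   .0098378    14.70   0.000     .1253269    .1638904
         car |   .1187482   .0095611    12.42   0.000     .1000088    .1374877
       _cons |   9.392377   .1000035    93.92   0.000     9.196374     9.58838
------------------------------------------------------------------------------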

The output appears in three parts: a header, a report on the threshold, and a table of coefficients for each region defined by the threshold.

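The estimated threshold is hour = 12.0000, that is, 12 noon. Region1 holds the coefficients up to the threshold, and Region2 holds the coefficients beyond it.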

After 12 o'clock, the amount that buses pollute increases, for old and new buses alike. Presumably, this is because more of the afternoon driving is stop and go. New buses switch their engines off when stopped. Rather interestingly, in region 1 old buses pollute 0.07−0.06 = 0.01 more than new buses. In region 2, they pollute 0.24−0.14 = 0.10 more. This means that moving one old bus from the afternoon to the morning and one new bus from the morning to the afternoon would reduce pollution by 0.10−0.01 = 0.09 while keeping the same number of buses on the street.
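The same arithmetic can be checked with the unrounded coefficients. This is only a sketch that copies the reported estimates by hand into display. It is not part of the advisors' analysis.

```stata
* Coefficients copied by hand from the one-threshold output above.
display %6.4f .0704029 - .0601371    // region 1: old minus new, about 0.0103
display %6.4f .2399615 - .1446087    // region 2: old minus new, about 0.0954

* Net change from trading one old bus for one new bus across regions.
display %6.4f (.2399615 - .1446087) - (.0704029 - .0601371)    // about 0.0851
```

With full precision the reduction is about 0.085, which agrees with the 0.09 obtained from the two-decimal figures.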

The advisors also checked whether there was more than one threshold. They refit the model and told threshold to allow up to four thresholds. They typed

. threshold pollution, regionvars(oldbus newbus car) threshvar(hour)
     optthresh(4)

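------------------------------------------------------------------------------
   pollution | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
Region1      |
      oldbus |   .0704029   .0002017   349.06   0.000     .0700076    .0707982
      newbus |   .0601371   .0001863   322.85   0.000      .059772    .0605022
         car |   .1000345   .0002028   493.31   0.000      .099637    .1004319
       _cons |   6.995896   .0022188  3152.99   0.000     6.991547    7.000245
-------------+----------------------------------------------------------------
Region2      |
      oldbus |   .2501281   .0004329   577.79   0.000     .2492796    .2509765
      newbus |   .1500926   .0004001   375.14   0.000     .1493084    .1508768
         car |   .1003077   .0004013   249.96   0.000     .0995212    .1010942
       _cons |   10.49741   .0037666  2787.00   0.000     10.49003     10.5048
-------------+----------------------------------------------------------------
Region3      |
      oldbus |   .2498727   .0002574   970.78   0.000     .2493683    .2503772
      newbus |   .1495873   .0002554   585.65   0.000     .1490867    .1500879
         car |   .1002132   .0002433   411.95   0.000     .0997365      .10069
       _cons |   9.002289   .0026688  3373.13   0.000     8.997058     9.00752
------------------------------------------------------------------------------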

threshold reported two thresholds, one at 12:00 p.m. and the other at 3:00 p.m. (15:00). In the scatterplot, we see that the two estimated thresholds correspond with increases in the pollution levels.
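The number of thresholds can also be fixed in advance, or chosen with a different information criterion. The lines below are a sketch rather than something the advisors ran. They assume the nthresholds() option and the ictype suboption of optthresh() as described in the Stata Time-Series Reference Manual, so check help threshold for the exact syntax in your version.

```stata
* Fix the number of thresholds at two rather than searching for it.
threshold pollution, threshvar(hour) regionvars(oldbus newbus car) nthresholds(2)

* Search over up to four thresholds, selecting by AIC
* (assumed suboption; BIC is understood to be the default criterion).
threshold pollution, threshvar(hour) regionvars(oldbus newbus car) optthresh(4, aic)
```

Comparing the AIC, BIC, and HQIC values in the header of each fit is one way to see whether the criteria agree on the number of thresholds.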


The coefficients changed, but the difference in pollution levels between old and new buses is right around 0.10 in both region 2 and region 3. Based on the previous model's results, the advisors would have recommended moving old buses from the afternoon to the morning and new buses from the morning to the afternoon. These new results provide no reason for them to change that recommendation.

Tell me more

Learn more about Stata's time-series features.

Read more about threshold and all of Stata's time-series commands in the Stata Time-Series Reference Manual.