
# Threshold regression

## Highlights

• Coefficients differ above and below thresholds
• Estimate one or more thresholds
• Select the number of thresholds or let threshold choose an optimal number
• Determine optimal number of thresholds based on
  • BIC
  • AIC
  • HQIC (Hannan–Quinn information criterion)
• Dynamic and one-step-ahead predictions for time series
• Forecasts

Thresholds delineate one state from another. There is one effect (one set of coefficients) up to the threshold and another effect (another set of coefficients) beyond it.

Stata's new threshold command fits threshold models.

Threshold models are often applied to time-series data. The threshold can be a time. For example, if you think investment strategies changed as of some unknown date, you can fit a model to obtain an estimate of the date and obtain estimates of the different coefficients before and after it.

Or the threshold can be in terms of another variable. For example, beyond a certain level of inflation, central banks increase interest rates. You can fit a model to obtain an estimate of the threshold and the coefficients on either side of it.
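In symbols, a model with one threshold can be sketched as follows. The notation is ours and is introduced only for illustration: $y_t$ is the outcome, $\mathbf{x}_t$ holds the covariates whose coefficients differ by region (plus a constant), $w_t$ is the threshold variable, and $\gamma$ is the threshold. Which region the boundary value itself belongs to is a convention we assume here.

$$
y_t =
\begin{cases}
\mathbf{x}_t \boldsymbol{\beta}_1 + \epsilon_t, & w_t \le \gamma \\
\mathbf{x}_t \boldsymbol{\beta}_2 + \epsilon_t, & w_t > \gamma
\end{cases}
$$

With $m$ thresholds $\gamma_1 < \dots < \gamma_m$, the same idea gives $m+1$ regions, each with its own coefficient vector.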

## Let's see it work

The mayor of a fictional city wants to reduce air pollution caused by the buses the city runs. The city has old buses and new buses. The old ones pollute more. The city is replacing the old ones with new ones, but it will take a while. In the meantime, the mayor wonders if pollution could be reduced by running old buses at times of the day when they produce the least amount of pollution.

She has tasked her advisors with finding out. Her advisors model pollutant concentration as a function of the number of old buses, new buses, and cars on the road. They allow the effect of these numbers to vary over time of day. They fit a threshold model. They type

    . threshold pollution, threshvar(hour) regionvars(oldbus newbus car)


This command fits a model of pollution on regionvars(), which are oldbus, newbus, and car.
Variables oldbus, newbus, and car contain the counts of the vehicles on the road, and variable
pollution contains the measured pollution.

threshvar(hour) is the important part of what they typed. It instructs threshold to find the hour of
the day when the coefficients on the regionvars() change.
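To give a feel for what "finding the hour" involves, here is a minimal sketch in Python (not Stata) of one common approach: try each candidate threshold, fit a separate least-squares regression on each side, and keep the split with the smallest combined sum of squared residuals (SSR). We assume this mirrors the idea behind the SSR that threshold reports, but it is not threshold's implementation. The function name `fit_one_threshold`, its `trim` argument, and the simulated data are all hypothetical and exist only for this illustration.

```python
import numpy as np

def fit_one_threshold(y, X, w, trim=0.10):
    """Grid-search one threshold on w by minimizing the combined SSR.

    trim is a hypothetical knob for this sketch: the minimum share of
    observations that each region must contain.
    Returns (threshold, ssr, beta_low, beta_high); betas start with the intercept.
    """
    y = np.asarray(y, dtype=float)
    w = np.asarray(w, dtype=float)
    Z = np.column_stack([np.ones(len(y)), np.asarray(X, dtype=float)])
    min_obs = max(int(np.ceil(trim * len(y))), Z.shape[1])

    def ols(mask):
        # Least-squares fit on the rows selected by mask; returns coefficients and SSR.
        beta, *_ = np.linalg.lstsq(Z[mask], y[mask], rcond=None)
        resid = y[mask] - Z[mask] @ beta
        return beta, float(resid @ resid)

    best = None
    for g in np.unique(w)[:-1]:           # candidate thresholds: observed values of w
        low = w <= g
        if low.sum() < min_obs or (~low).sum() < min_obs:
            continue                       # skip splits that leave a region too small
        beta_low, ssr_low = ols(low)
        beta_high, ssr_high = ols(~low)
        if best is None or ssr_low + ssr_high < best[1]:
            best = (float(g), ssr_low + ssr_high, beta_low, beta_high)
    return best

# Simulated stand-in for the bus data (NOT the data used on this page):
# 31 days of hourly observations with a true break after hour 12.
rng = np.random.default_rng(0)
hour = np.tile(np.arange(24), 31)
X = rng.integers(0, 20, size=(hour.size, 3))       # oldbus, newbus, car counts
b1 = np.array([7.0, 0.07, 0.06, 0.10])              # coefficients for hour <= 12
b2 = np.array([9.4, 0.24, 0.14, 0.12])              # coefficients for hour > 12
Z = np.column_stack([np.ones(hour.size), X])
y = np.where(hour <= 12, Z @ b1, Z @ b2) + rng.normal(0, 0.1, hour.size)

thr, ssr, beta_low, beta_high = fit_one_threshold(y, X, hour)
print(thr, beta_low.round(3), beta_high.round(3))
```

Because the simulated break is large relative to the noise, the search should land on the hour where the coefficients were made to change. Real data are rarely this clean, which is one reason to look at the reported SSR and information criteria.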

The data, by the way, are hourly and were collected over the month of January.

The result of fitting the model is

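    . threshold pollution, threshvar(hour) regionvars(oldbus newbus car)

    Threshold regression

    Full sample:    01jan2017 00:00:00 - 31jan2017 23:00:00
                                                     Number of obs    =        744
                                                     AIC              = -1169.1616
    Number of thresholds =  1                        BIC              = -1132.2652
    Threshold variable: hour                         HQIC             = -1154.9393

    ---------------------------------
    Order     Threshold          SSR
    ---------------------------------
        1       12.0000     151.2724
    ---------------------------------

    ------------------------------------------------------------------------------
       pollution |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    Region1      |
          oldbus |   .0704029   .0093162     7.56   0.000     .0521434    .0886624
          newbus |   .0601371   .0086037     6.99   0.000     .0432741    .0770001
             car |   .1000345   .0093666    10.68   0.000     .0816763    .1183927
           _cons |   6.995896   .1024878    68.26   0.000     6.795023    7.196768
    -------------+----------------------------------------------------------------
    Region2      |
          oldbus |   .2399615    .010146    23.65   0.000     .2200758    .2598473
          newbus |   .1446087   .0098378    14.70   0.000     .1253269    .1638904
             car |   .1187482   .0095611    12.42   0.000     .1000088    .1374877
           _cons |   9.392377   .1000035    93.92   0.000     9.196374     9.58838
    ------------------------------------------------------------------------------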



The output appears in three parts: a header, a report on the threshold, and a table of coefficients for each region defined by the threshold.
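The information criteria in the header can be reproduced from the reported SSR. The sketch below is Python, not Stata, and it assumes each criterion has the form n·ln(SSR/n) plus a penalty, with k = 8 estimated coefficients (four in each region). We inferred these formulas from the numbers shown; they are not quoted from the documentation of threshold, and the function name `info_criteria` is ours.

```python
import math

def info_criteria(ssr, n, k):
    """Return AIC, BIC, and HQIC computed from an SSR (assumed formulas)."""
    base = n * math.log(ssr / n)
    return {
        "AIC": base + 2 * k,
        "BIC": base + k * math.log(n),
        "HQIC": base + 2 * k * math.log(math.log(n)),
    }

# Values taken from the output: SSR = 151.2724, 744 observations,
# and 8 coefficients (oldbus, newbus, car, _cons in each of two regions).
ic = info_criteria(ssr=151.2724, n=744, k=8)
for name, value in ic.items():
    print(f"{name:5s} = {value:.2f}")
```

The results agree with the header to about two decimal places; the small remaining differences are consistent with the SSR being displayed to only four decimals. Lower values are better, which is the basis on which the number of thresholds can be chosen.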

The threshold is hour = 12.0000, meaning 12 o'clock.

After 12 o'clock, both old and new buses pollute more. Presumably, this is because more of the driving is stop and go. New buses switch their engines off when stopped. Interestingly, in region 1, old buses pollute 0.07−0.06 = 0.01 more than new buses. In region 2, they pollute 0.24−0.14 = 0.10 more. This means that moving an old bus from the afternoon to the morning and a new bus from the morning to the afternoon would reduce pollution by 0.10−0.01 = 0.09 while keeping the same number of buses on the street.
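The same arithmetic can be redone with the unrounded point estimates. This short Python sketch copies the oldbus and newbus coefficients from the coefficient table; the variable and function names are ours, and the calculation ignores sampling uncertainty.

```python
# Point estimates copied from the coefficient table.
region1 = {"oldbus": 0.0704029, "newbus": 0.0601371}   # up to hour 12
region2 = {"oldbus": 0.2399615, "newbus": 0.1446087}   # after hour 12

def old_minus_new(region):
    """Extra pollution from one old bus compared with one new bus."""
    return region["oldbus"] - region["newbus"]

gap_morning = old_minus_new(region1)
gap_afternoon = old_minus_new(region2)

# Move one old bus from the afternoon to the morning and one new bus
# from the morning to the afternoon: the bus count per period is unchanged.
saving = gap_afternoon - gap_morning

print(round(gap_morning, 4), round(gap_afternoon, 4), round(saving, 4))
# prints: 0.0103 0.0954 0.0851
```

With full precision, the estimated reduction per swapped pair of buses is about 0.085, which rounds to the 0.09 quoted above.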

The advisors also checked whether there was more than one threshold. They refit the model and told threshold to allow up to four thresholds. They typed

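    . threshold pollution, regionvars(oldbus newbus car) threshvar(hour) optthresh(4)

    ------------------------------------------------------------------------------
       pollution |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
    Region1      |
          oldbus |   .0704029   .0002017   349.06   0.000     .0700076    .0707982
          newbus |   .0601371   .0001863   322.85   0.000      .059772    .0605022
             car |   .1000345   .0002028   493.31   0.000      .099637    .1004319
           _cons |   6.995896   .0022188  3152.99   0.000     6.991547    7.000245
    -------------+----------------------------------------------------------------
    Region2      |
          oldbus |   .2501281   .0004329   577.79   0.000     .2492796    .2509765
          newbus |   .1500926   .0004001   375.14   0.000     .1493084    .1508768
             car |   .1003077   .0004013   249.96   0.000     .0995212    .1010942
           _cons |   10.49741   .0037666  2787.00   0.000     10.49003     10.5048
    -------------+----------------------------------------------------------------
    Region3      |
          oldbus |   .2498727   .0002574   970.78   0.000     .2493683    .2503772
          newbus |   .1495873   .0002554   585.65   0.000     .1490867    .1500879
             car |   .1002132   .0002433   411.95   0.000     .0997365      .10069
           _cons |   9.002289   .0026688  3373.13   0.000     8.997058     9.00752
    ------------------------------------------------------------------------------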


threshold reported two thresholds, one at 12:00 p.m. and the other at 3:00 p.m. (15:00). In the scatterplot, we see that the two estimated thresholds correspond with increases in the pollution levels.

Coefficients changed, but the difference in pollution levels between old and new buses is right around 0.10 in both region 2 and region 3. Based on the previous model's results, the advisors would have recommended moving old buses from the afternoon to the morning and new buses from the morning to the afternoon. These new results provide no reason for them to change that recommendation.
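For reference, the old-minus-new differences implied by the three-region estimates work out as follows. The symbol $\Delta_r$ is our shorthand for the oldbus coefficient minus the newbus coefficient in region $r$.

$$
\begin{aligned}
\Delta_1 &= 0.0704029 - 0.0601371 = 0.0102658 \\
\Delta_2 &= 0.2501281 - 0.1500926 = 0.1000355 \\
\Delta_3 &= 0.2498727 - 0.1495873 = 0.1002854
\end{aligned}
$$

The gap is about 0.01 before the first threshold and about 0.10 in both later regions, so the morning remains the period in which running an old bus costs the least in added pollution.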