»  Home »  Products »  Features »  Bayesian predictions

## Bayesian predictions

### Highlights

• Predict new values or check model fit
• Simulate outcome values for all or a subset of observations
• Predict functions of simulated outcomes—test statistics and test quantities
• Specify your own prediction functions using:
• Mata functions or
• Stata programs
• Produce graphical summaries and more for simulated outcomes and their functions
• Generate posterior summaries of simulated values
• Generate MCMC replicates
• Compute posterior predictive p-values
See all features

Bayesian inference focuses on estimation of model parameters. But what if we want to estimate a future outcome value? This is one of the goals of Bayesian predictions.

Bayesian predictions are outcome values simulated from the posterior predictive distribution, which is the distribution of the unobserved (future) data given the observed data. They can be used as optimal predictors in forecasting, optimal classifiers in classification problems, imputations for missing data, and more. They are also important for checking model goodness of fit.

Stata's bayespredict command computes Bayesian predictions. The bayesreps command computes Markov chain Monte Carlo (MCMC) replicates of outcome variables and the bayesstats ppvalues command computes posterior predictive p-values, all of which are based on Bayesian predictions and used for model diagnostic checks.

First, you fit a model. Here is a Bayesian linear regression fit using bayesmh:

. bayesmh y x1 x2, likelihood(normal({var}))
prior({y:}, normal(0,100)) prior({var}, igamma(1,2))


To compute Bayesian predictions for outcome y and save them in ysimdata.dta, type

. bayespredict {_ysim}, saving(ysimdata)


These simulated values can be used to perform model diagnostic checks or to make forecasts if you change the values recorded in x1 and x2 or add new values for them.

Bayesian prediction differs from frequentist prediction. Prediction, in a frequentist sense, is a deterministic function of estimated model parameters. For example, in a linear regression, the linear predictor, which is a linear combination of estimated regression coefficients and observed covariates, is used to predict values of continuous outcomes. Bayesian predictions, on the other hand, are simulated outcomes (or functions of them) and are thus stochastic quantities. (Remember to specify a random-number seed for reproducibility when computing Bayesian predictions!)

Unlike classical prediction, which produces a single value for each observation, Bayesian prediction produces an MCMC sample of values for each observation. If you have n observations and your MCMC sample size is T, the result of Bayesian prediction is a T × n dataset. Thus, bayespredict saves Bayesian predictions in a separate file.

Instead of an entire MCMC sample of outcome values, you can use bayesreps to obtain a random subset of MCMC replicates and save them as new variables in your current dataset. These are useful, for instance, for quick visual comparison of the simulated data with the observed data.

. bayesreps yrep*, nreps(10)


You may also be interested in various summaries of MCMC outcome values such as posterior means or medians. You may compute those (and more) and save them as new variables in your current dataset. For instance,

. bayespredict pmean, mean


computes the posterior mean of y across MCMC values for each observation and stores the results in the new variable pmean.

Finally, you can compute functions of Bayesian predictions such as test statistics and test quantities (test statistics that also depend on parameters). These are also useful for model checks, and we can use bayesstats ppvalues to compute posterior predictive p-values for them such as for the minimum and maximum statistics below:

. bayesstats ppvalues (ymin: @min({_ysim})) (ymax: @max({_ysim})) using ysimdata


The above commands are available after Bayesian models fit using bayesmh.

There are more features, and we demonstrate them in the examples below.

## Predict new outcome values—posterior means, credible intervals

Let's fit the CES production function from Example 1 of [R] nl using bayesmh. This function models the log of output from firms as a nonlinear function of their capital and labor usage.

. webuse production

. bayesmh lnoutput = ({b0}-1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho})+(1-{delta})*labor^(-1*{rho}))),
likelihood(normal({sig2}))
prior({sig2},  igamma(0.01, 0.01))
prior({b0},    normal(0, 100))
prior({delta}, uniform(0, 1))
prior({rho},   gamma(1, 1))
block({b0 delta rho sig2}, split)
rseed(16) saving(ces_mcmc)

Burn-in ...
Simulation ...

Model summary

Likelihood:
lnoutput ~ normal(,{sig2})

Priors:
{sig2} ~ igamma(0.01,0.01)
{b0} ~ normal(0,100)
{delta} ~ uniform(0,1)
{rho} ~ gamma(1,1)

Expression:
expr1 : {b0}-1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho})+(1-{delta})*labor^(-1*{rho}))

Bayesian normal regression                        MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling          Burn-in          =      2,500
MCMC sample size =     10,000
Number of obs    =        100
Acceptance rate  =      .4385
Efficiency:  min =     .02283
avg =      .1225
Log marginal-likelihood = -93.731032                           max =      .2461

Equal-tailed
Mean   Std. dev.     MCSE     Median  [95% cred. interval]

b0    3.829453   .1047645     .0059   3.829384   3.633927   4.027828
delta    .4830306   .0636595   .001283   .4820599   .3540333   .6146103
rho    1.838675   .8665057   .057343   1.624436   .8088482   4.053971
sig2    .3098183    .043935   .001009   .3062381    .233141   .4054633

file ces_mcmc.dta saved


The nonlinear specification of bayesmh is similar to that of nl. But in our Bayesian model, we additionally specify prior distributions for model parameters. We use mildly informative priors for the parameters. To improve sampling efficiency, we simulate parameters in separate blocks. We also save the MCMC results in ces_mcmc.dta—this step is required before using bayespredict.

Suppose that we would like to compute the value of log output given the capital and labor values of 1.5 and 2, respectively. First, we add a new observation that will record the new values for capital and labor.

. set obs 101
number of observations (_N) was 100, now 101

. replace capital = 1.5 in 101

. replace labor = 2 in 101


Next, we use bayespredict with the mean option to compute the posterior mean estimate for each observation in the dataset and save them in the new variable pmean. We also specify a random-number seed for reproducibility.

. bayespredict pmean, mean rseed(16)

Computing predictions ...


We list the last 11 observations to compare the observed and predicted values of lnoutput.

. list labor capital lnoutput pmean in 90/101

labor    capital   lnoutput      pmean
90.  .0667528   .8334751   1.360945   1.525867
91.  1.660983   .2120817   2.772991   2.700305
92.  1.761712    5.48467   4.748453   4.717113
93.  .3323506   .2172508   1.994157   2.489211
94.  .2826679    .989325   3.231169   2.891441
95.  1.330708   1.475331   4.571402   4.161445
96.  .4023916   .2614668   2.737468   2.667645
97.  3.801785   .6329353   3.619768   3.786168
98.  .3353206   .2557932   2.292713   2.584241
99.   31.5595    2.72437   5.026271   5.275726
100.   3.80443    .946865   3.363704   4.150184
101.         2        1.5          .   4.358272



The predicted posterior means are similar to the observed values of lnoutput in the existing observations. The predicted value of log of output in the new observation is 4.36.

If desired, we can compute credible intervals to form inference about our predicted value.

. bayespredict cril criu, cri rseed(16)

Computing predictions ...

. list pmean cril criu in 101

pmean       cril       criu
101.  4.358272   3.257501   5.497338



The 95% credible interval for the new observation is [3.26, 5.50].

We now drop the new observation and proceed with further analysis.

. drop in 101
(1 observation deleted)


## Posterior predictive checks

In addition to predicting new outcome values, Bayesian predictions are useful for model checking. These checks, also known as posterior predictive checks, amount to comparing the observed data with the so-called replicated data. Replicated data are data simulated from the fitted model or, more precisely, from the posterior predictive distribution using the same covariate values as were used for estimation. The comparisons may be done graphically using, for instance, histograms or formally using posterior predictive p-values.

### Observed versus replicated data—MCMC replicates

Let's compare the distributions of the observed and replicated data. The full replicated data will contain 10,000 MCMC replicates for lnoutput. Graphical comparison of so many replicates is not feasible. Instead, we can randomly generate a smaller subset of MCMC replicates and evaluate those. Here we will create only three replicates for simplicity.

. bayesreps yrep*, nreps(3) rseed(16)

Computing predictions ...

. list lnoutput yrep1 yrep2 yrep3 pmean in 1/10

lnoutput      yrep1      yrep2      yrep3      pmean
1.  2.933451   3.496267   1.416066    3.85218   3.111261
2.  4.613716   4.794032    3.46249   4.354308   4.484845
3.  1.654005   2.067956   2.135827   1.949141   2.026532
4.  2.025361   2.568239   2.233599   2.780148   2.224722
5.  3.165065   2.979953   2.179758   3.609532   2.894845
6.  1.372219    1.58399    2.10996   2.931538   2.345947
7.   2.92082   4.086907   3.160853   3.570297   3.255964
8.  2.698994   1.730741   1.845746   2.216033   2.281841
9.  1.197918    1.61528   1.039314   1.529847   1.224741
10.  3.097216   2.280563   2.774098   2.799236    2.76704



bayesreps generates three new variables: yrep1, yrep2, and yrep3, which contain MCMC replicates. pmean is essentially the sample mean of all 10,000 MCMC replicates.

We can compare the histograms of the replicated data (in blue) with those of the observed data.

. twoway histogram lnoutput || histogram yrep1, color(navy%25) ||
histogram yrep2, color(navy%25) || histogram yrep3, color(navy%25)
legend(off) title(Observed vs replicated data)


The distributions appear to be similar.

### Simulated outcomes and residuals

If needed, we can also simulate the entire 10,000 × 100 sample of predicted values. We save it in a separate dataset, ces_pred.dta.

. bayespredict {_ysim}, saving(ces_pred) rseed(16)

Computing predictions ...

file ces_pred.dta saved
file ces_pred.ster saved

. describe _index _ysim1_1 _ysim1_2 _ysim1_3 using ces_pred

storage   display    value
variable name   type    format     label      variable label

_index          int     %8.0g                 Iteration number
_ysim1_1        double  %10.0g                Simulated lnoutput, obs. #1
_ysim1_2        double  %10.0g                Simulated lnoutput, obs. #2
_ysim1_3        double  %10.0g                Simulated lnoutput, obs. #3


The dataset contains 10,000 observations and 100 variables, one for each observation from the original dataset. Above, we described only the first three out of 100 variables.

We can use bayesstats summary to compute posterior summaries for, say, the first 10 observations.

. bayesstats summary {_ysim[1/10]} using ces_pred

Posterior summary statistics                       MCMC sample size =    10,000

Equal-tailed
Mean   Std. dev.     MCSE     Median  [95% cred. interval]

_ysim1_1    3.111305    .564188   .005642   3.114486   2.013625   4.230432
_ysim1_2     4.47848   .5647274   .006264   4.487651   3.362394   5.576457
_ysim1_3    2.033675   .5562368   .005692   2.033444   .9360682   3.115028
_ysim1_4    2.234464   .5692011   .005783   2.234235   1.130026   3.361621
_ysim1_5    2.894212   .5655813   .005948   2.894129   1.789959   4.013839
_ysim1_6    2.337328   .5614379    .00588    2.33384   1.226629   3.451852
_ysim1_7     3.25311   .5660545   .005661   3.252063   2.126941   4.372151
_ysim1_8    2.274426   .5638561   .005639   2.280058   1.157936   3.359114
_ysim1_9    1.227687   .5582967   .005649    1.23153   .1239763   2.312214
_ysim1_10    2.766746   .5644516   .005759    2.76776   1.655075   3.871556



The posterior mean estimates are similar to the ones we computed earlier and stored in variable pmean.

bayespredict also provides access to residuals. Below, we use bayesgraph histogram to produce a histogram of the variance of the residuals based on the MCMC sample. We use Mata function variance() to compute the variance of the residuals {_resid} for each MCMC replicate and label the results as resvar.

. bayesgraph histogram (resvar:@variance({_resid})) using ces_pred


The distribution has the mean of roughly 0.3, which is similar to the posterior mean estimate of the parameter {sig2}. And its shape resembles a chi-squared distribution.

### Posterior predictive p-values

We can compare the observed and replicated data more formally. We can use bayesstats ppvalues to compute posterior predictive p-values for various distributional summaries. For instance, we can compare the means, minimums, and maximums of the observed and replicated data.

One of the neat prediction features of Stata's Bayesian suite is the ability to specify functions of simulated values with any Bayesian postestimation command, including bayesstats ppvalues. We have already seen an example of this when we used bayesgraph histogram to plot the histogram for the variance of residuals. Here, we will use Mata functions mean(), min(), and max() to compute posterior predictive p-values for the corresponding summaries of the simulated outcomes.

. bayesstats ppvalues (mean:@mean({_ysim})) (min:@min({_ysim})) (max:@max({_ysim})) using ces_pred

Posterior predictive summary    MCMC sample size =    10,000

T        Mean   Std. dev.  E(T_obs)  P(T>=T_obs)

mean    3.045143   .0787588   3.044554        .5026
min    .5130189   .3401942   1.049675        .0365
max     5.84806   .3703789   5.703145         .626

Note: P(T>=T_obs) close to 0 or 1 indicates lack of fit.


Posterior predictive p-value, labeled as P(T>=T_obs) in the output, is the probability that the test statistic (such as, for instance, the minimum in our example) in the replicated data, T, is as or more extreme as the same statistic in the observed data, T_obs. When this probability is close to 0.5, the replicated and observed data are considered to be in agreement with respect to the test statistic of interest.

In our example, the mean and maximum statistics appear to be in agreement in the observed and replicated data, but not the minimum statistic.

### Compute your own test statistics

Yet another neat feature of Bayesian predictions is that you can create your own functions and apply them to simulated values. You can write these functions as Mata functions, Stata programs, or both.

Let's write a Mata function to compute the skewness of a variable.

.  mata:

mata (type end to exit)

: real scalar skew(real colvector x) {
return (sqrt(length(x))*sum((x:-mean(x)):^3)/(sum((x:-mean(x)):^2)^1.5))
}

: end



And let's use a Stata program to compute a kurtosis of a variable.

. program kurt
1.         version 17
2.         args kurtosis x
3.         quietly summarize x', detail
4.         scalar kurtosis' = r(kurtosis)
5. end


Now, we can use these functions with bayespredict to compute skewness and kurtosis of simulated outcome values and save the results in a new dataset ces_teststat.dta.

. bayespredict (skewness:@skew({_ysim})) (kurtosis:@kurt {_ysim}), saving(ces_teststat) rseed(16)

Computing predictions ...

file ces_teststat.dta saved
file ces_teststat.ster saved


We named our predicted functions as skewness and kurtosis. We can use these names to refer to the functions from other postestimation commands.

One advantage of specifying functions directly with bayespredict instead of within other Bayesian postestimation commands is that we avoid creating a large dataset and save time by not recomputing the results each time we run another command. The disadvantage is that we will not be able to compute any other functions that require access to the full simulated dataset.

We use bayesstats summary to compute posterior summaries of the skewness and kurtosis measures.

. bayesstats summary {skewness} {kurtosis} using ces_teststat

Posterior summary statistics                        MCMC sample size =    10,000

Equal-tailed
Mean   Std. dev.     MCSE     Median  [95% cred. interval]
skewness    .1471358   .1660461    .00166   .1473484  -.1803233   .4741272
kurtosis    2.670391   .3049096   .003386   2.636845   2.163721   3.353066



And we can compute posterior predictive p-values for these measures too.

. bayesstats ppvalues {skewness} {kurtosis} using ces_teststat

Posterior predictive summary    MCMC sample size =    10,000

T        Mean   Std. dev.  E(T_obs)  P(T>=T_obs)

skewness    .1471358   .1660461   .1555046        .4806
kurtosis    2.670391   .3049096   2.252134        .9411

Note: P(T>=T_obs) close to 0 or 1 indicates lack of fit.


The replicated data fit the observed data well with respect to skewness but not kurtosis.

## Tell me more

See all of Stata's Bayesian features.

Read more about Bayesian predictions and Bayesian analysis in the Stata Bayesian Analysis Reference Manual.