Predict new values or check model fit

Simulate outcome values for all or a subset of observations

Predict functions of simulated outcomes—test statistics and test quantities

Specify your own prediction functions using:

Mata functions or

Stata programs

Produce graphical summaries and more for simulated outcomes and their functions

Generate posterior summaries of simulated values

Generate MCMC replicates

Compute posterior predictive p-values

See more Bayesian analysis features

Bayesian inference focuses on estimation of model parameters. But what if we want to estimate a future outcome value? This is one of the goals of Bayesian predictions.

Bayesian predictions are outcome values simulated from the posterior predictive distribution, which is the distribution of the unobserved (future) data given the observed data. They can be used as optimal predictors in forecasting, optimal classifiers in classification problems, imputations for missing data, and more. They are also important for checking model goodness of fit.

Stata's **bayespredict**
command computes Bayesian predictions. The
**bayesreps** command computes
Markov chain Monte Carlo (MCMC) replicates of outcome variables and the
**bayesstats ppvalues**
command computes posterior predictive *p*-values, all of
which are based on Bayesian predictions and used for model diagnostic checks.

First, you fit a model. Here is a Bayesian linear regression fit using
**bayesmh**:

.bayesmh y x1 x2, likelihood(normal({var})) prior({y:}, normal(0,100)) prior({var}, igamma(1,2))

To compute Bayesian predictions for outcome **y** and save them in
**ysimdata.dta**, type

.bayespredict {_ysim}, saving(ysimdata)

These simulated values can be used to perform model diagnostic checks
or to make forecasts if you change the values recorded in **x1** and
**x2** or add new values for them.

Bayesian prediction differs from frequentist prediction. Prediction, in a frequentist sense, is a deterministic function of estimated model parameters. For example, in a linear regression, the linear predictor, which is a linear combination of estimated regression coefficients and observed covariates, is used to predict values of continuous outcomes. Bayesian predictions, on the other hand, are simulated outcomes (or functions of them) and are thus stochastic quantities. (Remember to specify a random-number seed for reproducibility when computing Bayesian predictions!)

Unlike classical prediction, which produces a single value for each
observation, Bayesian prediction produces an MCMC sample of values for each
observation. If you have *n* observations and your MCMC sample size is
*T*, the result of Bayesian prediction is a *T* × *n*
dataset. Thus, **bayespredict** saves Bayesian predictions in a separate
file.

Instead of an entire MCMC sample of outcome values, you can use
**bayesreps** to obtain a random subset of MCMC replicates and save them as
new variables in your current dataset. These are useful, for instance, for
quick visual comparison of the simulated data with the observed data.

.bayesreps yrep*, nreps(10)

You may also be interested in various summaries of MCMC outcome values such as posterior means or medians. You may compute those (and more) and save them as new variables in your current dataset. For instance,

.bayespredict pmean, mean

computes the posterior mean of **y** across MCMC values for each
observation and stores the results in the new variable **pmean**.

Finally, you can compute functions of Bayesian predictions such as test
statistics and test quantities (test statistics that also depend on
parameters). These are also useful for model checks, and we can use
**bayesstats ppvalues** to compute posterior predictive *p*-values
for them such as for the minimum and maximum statistics below:

.bayesstats ppvalues (ymin: @min({_ysim})) (ymax: @max({_ysim})) using ysimdata

The above commands are available after Bayesian models fit using **bayesmh**.

There are more features, and we demonstrate them in the examples below.

Let's fit the CES production function from * Example 1* of **[R] nl** using **bayesmh**. This function models the log of output from firms as a nonlinear function of their capital and labor usage.

.webuse production.bayesmh lnoutput = ({b0}-1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho})+(1- {delta})*labor^(-1*{rho}))), likelihood(normal({sig2})) prior({sig2}, igamma(0.01, 0.01)) prior({b0}, normal(0, 100)) prior({delta}, uniform(0, 1)) prior({rho}, gamma(1, 1)) block({b0 delta rho sig2}, split) rseed(16) saving(ces_mcmc)Burn-in ... Simulation ... Model summary

Equal-tailed | ||||

Mean Std. dev. MCSE Median [95% cred. interval] | ||||

b0 | 3.829453 .1047645 .0059 3.829384 3.633927 4.027828 | |||

delta | .4830306 .0636595 .001283 .4820599 .3540333 .6146103 | |||

rho | 1.838675 .8665057 .057343 1.624436 .8088482 4.053971 | |||

sig2 | .3098183 .043935 .001009 .3062381 .233141 .4054633 | |||

The nonlinear specification of **bayesmh** is similar to that of **nl**.
But in our Bayesian model, we
additionally specify prior distributions for model parameters. We use mildly
informative priors for the parameters. To improve sampling efficiency, we
simulate parameters in separate blocks. We also save the MCMC results in
**ces_mcmc.dta**—this step is required before using
**bayespredict**.

Suppose that we would like to compute the value of log output given the
capital and labor values of 1.5 and 2, respectively. First, we add a new
observation that will record the new values for **capital** and
**labor**.

.set obs 101number of observations (_N) was 100, now 101 .replace capital = 1.5 in 101(1 real change made) .replace labor = 2 in 101(1 real change made)

Next, we use **bayespredict** with the **mean** option to compute the
posterior mean estimate for each observation in the dataset and save them in
the new variable **pmean**. We also specify a random-number seed for
reproducibility.

.bayespredict pmean, mean rseed(16)Computing predictions ...

We list the last 11 observations to compare the observed and predicted values
of **lnoutput**.

.list labor capital lnoutput pmean in 90/101

labor capital lnoutput pmean | |

90. | .0667528 .8334751 1.360945 1.525867 |

91. | 1.660983 .2120817 2.772991 2.700305 |

92. | 1.761712 5.48467 4.748453 4.717113 |

93. | .3323506 .2172508 1.994157 2.489211 |

94. | .2826679 .989325 3.231169 2.891441 |

95. | 1.330708 1.475331 4.571402 4.161445 |

96. | .4023916 .2614668 2.737468 2.667645 |

97. | 3.801785 .6329353 3.619768 3.786168 |

98. | .3353206 .2557932 2.292713 2.584241 |

99. | 31.5595 2.72437 5.026271 5.275726 |

100. | 3.80443 .946865 3.363704 4.150184 |

101. | 2 1.5 . 4.358272 |

The predicted posterior means are similar to the observed values of
**lnoutput** in the existing observations. The predicted value of
log of output in the new observation is 4.36.

If desired, we can compute credible intervals to form inference about our predicted value.

.bayespredict cril criu, cri rseed(16)Computing predictions ... .list pmean cril criu in 101

pmean cril criu | |

101. | 4.358272 3.257501 5.497338 |

The 95% credible interval for the new observation is [3.26, 5.50].

We now drop the new observation and proceed with further analysis.

.drop in 101(1 observation deleted)

In addition to predicting new outcome values, Bayesian predictions are useful
for model checking. These checks, also known as posterior predictive checks,
amount to comparing the observed data with the so-called replicated data.
Replicated data are data simulated from the fitted model or, more precisely,
from the posterior predictive distribution using the same covariate values as
were used for estimation. The comparisons may be done graphically using, for
instance, histograms or formally using posterior predictive *p*-values.

Let's compare the distributions of the observed and replicated data.
The full replicated data will contain 10,000 MCMC replicates for
**lnoutput**. Graphical comparison of so many replicates is not feasible.
Instead, we can randomly generate a smaller subset of MCMC replicates and
evaluate those. Here we will create only three replicates for simplicity.

.bayesreps yrep*, nreps(3) rseed(16)Computing predictions ... .list lnoutput yrep1 yrep2 yrep3 pmean in 1/10

lnoutput yrep1 yrep2 yrep3 pmean | |

1. | 2.933451 3.496267 1.416066 3.85218 3.111261 |

2. | 4.613716 4.794032 3.46249 4.354308 4.484845 |

3. | 1.654005 2.067956 2.135827 1.949141 2.026532 |

4. | 2.025361 2.568239 2.233599 2.780148 2.224722 |

5. | 3.165065 2.979953 2.179758 3.609532 2.894845 |

6. | 1.372219 1.58399 2.10996 2.931538 2.345947 |

7. | 2.92082 4.086907 3.160853 3.570297 3.255964 |

8. | 2.698994 1.730741 1.845746 2.216033 2.281841 |

9. | 1.197918 1.61528 1.039314 1.529847 1.224741 |

10. | 3.097216 2.280563 2.774098 2.799236 2.76704 |

**bayesreps** generates three new variables: **yrep1**, **yrep2**,
and **yrep3**, which contain MCMC replicates. **pmean** is essentially
the sample mean of all 10,000 MCMC replicates.

We can compare the histograms of the replicated data (in blue) with those of the observed data.

.twoway histogram lnoutput || histogram yrep1, color(navy%25) || histogram yrep2, color(navy%25) || histogram yrep3, color(navy%25) legend(off) title(Observed vs replicated data)

The distributions appear to be similar.

If needed, we can also simulate the entire 10,000 × 100 sample of
predicted values. We save it in a separate dataset, **ces_pred.dta**.

.bayespredict {_ysim}, saving(ces_pred) rseed(16)Computing predictions ... file ces_pred.dta saved file ces_pred.ster saved .describe _index _ysim1_1 _ysim1_2 _ysim1_3 using ces_predstorage display value variable name type format label variable label

The dataset contains 10,000 observations and 100 variables, one for each observation from the original dataset. Above, we described only the first three out of 100 variables.

We can use **bayesstats summary** to compute posterior summaries for, say, the
first 10 observations.

.bayesstats summary {_ysim[1/10]} using ces_predPosterior summary statistics MCMC sample size = 10,000

Equal-tailed | |||

Mean Std. dev. MCSE Median [95% cred. interval] | |||

_ysim1_1 | 3.111305 .564188 .005642 3.114486 2.013625 4.230432 | ||

_ysim1_2 | 4.47848 .5647274 .006264 4.487651 3.362394 5.576457 | ||

_ysim1_3 | 2.033675 .5562368 .005692 2.033444 .9360682 3.115028 | ||

_ysim1_4 | 2.234464 .5692011 .005783 2.234235 1.130026 3.361621 | ||

_ysim1_5 | 2.894212 .5655813 .005948 2.894129 1.789959 4.013839 | ||

_ysim1_6 | 2.337328 .5614379 .00588 2.33384 1.226629 3.451852 | ||

_ysim1_7 | 3.25311 .5660545 .005661 3.252063 2.126941 4.372151 | ||

_ysim1_8 | 2.274426 .5638561 .005639 2.280058 1.157936 3.359114 | ||

_ysim1_9 | 1.227687 .5582967 .005649 1.23153 .1239763 2.312214 | ||

_ysim1_10 | 2.766746 .5644516 .005759 2.76776 1.655075 3.871556 | ||

The posterior mean estimates are similar to the ones we computed earlier
and stored in variable **pmean**.

**bayespredict** also provides access to residuals. Below, we use
**bayesgraph histogram** to produce a histogram of the variance of the
residuals based on the MCMC sample. We use Mata function **variance()**
to compute the variance of the residuals **{_resid}** for each MCMC
replicate and label the results as **resvar**.

.bayesgraph histogram (resvar:@variance({_resid})) using ces_pred

The distribution has the mean of roughly 0.3, which is similar to the
posterior mean estimate of the parameter **{sig2}**. And its shape resembles
a chi-squared distribution.

We can compare the observed and replicated data more formally. We can use
**bayesstats ppvalues** to compute posterior predictive *p*-values
for various distributional summaries. For instance, we can compare
the means, minimums, and maximums of the observed and replicated data.

One of the neat prediction features of Stata's Bayesian suite is the ability
to specify functions of simulated values with any Bayesian postestimation command,
including **bayesstats ppvalues**. We have already seen an example of
this when we used **bayesgraph histogram** to plot the histogram for the
variance of residuals. Here, we will use Mata functions **mean()**,
**min()**, and **max()** to compute posterior predictive *p*-values
for the corresponding summaries of the simulated outcomes.

.bayesstats ppvalues (mean:@mean({_ysim})) (min:@min({_ysim})) (max:@max({_ysim})) using ces_predPosterior predictive summary MCMC sample size = 10,000

T | Mean Std. dev. E(T_obs) P(T>=T_obs) | ||

mean | 3.045143 .0787588 3.044554 .5026 | ||

min | .5130189 .3401942 1.049675 .0365 | ||

max | 5.84806 .3703789 5.703145 .626 | ||

Posterior predictive *p*-value, labeled as **P(T>=T_obs)** in the
output, is the probability that the test statistic (such as, for instance, the minimum in our
example) in the replicated data, **T**, is as or more extreme as the same
statistic in the observed data, **T_obs**. When this probability is close
to 0.5, the replicated and observed data are considered to be in agreement
with respect to the test statistic of interest.

In our example, the mean and maximum statistics appear to be in agreement in the observed and replicated data, but not the minimum statistic.

Yet another neat feature of Bayesian predictions is that you can create your own functions and apply them to simulated values. You can write these functions as Mata functions, Stata programs, or both.

Let's write a Mata function to compute the skewness of a variable.

.mata:

mata (type end to exit) |

And let's use a Stata program to compute a kurtosis of a variable.

.program kurt1.version 182.args kurtosis x3.quietly summarize `x', detail4.scalar `kurtosis' = r(kurtosis)5.end

Now, we can use these functions with **bayespredict** to compute skewness
and kurtosis of simulated outcome values and save the results in a new dataset
**ces_teststat.dta**.

.bayespredict (skewness:@skew({_ysim})) (kurtosis:@kurt {_ysim}), saving(ces_teststat) rseed(16)Computing predictions ... file ces_teststat.dta saved file ces_teststat.ster saved

We named our predicted functions as **skewness** and **kurtosis**.
We can use these names to refer to the functions from other postestimation
commands.

One advantage of specifying functions directly with **bayespredict**
instead of within other Bayesian postestimation commands is that we avoid
creating a large dataset and save time by not recomputing the results each
time we run another command. The disadvantage is that we will not be able to
compute any other functions that require access to the full simulated dataset.

We use **bayesstats summary** to compute posterior summaries of the skewness
and kurtosis measures.

.bayesstats summary {skewness} {kurtosis} using ces_teststatPosterior summary statistics MCMC sample size = 10,000

Equal-tailed | |||

Mean Std. dev. MCSE Median [95% cred. interval] | |||

skewness | .1471358 .1660461 .00166 .1473484 -.1803233 .4741272 | ||

kurtosis | 2.670391 .3049096 .003386 2.636845 2.163721 3.353066 | ||

And we can compute posterior predictive *p*-values for these measures
too.

.bayesstats ppvalues {skewness} {kurtosis} using ces_teststatPosterior predictive summary MCMC sample size = 10,000

T | Mean Std. dev. E(T_obs) P(T>=T_obs) | ||

skewness | .1471358 .1660461 .1555046 .4806 | ||

kurtosis | 2.670391 .3049096 2.252134 .9411 | ||

The replicated data fit the observed data well with respect to skewness but not kurtosis.

See all of Stata's Bayesian features.

Learn about Bayesian lasso modeling and How to run multiple chains in parallel.

Read more about Bayesian predictions and
Bayesian analysis in the
*Bayesian Analysis Reference Manual*.