

Highlights

  • Predict new values or check model fit

  • Simulate outcome values for all or a subset of observations

  • Predict functions of simulated outcomes—test statistics and test quantities

  • Specify your own prediction functions using:

    • Mata functions or

    • Stata programs

  • Produce graphical summaries and more for simulated outcomes and their functions

  • Generate posterior summaries of simulated values

  • Generate MCMC replicates

  • Compute posterior predictive p-values

Bayesian inference focuses on estimation of model parameters. But what if we want to estimate a future outcome value? This is one of the goals of Bayesian predictions.

Bayesian predictions are outcome values simulated from the posterior predictive distribution, which is the distribution of the unobserved (future) data given the observed data. They can be used as optimal predictors in forecasting, optimal classifiers in classification problems, imputations for missing data, and more. They are also important for checking model goodness of fit.

Stata's bayespredict command computes Bayesian predictions. The bayesreps command computes Markov chain Monte Carlo (MCMC) replicates of outcome variables, and the bayesstats ppvalues command computes posterior predictive p-values; both are based on Bayesian predictions and are used for model diagnostic checks.

First, you fit a model. Here is a Bayesian linear regression fit using bayesmh:

. bayesmh y x1 x2, likelihood(normal({var})) prior({y:}, normal(0,100)) prior({var}, igamma(1,2))

To compute Bayesian predictions for outcome y and save them in ysimdata.dta, type

. bayespredict {_ysim}, saving(ysimdata)

These simulated values can be used to perform model diagnostic checks or to make forecasts if you change the values recorded in x1 and x2 or add new values for them.
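
For instance, to forecast the outcome for a new covariate pattern, a minimal sketch (the values for x1 and x2 and the filename ysimnew are made up for illustration) is to append an observation holding the new values and rerun bayespredict; the simulated values for that observation then come from the posterior predictive distribution. We walk through this workflow in detail in the production example below.

. set obs `=_N+1'
. replace x1 = 1.5 in `=_N'
. replace x2 = 2 in `=_N'
. bayespredict {_ysim}, saving(ysimnew) rseed(16)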

Bayesian prediction differs from frequentist prediction. Prediction, in a frequentist sense, is a deterministic function of estimated model parameters. For example, in a linear regression, the linear predictor, which is a linear combination of estimated regression coefficients and observed covariates, is used to predict values of continuous outcomes. Bayesian predictions, on the other hand, are simulated outcomes (or functions of them) and are thus stochastic quantities. (Remember to specify a random-number seed for reproducibility when computing Bayesian predictions!)
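
To make the contrast concrete, here is the frequentist analogue for the same y on x1 and x2 setup (the variable name yhat is ours): predict after regress returns a single deterministic fitted value per observation, whereas bayespredict draws a whole sample of values for each one.

. regress y x1 x2
. predict yhat, xb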

Unlike classical prediction, which produces a single value for each observation, Bayesian prediction produces an MCMC sample of values for each observation. If you have n observations and your MCMC sample size is T, the result of Bayesian prediction is a T × n dataset. Thus, bayespredict saves Bayesian predictions in a separate file.
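
Because the predictions live in a separate file, postestimation commands read them through a using clause. For example, assuming ysimdata.dta was saved as above, a posterior summary of the simulated outcome for the first observation could be obtained with

. bayesstats summary {_ysim[1]} using ysimdata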

Instead of an entire MCMC sample of outcome values, you can use bayesreps to obtain a random subset of MCMC replicates and save them as new variables in your current dataset. These are useful, for instance, for quick visual comparison of the simulated data with the observed data.

. bayesreps yrep*, nreps(10)

You may also be interested in various summaries of MCMC outcome values such as posterior means or medians. You may compute those (and more) and save them as new variables in your current dataset. For instance,

. bayespredict pmean, mean

computes the posterior mean of y across MCMC values for each observation and stores the results in the new variable pmean.

Finally, you can compute functions of Bayesian predictions such as test statistics and test quantities (test statistics that also depend on parameters). These are also useful for model checks, and we can use bayesstats ppvalues to compute posterior predictive p-values for them such as for the minimum and maximum statistics below:

. bayesstats ppvalues (ymin: @min({_ysim})) (ymax: @max({_ysim})) using ysimdata

The above commands are available after fitting Bayesian models with bayesmh.

There are more features, and we demonstrate them in the examples below.

Let's see it work

Predict new outcome values—posterior means, credible intervals

Let's fit the CES production function from Example 1 of [R] nl using bayesmh. This function models the log of output from firms as a nonlinear function of their capital and labor usage.

. webuse production

. bayesmh lnoutput = ({b0}-1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho})+(1-
     {delta})*labor^(-1*{rho}))), likelihood(normal({sig2}))
     prior({sig2},  igamma(0.01, 0.01))
     prior({b0},    normal(0, 100))
     prior({delta}, uniform(0, 1))
     prior({rho},   gamma(1, 1))
     block({b0 delta rho sig2}, split)
     rseed(16) saving(ces_mcmc)

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  lnoutput ~ normal(<expr1>,{sig2})

Priors:
    {sig2} ~ igamma(0.01,0.01)
      {b0} ~ normal(0,100)
   {delta} ~ uniform(0,1)
     {rho} ~ gamma(1,1)

Expression:
  expr1 : {b0}-1/{rho=1}*ln({delta=0.5}*capital^(-1*{rho})+(1-{delta})*labor^(
          -1*{rho}))
------------------------------------------------------------------------------

Bayesian normal regression                       MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        100
                                                 Acceptance rate  =      .4385
                                                 Efficiency:  min =     .02283
                                                              avg =      .1225
Log marginal-likelihood = -93.731032                          max =      .2461

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
          b0 |  3.829453   .1047645     .0059   3.829384   3.633927   4.027828
       delta |  .4830306   .0636595   .001283   .4820599   .3540333   .6146103
         rho |  1.838675   .8665057   .057343   1.624436   .8088482   4.053971
        sig2 |  .3098183    .043935   .001009   .3062381    .233141   .4054633
------------------------------------------------------------------------------

file ces_mcmc.dta saved

The nonlinear specification of bayesmh is similar to that of nl. But in our Bayesian model, we additionally specify prior distributions for model parameters. We use mildly informative priors for the parameters. To improve sampling efficiency, we simulate parameters in separate blocks. We also save the MCMC results in ces_mcmc.dta—this step is required before using bayespredict.

Suppose that we would like to compute the value of log output given the capital and labor values of 1.5 and 2, respectively. First, we add a new observation that will record the new values for capital and labor.

. set obs 101
number of observations (_N) was 100, now 101

. replace capital = 1.5 in 101
(1 real change made)

. replace labor = 2 in 101
(1 real change made)

Next, we use bayespredict with the mean option to compute the posterior mean estimate for each observation in the dataset and save them in the new variable pmean. We also specify a random-number seed for reproducibility.

. bayespredict pmean, mean rseed(16)

Computing predictions ...

We list the last 12 observations to compare the observed and predicted values of lnoutput.

. list labor capital lnoutput pmean in 90/101

     +-------------------------------------------+
     |    labor    capital   lnoutput      pmean |
     |-------------------------------------------|
 90. | .0667528   .8334751   1.360945   1.525867 |
 91. | 1.660983   .2120817   2.772991   2.700305 |
 92. | 1.761712    5.48467   4.748453   4.717113 |
 93. | .3323506   .2172508   1.994157   2.489211 |
 94. | .2826679    .989325   3.231169   2.891441 |
     |-------------------------------------------|
 95. | 1.330708   1.475331   4.571402   4.161445 |
 96. | .4023916   .2614668   2.737468   2.667645 |
 97. | 3.801785   .6329353   3.619768   3.786168 |
 98. | .3353206   .2557932   2.292713   2.584241 |
 99. |  31.5595    2.72437   5.026271   5.275726 |
     |-------------------------------------------|
100. |  3.80443    .946865   3.363704   4.150184 |
101. |        2        1.5          .   4.358272 |
     +-------------------------------------------+

The predicted posterior means are similar to the observed values of lnoutput in the existing observations. The predicted value of log of output in the new observation is 4.36.

If desired, we can compute credible intervals to form inference about our predicted value.

. bayespredict cril criu, cri rseed(16)

Computing predictions ...

. list pmean cril criu in 101
     +--------------------------------+
     |    pmean       cril       criu |
     |--------------------------------|
101. | 4.358272   3.257501   5.497338 |
     +--------------------------------+

The 95% credible interval for the new observation is [3.26, 5.50].

We now drop the new observation and proceed with further analysis.

. drop in 101
(1 observation deleted)

Posterior predictive checks

In addition to predicting new outcome values, Bayesian predictions are useful for model checking. These checks, also known as posterior predictive checks, amount to comparing the observed data with the so-called replicated data. Replicated data are data simulated from the fitted model or, more precisely, from the posterior predictive distribution using the same covariate values as were used for estimation. The comparisons may be done graphically using, for instance, histograms or formally using posterior predictive p-values.

Observed versus replicated data—MCMC replicates

Let's compare the distributions of the observed and replicated data. The full replicated data will contain 10,000 MCMC replicates for lnoutput. Graphical comparison of so many replicates is not feasible. Instead, we can randomly generate a smaller subset of MCMC replicates and evaluate those. Here we will create only three replicates for simplicity.

. bayesreps yrep*, nreps(3) rseed(16)

Computing predictions ...

. list lnoutput yrep1 yrep2 yrep3 pmean in 1/10

     +----------------------------------------------------+
     | lnoutput      yrep1      yrep2      yrep3     pmean |
     |----------------------------------------------------|
  1. | 2.933451   3.496267   1.416066    3.85218  3.111261 |
  2. | 4.613716   4.794032    3.46249   4.354308  4.484845 |
  3. | 1.654005   2.067956   2.135827   1.949141  2.026532 |
  4. | 2.025361   2.568239   2.233599   2.780148  2.224722 |
  5. | 3.165065   2.979953   2.179758   3.609532  2.894845 |
     |----------------------------------------------------|
  6. | 1.372219    1.58399    2.10996   2.931538  2.345947 |
  7. |  2.92082   4.086907   3.160853   3.570297  3.255964 |
  8. | 2.698994   1.730741   1.845746   2.216033  2.281841 |
  9. | 1.197918    1.61528   1.039314   1.529847  1.224741 |
 10. | 3.097216   2.280563   2.774098   2.799236   2.76704 |
     +----------------------------------------------------+

bayesreps generates three new variables, yrep1, yrep2, and yrep3, which contain the MCMC replicates. pmean, computed earlier, is essentially the sample mean over all 10,000 MCMC replicates for each observation.

We can compare the histograms of the replicated data (in blue) with those of the observed data.

. twoway histogram lnoutput || histogram yrep1, color(navy%25) ||
     histogram yrep2, color(navy%25) || histogram yrep3, color(navy%25)
     legend(off) title(Observed vs replicated data)

The distributions appear to be similar.

Simulated outcomes and residuals

If needed, we can also simulate the entire 10,000 × 100 sample of predicted values. We save it in a separate dataset, ces_pred.dta.

. bayespredict {_ysim}, saving(ces_pred) rseed(16)

Computing predictions ...

file ces_pred.dta saved
file ces_pred.ster saved

. describe _index _ysim1_1 _ysim1_2 _ysim1_3 using ces_pred

              storage   display    value
variable name   type    format     label      variable label
------------------------------------------------------------------------------
_index          int     %8.0g                 Iteration number
_ysim1_1        double  %10.0g                Simulated lnoutput, obs. #1
_ysim1_2        double  %10.0g                Simulated lnoutput, obs. #2
_ysim1_3        double  %10.0g                Simulated lnoutput, obs. #3

The dataset contains 10,000 observations and 100 variables, one for each observation from the original dataset. Above, we described only the first three out of 100 variables.

We can use bayesstats summary to compute posterior summaries for, say, the first 10 observations.

. bayesstats summary {_ysim[1/10]} using ces_pred

Posterior summary statistics                       MCMC sample size =    10,000

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
    _ysim1_1 |  3.111305    .564188   .005642   3.114486   2.013625   4.230432
    _ysim1_2 |   4.47848   .5647274   .006264   4.487651   3.362394   5.576457
    _ysim1_3 |  2.033675   .5562368   .005692   2.033444   .9360682   3.115028
    _ysim1_4 |  2.234464   .5692011   .005783   2.234235   1.130026   3.361621
    _ysim1_5 |  2.894212   .5655813   .005948   2.894129   1.789959   4.013839
    _ysim1_6 |  2.337328   .5614379    .00588    2.33384   1.226629   3.451852
    _ysim1_7 |   3.25311   .5660545   .005661   3.252063   2.126941   4.372151
    _ysim1_8 |  2.274426   .5638561   .005639   2.280058   1.157936   3.359114
    _ysim1_9 |  1.227687   .5582967   .005649    1.23153   .1239763   2.312214
   _ysim1_10 |  2.766746   .5644516   .005759    2.76776   1.655075   3.871556
------------------------------------------------------------------------------

The posterior mean estimates are similar to the ones we computed earlier and stored in variable pmean.

bayespredict also provides access to residuals. Below, we use bayesgraph histogram to produce a histogram of the variance of the residuals based on the MCMC sample. We use the Mata function variance() to compute the variance of the residuals {_resid} for each MCMC replicate and label the results resvar.

. bayesgraph histogram (resvar:@variance({_resid})) using ces_pred

The distribution has a mean of roughly 0.3, which is similar to the posterior mean estimate of the parameter {sig2}, and its shape resembles that of a chi-squared distribution.
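
To read off the number rather than the picture, the same function specification can be passed to bayesstats summary; a quick check, reusing the resvar label from above (output omitted):

. bayesstats summary (resvar:@variance({_resid})) using ces_pred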

Posterior predictive p-values

We can compare the observed and replicated data more formally. We can use bayesstats ppvalues to compute posterior predictive p-values for various distributional summaries. For instance, we can compare the means, minimums, and maximums of the observed and replicated data.

One of the neat prediction features of Stata's Bayesian suite is the ability to specify functions of simulated values with any Bayesian postestimation command, including bayesstats ppvalues. We have already seen an example of this when we used bayesgraph histogram to plot the histogram for the variance of residuals. Here, we will use Mata functions mean(), min(), and max() to compute posterior predictive p-values for the corresponding summaries of the simulated outcomes.

. bayesstats ppvalues (mean:@mean({_ysim})) (min:@min({_ysim}))
     (max:@max({_ysim})) using ces_pred

Posterior predictive summary    MCMC sample size =    10,000

------------------------------------------------------------------
           T |      Mean   Std. dev.   E(T_obs)   P(T>=T_obs)
-------------+----------------------------------------------------
        mean |  3.045143   .0787588   3.044554         .5026
         min |  .5130189   .3401942   1.049675         .0365
         max |   5.84806   .3703789   5.703145          .626
------------------------------------------------------------------
Note: P(T>=T_obs) close to 0 or 1 indicates lack of fit.

The posterior predictive p-value, labeled P(T>=T_obs) in the output, is the probability that a test statistic (for instance, the minimum in our example) computed from the replicated data, T, is as extreme as or more extreme than the same statistic computed from the observed data, T_obs. When this probability is close to 0.5, the replicated and observed data are considered to be in agreement with respect to the test statistic of interest.

In our example, the observed and replicated data appear to be in agreement with respect to the mean and maximum statistics but not the minimum statistic.

Compute your own test statistics

Yet another neat feature of Bayesian predictions is that you can create your own functions and apply them to simulated values. You can write these functions as Mata functions, Stata programs, or both.

Let's write a Mata function to compute the skewness of a variable.

. mata:
------------------------------------------------- mata (type end to exit) ---
: real scalar skew(real colvector x)
> {
>         return(sqrt(length(x))*sum((x:-mean(x)):^3)/(sum((x:-mean(x)):^2)^1.5))
> }

: end
------------------------------------------------------------------------------
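
Before handing skew() to bayespredict, a quick sanity check on the observed outcome is easy; one way (output not shown) is to pull lnoutput into a Mata column vector with st_data():

. mata: skew(st_data(., "lnoutput"))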

And let's use a Stata program to compute the kurtosis of a variable.

. program kurt
  1.         version 18
  2.         args kurtosis x
  3.         quietly summarize `x', detail
  4.         scalar `kurtosis' = r(kurtosis)
  5. end
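
The program follows the interface bayespredict expects: the first argument is the name of a scalar to fill in, and the second is the variable to summarize. A quick check on the observed data (the scalar name kobs is ours; output not shown):

. kurt kobs lnoutput
. display kobs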

Now, we can use these functions with bayespredict to compute skewness and kurtosis of simulated outcome values and save the results in a new dataset ces_teststat.dta.

. bayespredict (skewness:@skew({_ysim})) (kurtosis:@kurt({_ysim})),
     saving(ces_teststat) rseed(16)

Computing predictions ...

file ces_teststat.dta saved
file ces_teststat.ster saved

We named our predicted functions skewness and kurtosis. We can use these names to refer to the functions in other postestimation commands.

One advantage of specifying functions directly with bayespredict, instead of within other Bayesian postestimation commands, is that we avoid creating a large dataset and save time by not recomputing the results each time we run another command. The disadvantage is that, because only the function results are saved, we cannot later compute other functions that require access to the fully simulated outcomes.
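
For comparison, the on-the-fly alternative mentioned above applies the same function within a postestimation command, recomputing it from the full prediction file each time:

. bayesstats ppvalues (skewness:@skew({_ysim})) using ces_pred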

We use bayesstats summary to compute posterior summaries of the skewness and kurtosis measures.

. bayesstats summary {skewness} {kurtosis} using ces_teststat

Posterior summary statistics                        MCMC sample size =    10,000

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
    skewness |  .1471358   .1660461    .00166   .1473484  -.1803233   .4741272
    kurtosis |  2.670391   .3049096   .003386   2.636845   2.163721   3.353066
------------------------------------------------------------------------------

And we can compute posterior predictive p-values for these measures too.

. bayesstats ppvalues {skewness} {kurtosis} using ces_teststat

Posterior predictive summary    MCMC sample size =    10,000

------------------------------------------------------------------
           T |      Mean   Std. dev.   E(T_obs)   P(T>=T_obs)
-------------+----------------------------------------------------
    skewness |  .1471358   .1660461   .1555046         .4806
    kurtosis |  2.670391   .3049096   2.252134         .9411
------------------------------------------------------------------
Note: P(T>=T_obs) close to 0 or 1 indicates lack of fit.

The replicated data fit the observed data well with respect to skewness but not kurtosis.

Tell me more

See all of Stata's Bayesian features.

Learn about Bayesian lasso modeling and How to run multiple chains in parallel.

Read more about Bayesian predictions and Bayesian analysis in the Bayesian Analysis Reference Manual.