Bayesian regression models using the bayes prefix

Highlights

Fitting a Bayesian regression model can be just as intuitive as fitting a classical one, thanks to the bayes prefix in Stata. The bayes prefix combines Bayesian features with Stata's intuitive and elegant specification of regression models. It lets you fit Bayesian regression models more easily and fit more models.

You fit linear regression by using

. regress y x1 x2

You can fit Bayesian linear regression by simply using

. bayes: regress y x1 x2

You can also fit a Bayesian survival model by simply using

. bayes: streg x1 x2, distribution(weibull)

You can use the bayes prefix with many more regression models, including logistic, ordered probit, multinomial logistic, Poisson, generalized linear, conditional logistic, zero-inflated, sample-selection, and more. See the full list of supported estimation commands. Multilevel models are among the supported models too! See Bayesian multilevel models for details.
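For instance, a Bayesian logistic or Poisson regression is fit in exactly the same way. In the sketch below, y, x1, and x2 are placeholder variable names used only for illustration:

. bayes: logit y x1 x2

. bayes: poisson y x1 x2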

All of Stata's Bayesian features are supported by the bayes prefix. You can select from many prior distributions for model parameters, use the default priors, or even define your own priors. You can use the default adaptive Metropolis–Hastings sampling, Gibbs sampling, or a combination of the two sampling methods. And you can use any other feature included in the bayesmh command ([BAYES] bayesmh).

After estimation, you can use Stata's standard Bayesian postestimation tools, such as bayesgraph for graphical convergence diagnostics, bayesstats summary for additional posterior summaries, bayestest interval and bayestest model for interval hypothesis testing and model comparison, and bayesstats ic for information criteria and Bayes factors.

Let's see it work

Let's now see the bayes prefix in action.

Linear regression

Estimation

Consider data on math scores of pupils in the third and fifth years from different schools in Inner London (Mortimore et al. 1988). We want to fit a linear regression of five-year math scores (math5) on three-year math scores (math3). Here we ignore potential school effects; see Bayesian two-level models for analyses that account for them.

We fit a linear regression by typing

. regress math5 math3

      Source |       SS           df       MS      Number of obs   =       887
-------------+----------------------------------   F(1, 885)       =    341.40
       Model |  10960.2737         1  10960.2737   Prob > F        =    0.0000
    Residual |  28411.6181       885  32.1035233   R-squared       =    0.2784
-------------+----------------------------------   Adj R-squared   =    0.2776
       Total |  39371.8918       886  44.4378011   Root MSE        =     5.666

------------------------------------------------------------------------------
       math5 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       math3 |   .6081306   .0329126    18.48   0.000     .5435347    .6727265
       _cons |   30.34656   .1906157   159.20   0.000     29.97245    30.72067
------------------------------------------------------------------------------

To fit a Bayesian linear regression, we simply prefix the above regress command with bayes:.

. bayes: regress math5 math3

Burn-in ...
Simulation ...

Model summary
Likelihood:
math5 ~ regress(xb_math5,{sigma2})
Priors:
{math5:math3 _cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =      .3312
                                          Efficiency:  min  =      .1099
                                                       avg  =      .1529
Log marginal-likelihood = -2817.2335                   max  =      .2356
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
math5
math3 .6070097 .0323707 .000976 .6060445 .5440594 .6706959
_cons 30.3462 .1903067 .005658 30.34904 29.97555 30.71209
sigma2 32.17492 1.538155 .031688 32.0985 29.3045 35.38031
Note: Default priors are used for model parameters.

The results are similar between the two commands, but the interpretations are different. Recall that Bayesian models assume that all parameters are random. So a 95% credible interval of [0.54,0.67] for the slope coefficient on third-year math scores {math5:math3} implies that the probability that {math5:math3} is between 0.54 and 0.67 is 0.95. Such a probabilistic interpretation is not appropriate for the confidence interval from regress, although the confidence interval does suggest a similar plausible range for {math5:math3}.

Also, for each parameter, regress produced just one estimate, whereas bayes: regress produced 10,000 MCMC estimates. MCMC estimates are simulated from the posterior distribution of the model parameters, after discarding the first 2,500 estimates for the burn-in period. What is reported in the output are summaries of the marginal posterior distributions of the parameters, such as posterior means and posterior standard deviations.

The results will not always be similar between the two commands. They are similar in this example because the default priors the bayes prefix used are not informative for these data. The priors used are described in the model summary. They are normal with mean 0 and variance of 10,000 for the regression coefficients and inverse-gamma with shape and scale parameters of 0.01 for the error variance. Prior distributions are important for Bayesian models, and we will discuss them later in Custom priors.

We can use any feature of the bayesmh command, which fits general Bayesian models, with the bayes prefix. For example, we can display 90% highest posterior density (HPD) credible intervals instead of the default equal-tailed intervals when we replay the results. We do this with bayes just as we would with bayesmh:

. bayes, hpd clevel(90)

Model summary
Likelihood:
math5 ~ regress(xb_math5,{sigma2})
Priors:
{math5:math3 _cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =      .3312
                                          Efficiency:  min  =      .1099
                                                       avg  =      .1529
Log marginal-likelihood = -2817.2335                   max  =      .2356
HPD
Mean Std. dev. MCSE Median [90% cred. interval]
math5
math3 .6070097 .0323707 .000976 .6060445 .5580554 .6634722
_cons 30.3462 .1903067 .005658 30.34904 30.05275 30.66751
sigma2 32.17492 1.538155 .031688 32.0985 29.6363 34.66541
Note: Default priors are used for model parameters.

MCMC convergence and hypothesis testing

We can use any of the Bayesian postestimation features (except bayespredict) after the bayes prefix.

For example, we can check MCMC convergence for {math5:math3} using bayesgraph diagnostics ([BAYES] bayesgraph).

. bayesgraph diagnostics {math5:math3}

Or we can test interval hypotheses using bayestest interval ([BAYES] bayestest interval):

. bayestest interval {math5:math3}, lower(0.5) upper(0.6)

Interval tests     MCMC sample size =    10,000

       prob1 : 0.5 < {math5:math3} < 0.6

             |      Mean   Std. dev.      MCSE
-------------+---------------------------------
       prob1 |       .42    0.49358   .0160684

The probability that the coefficient {math5:math3} lies between 0.5 and 0.6 is .42.

Also see Multiple chains and Gelman–Rubin convergence diagnostic for investigating convergence using multiple chains.
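For example, here is a minimal sketch of running multiple chains and computing the Gelman–Rubin convergence statistic for our linear model, assuming your Stata version supports the nchains() option and the bayesstats grubin command (the seed value is arbitrary):

. bayes, nchains(3) rseed(16): regress math5 math3

. bayesstats grubin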

Gibbs sampling

The default sampling algorithm used by the bayes prefix with most of the estimation commands is an adaptive Metropolis–Hastings algorithm. For some prior and likelihood model combinations, an alternative Gibbs sampling algorithm may be available. Gibbs sampling is generally slower but more efficient than the Metropolis–Hastings sampling. By "more efficient", we mean that you will get more precise estimates from Gibbs sampling for the same number of iterations.

We can specify the gibbs option with bayes for linear regression to perform Gibbs sampling.

. bayes, gibbs: regress math5 math3

Burn-in ...
Simulation ...

Model summary
Likelihood:
math5 ~ normal(xb_math5,{sigma2})
Priors:
{math5:math3 _cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Gibbs sampling                            Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =          1
                                          Efficiency:  min  =          1
                                                       avg  =          1
Log marginal-likelihood =  -2817.184                   max  =          1
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
math5
math3 .6085104 .0333499 .000333 .6087819 .5426468 .6731657
_cons 30.34419 .1916673 .001917 30.34441 29.97587 30.72617
sigma2 32.16765 1.551119 .015511 32.10778 29.238 35.29901
Note: Default priors are used for model parameters.

Notice that the minimum and maximum efficiencies reported in the header are equal to 1 for Gibbs sampling and are much higher than those from the adaptive Metropolis–Hastings sampling. Higher efficiencies mean smaller Monte Carlo standard errors (MCSEs) and thus more precise posterior mean estimates. In our example, there is no practical difference between the two sampling procedures.
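To make the relationship between efficiency and MCSE concrete: MCSE equals the posterior standard deviation divided by the square root of the effective sample size, which is the efficiency times the MCMC sample size. For {math5:math3}, Metropolis–Hastings gives .0323707/sqrt(.1099 × 10,000) ≈ .00098, whereas Gibbs sampling gives .0333499/sqrt(1 × 10,000) ≈ .00033, matching the MCSE columns reported above.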

bayes's gibbs option specifies full Gibbs sampling in which all model parameters are sampled using the Gibbs method. Full Gibbs sampling is not available for all likelihood models and prior combinations. In fact, only bayes: regress and bayes: mvreg support this option. However, Gibbs sampling may be available for some model parameters, in which case you can use a hybrid sampling in which some parameters are sampled using the Gibbs method and the others using the Metropolis–Hastings method; see Gibbs and hybrid MH sampling in [BAYES] bayesmh for details.
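For instance, a hybrid scheme for the linear regression above could sample the error variance with the Gibbs method while the regression coefficients are updated with Metropolis–Hastings. A minimal sketch, assuming the block() option is accepted by the bayes prefix the same way it is by bayesmh:

. bayes, block({sigma2}, gibbs): regress math5 math3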

Custom priors

As we mentioned earlier, a prior distribution is an important component of a Bayesian model and may have a large impact on your results. It summarizes your knowledge about a model parameter before you see the data. After the data are observed, the prior distribution is updated with the information from the observed data to form the posterior distribution of the parameter. The posterior distribution is then used for Bayesian inference.

Your science or previous studies will often suggest the appropriate prior distributions for your model parameters. In the absence of such knowledge, noninformative priors are recommended. The choice of a prior distribution becomes especially important with small datasets, which typically contain less information about the model parameters.

For convenience, the bayes prefix provides default priors, but you can also specify your own. The default priors are chosen to be fairly noninformative for moderately scaled model parameters but may become informative for large values of parameters.

The bayes prefix provides convenient options to modify the values of the default priors. For example, bayes: regress uses normal priors with mean 0 and variance of 10,000 for the regression coefficients and an inverse-gamma prior with shape and scale parameters of 0.01 for the error variance. We can change the default values by using the normalprior() and igammaprior() options.

. bayes, normalprior(10) igammaprior(1 2): regress math5 math3

Burn-in ...
Simulation ...

Model summary
Likelihood:
math5 ~ regress(xb_math5,{sigma2})
Priors:
{math5:math3 _cons} ~ normal(0,100) (1)
{sigma2} ~ igamma(1,2)
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =      .3503
                                          Efficiency:  min  =      .1189
                                                       avg  =      .1471
Log marginal-likelihood = -2815.3081                   max  =      .2005
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
math5
math3 .6076875 .033088 .000948 .6076282 .5405233 .673638
_cons 30.326 .1931568 .005602 30.32804 29.93212 30.70529
sigma2 32.09694 1.530839 .034185 32.03379 29.27687 35.37723
Note: Default priors are used for model parameters.

In the above, we used a standard deviation of 10 for the normal priors and the shape of 1 and scale of 2 for the inverse-gamma prior.

We can specify custom priors for some of the parameters and leave the priors for other parameters at their defaults. We can also use prior distributions other than the default. For example, below we use a uniform on (-50,50) prior only for the regression coefficients.

. bayes, prior({math5:}, uniform(-50,50)): regress math5 math3

Burn-in ...
Simulation ...

Model summary
Likelihood:
math5 ~ regress(xb_math5,{sigma2})
Priors:
{math5:math3 _cons} ~ uniform(-50,50) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =      .3492
                                          Efficiency:  min  =      .1222
                                                       avg  =      .1633
Log marginal-likelihood = -2815.3228                   max  =       .241
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
math5
math3 .6076356 .0328739 .00094 .6056774 .5433171 .6759952
_cons 30.33498 .1908214 .005361 30.33159 29.96478 30.70833
sigma2 32.15793 1.555689 .031691 32.08331 29.308 35.35561
Note: Default priors are used for some model parameters.

{math5:} in the above prior() option is a shortcut for referring to all regression coefficients associated with the dependent variable math5: {math5:math3} and {math5:_cons}.
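Equivalently, we could have listed the coefficients explicitly in the prior() option:

. bayes, prior({math5:math3 _cons}, uniform(-50,50)): regress math5 math3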

Or we can use custom priors for all model parameters.

. bayes, prior({math5:_cons}, uniform(-50,50)) prior({math5:math3}, 
     uniform(-5,5)) prior({sigma2}, jeffreys): regress math5 math3

Burn-in ...
Simulation ...

Model summary
Likelihood:
math5 ~ regress(xb_math5,{sigma2})
Priors:
{math5:_cons} ~ uniform(-50,50) (1)
{math5:math3} ~ uniform(-5,5) (1)
{sigma2} ~ jeffreys
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =      .3678
                                          Efficiency:  min  =      .1311
                                                       avg  =      .1617
Log marginal-likelihood = -2808.3156                   max  =      .2184
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
math5
math3 .6067153 .0333801 .000906 .606437 .5417588 .6723998
_cons 30.34962 .1935328 .005346 30.35526 29.97142 30.72829
sigma2 32.17297 1.546291 .033087 32.1054 29.34673 35.36544

Autoregressive models

AR(1) model

The bayes: regress specification is convenient, but we could already use bayesmh to fit a linear regression. What we cannot do with bayesmh, for example, is fit an autoregressive model. With bayes: regress, we can.

Consider quarterly data on coal consumption in the United Kingdom from 1960 to 1986 (see, for example, Congdon [2003] and Harvey [1989]). The outcome of interest is coal consumption (in millions of tons) in a given year and quarter. Variable lcoal is the log-transformed consumption, log(coal/1000).

Let's fit a Bayesian autoregressive model with one lag, AR(1), to these data.

. bayes: regress lcoal L.lcoal

Burn-in ...
Simulation ...

Model summary
Likelihood:
lcoal ~ regress(xb_lcoal,{sigma2})
Priors:
{lcoal:L.lcoal _cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_lcoal.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        107
                                          Acceptance rate   =      .3285
                                          Efficiency:  min  =      .1199
                                                       avg  =      .1448
Log marginal-likelihood = -75.889709                   max  =      .1905
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
lcoal
lcoal
L1. .7143121 .0649968 .001877 .7123857 .5884089 .8436602
_cons -.6896604 .1561023 .004433 -.6935272 -.9970502 -.3879924
sigma2 .1702592 .0243144 .000557 .1672834 .1299619 .2248287
Note: Default priors are used for model parameters.

We used Stata's time-series lag operator L. to include the first lag of our dependent variable lcoal in the regression model.
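Time-series operators such as L. require the data to be declared as time-series data beforehand. A minimal sketch, where t stands in for the dataset's quarterly time variable (the name is hypothetical):

. tsset t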

We save MCMC estimates and store estimation results from our Bayesian AR(1) model for later comparison with other AR models.

. bayes, saving(lag1_mcmc)

. estimates store lag1

The stationarity assumption of an AR(1) model requires that the first lag coefficient, {lcoal:L.lcoal}, is between -1 and 1. We can use a prior distribution to incorporate this assumption in our Bayesian model. For example, we can specify a uniform on (-1,1) prior for {lcoal:L.lcoal}.

. bayes, prior({lcoal:L.lcoal}, uniform(-1,1)): regress lcoal L.lcoal

Burn-in ...
Simulation ...

Model summary
Likelihood:
lcoal ~ regress(xb_lcoal,{sigma2})
Priors:
{lcoal:L.lcoal} ~ uniform(-1,1) (1)
{lcoal:_cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_lcoal.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        107
                                          Acceptance rate   =      .4379
                                          Efficiency:  min  =    .006311
                                                       avg  =     .06198
Log marginal-likelihood = -70.952743                   max  =      .1731
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
lcoal
lcoal
L1. .7111236 .0716918 .008876 .7085994 .5811546 .8647907
_cons -.6964178 .1747408 .021995 -.7047339 -1.013976 -.3139992
sigma2 .1709232 .0240107 .000577 .1695033 .1296427 .2222473
Note: Default priors are used for some model parameters.

The specified uniform prior has little impact on our results.

Model comparison of AR(p) models

In the previous section, we fit an AR(1) model to the coal consumption data. Is the AR(1) model sufficient for these data? Do we need a higher-order model such as AR(2) or AR(3)? We can use Bayesian model selection to answer this question.

We will use bayestest model ([BAYES] bayestest model) to compare different AR models using model posterior probabilities. To use bayestest model, we need to fit each model of interest separately and store its estimation results. For brevity, we omit the outputs from the fitted models.

We fit the AR(2) model by including the second lag L2.lcoal of the dependent variable in the regression model. We use bayes's saving() option during estimation to save MCMC estimates in the Stata dataset lag2_mcmc.dta. This dataset is part of Bayesian estimation results, and it must be saved before estimates store can be used. We then use estimates store to store the standard estimation results.

. bayes, saving(lag2_mcmc): regress lcoal L.lcoal L2.lcoal

. estimates store lag2

We repeat the above steps for the AR(3), AR(4), and AR(5) models.

. bayes, saving(lag3_mcmc): regress lcoal L(1/3).lcoal

. estimates store lag3

. bayes, saving(lag4_mcmc): regress lcoal L(1/4).lcoal

. estimates store lag4

. bayes, saving(lag5_mcmc): regress lcoal L(1/5).lcoal

. estimates store lag5

We then use bayestest model to compare the models.

. bayestest model lag1 lag2 lag3 lag4 lag5

Bayesian model tests

             |    log(ML)      P(M)     P(M|y)
-------------+---------------------------------
        lag1 |   -75.8897    0.2000     0.0000
        lag2 |   -82.5078    0.2000     0.0000
        lag3 |   -59.6688    0.2000     0.0000
        lag4 |   -13.8944    0.2000     0.9990
        lag5 |   -20.8194    0.2000     0.0010
Note: Marginal likelihood (ML) is computed using Laplace-Metropolis approximation.

Given equal prior probabilities for all five AR models, the AR(4) model has the highest posterior probability of 0.9990.

Given that our data are quarterly, it is not surprising that the fourth lag is so important. It is comforting that our data confirm that importance.

Choosing autocorrelation lag automatically

Editor's note: This section uses uniquely Bayesian thinking to address the problem of model selection. It is a wonderful solution, but be forewarned that several frequentists at StataCorp are still scratching their heads. You can skip this section if you wish.

In the previous section, we used model posterior probabilities to choose the autocorrelation lag. Instead, we can extend our Bayesian autoregressive model to estimate the lag itself. Let the autocorrelation lag be another parameter in the model, and suppose that it can take any integer value between 1 and 5. We make the prior distributions of the lag coefficients depend on this lag parameter. If the lag parameter is greater than or equal to p, we assign the pth lag coefficient a normal prior with mean 0 and variance 100. Otherwise, we assign it a normal prior with mean 0 and variance 0.01. In other words, if the lag parameter equals p, we shrink all the lag coefficients corresponding to lags greater than p toward zero.

The arguments of the prior distributions in the prior() option can be specified as expressions. We use this feature to accommodate the above priors for the lag coefficients,

. bayes, prior({lcoal:L1.lcoal}, normal(0, cond({lag}>=1,100,0.01))) ...

where {lag} is the autocorrelation lag parameter.

Our final model specification is as follows, in which {lag} is assigned a discrete prior with equal probabilities for each of the possible values 1, 2, 3, 4, and 5.

. bayes,  prior({lcoal:L1.lcoal}, normal(0, cond({lag}>=1,100,0.01)))
     prior({lcoal:L2.lcoal}, normal(0, cond({lag}>=2,100,0.01)))
     prior({lcoal:L3.lcoal}, normal(0, cond({lag}>=3,100,0.01)))
     prior({lcoal:L4.lcoal}, normal(0, cond({lag}>=4,100,0.01)))
     prior({lcoal:L5.lcoal}, normal(0, cond({lag}>=5,100,0.01)))
     prior({lag}, index(0.2,0.2,0.2,0.2,0.2)):
     regress lcoal L(1/5).lcoal

note: operator L1. is replaced with L. in parameter name L1.lcoal.
  
Burn-in ...
note: invalid initial state.
Simulation ...

Model summary
Likelihood:
lcoal ~ regress(xb_lcoal,{sigma2})
Priors:
{lcoal:L(1 2 3 4 5).lcoal} ~ normal(0,cond({lag}>=1,100,0.01)) (1)
{lcoal:_cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
Hyperprior
{lag} ~ index(0.2,0.2,0.2,0.2,0.2)
(1) Parameters are elements of the linear form xb_lcoal.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        103
                                          Acceptance rate   =        .34
                                          Efficiency:  min  =    .002852
                                                       avg  =     .04431
Log marginal-likelihood = -8.2084752                   max  =      .1716
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
lcoal
lcoal
L1. .2062446 .0784492 .011311 .2050062 .0487352 .3605725
L2. -.0738366 .0588681 .002764 -.0739381 -.1877364 .0391768
L3. .100462 .0597828 .004398 .1003963 -.0142032 .2216838
L4. .7994076 .0606384 .006607 .8031808 .6651497 .910174
L5. -.0729926 .0698683 .009211 -.0708155 -.2074388 .060126
_cons -.1401982 .0812334 .015212 -.1438271 -.2877263 .0403175
sigma2 .0343128 .0051157 .000123 .0338508 .0256253 .0456132
lag 4.0194 .1379331 .004424 4 4 4
Note: Default priors are used for some model parameters.
Note: There is a high autocorrelation after 500 lags.

The estimated lag is 4, which is consistent with our findings based on posterior probabilities of the considered AR models in the previous section.

Logistic regression with perfect predictors

Bayesian regression models can be useful in the presence of perfect predictors. Suppose that we want to model the binary outcome disease, the presence of a heart disease, as a function of a number of covariates: age, gender, indicator for fasting blood sugar greater than 120 mg/dl (variable isfbs), and categories for resting electrocardiographic results (variable restecg). This example is described in detail in Logistic regression with perfect predictors of [BAYES] bayes. The data are a sample from Switzerland.

Let's fit the standard logistic model first.

. logit disease restecg isfbs age male

note: restecg != 0 predicts success perfectly;
      restecg omitted and 17 obs not used.

note: isfbs != 0 predicts success perfectly;
      isfbs omitted and 3 obs not used.

note: male != 1 predicts success perfectly;
      male omitted and 2 obs not used.

Iteration 0:  Log likelihood = -4.2386144  
Iteration 1:  Log likelihood = -4.2358116  
Iteration 2:  Log likelihood = -4.2358076  
Iteration 3:  Log likelihood = -4.2358076  

Logistic regression                                     Number of obs =     26
                                                        LR chi2(1)    =   0.01
                                                        Prob > chi2   = 0.9403
Log likelihood = -4.2358076                             Pseudo R2     = 0.0007

disease Coefficient Std. err. z P>|z| [95% conf. interval]
restecg 0 (omitted)
isfbs 0 (omitted)
age -.0097846 .1313502 -0.07 0.941 -.2672263 .2476572
male 0 (omitted)
_cons 3.763893 7.423076 0.51 0.612 -10.78507 18.31285

Variables restecg, isfbs, and male are omitted from the model because they predict the success probability perfectly for these data. In other words, the corresponding coefficients cannot be identified based on the observed data. The problem of perfect predictors often arises in binary-outcome models, especially with small datasets as in our example.

Sometimes, we may have reliable external information about the parameters of interest that can help identify them when the data alone cannot. For example, a similar heart study was conducted in Hungary. The estimates of the parameters from that study may be used to form the informative prior distributions for the parameters of the Swiss study.

We may consider a Bayesian logistic regression with the following informative priors.

. bayes, prior({disease:restecg age}, normal(0,10))
     prior({disease:isfbs male}, normal(1,10))
     prior({disease:_cons}, normal(-4,10)) nomleinitial:
     logit disease restecg isfbs age male, asis

Burn-in ...
Simulation ...

Model summary
Likelihood:
disease ~ logit(xb_disease)
Priors:
{disease:restecg age} ~ normal(0,10) (1)
{disease:isfbs male} ~ normal(1,10) (1)
{disease:_cons} ~ normal(-4,10) (1)
(1) Parameters are elements of the linear form xb_disease.
Bayesian logistic regression              MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =         48
                                          Acceptance rate   =      .2121
                                          Efficiency:  min  =     .01885
                                                       avg  =     .04328
Log marginal-likelihood = -11.006071                   max  =     .06184
Equal-tailed
disease Mean Std. dev. MCSE Median [95% cred. interval]
restecg 1.965122 2.315475 .115615 1.655961 -2.029873 6.789415
isfbs 1.708631 2.726071 .113734 1.607439 -3.306837 7.334592
age .1258811 .0707431 .003621 .1245266 -.0016807 .2719748
male .2671381 2.237349 .162967 .3318061 -4.106425 4.609955
_cons -2.441911 2.750613 .110611 -2.538183 -7.596747 3.185172

Whenever informative priors are used, the sensitivity of the results to the choice of priors must be explored. For example, it would be useful to consider other heart disease studies to verify the reasonableness of the prior specifications and to use other priors to check the sensitivity of the estimated parameters to the choice of priors.
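For instance, one simple sensitivity check is to refit the model with flatter priors and compare the posterior summaries. The prior variances below are illustrative choices, not values taken from the Hungarian study:

. bayes, prior({disease:restecg age}, normal(0,100))
     prior({disease:isfbs male}, normal(1,100))
     prior({disease:_cons}, normal(-4,100)) nomleinitial:
     logit disease restecg isfbs age male, asis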

GLM

Just as with other models, to fit Bayesian generalized linear models, we simply prefix Stata's glm command with bayes:.

Example 2 in [R] glm analyzes data from an insecticide experiment. The effect of the dose of insecticide on the number of flour beetles killed is of interest. Variable ldose records the log dose of insecticide; n, the number of flour beetles subjected to each dose; and r, the number killed. Two models are considered: binomial regression with a logit link and with a complementary log-log link. Below we fit Bayesian analogs of these models and compare them using the deviance information criterion (DIC).

We fit Bayesian binomial regression with a logit link by typing

. bayes: glm r ldose, family(binomial n)

Burn-in ...
Simulation ...

Model summary
Likelihood:
r ~ glm(xb_r)
Prior:
{r:ldose _cons} ~ normal(0,10000) (1)
(1) Parameters are elements of the linear form xb_r.
Bayesian generalized linear models        MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
Family: binomial n                        Number of obs     =          8
Link: logit                               Scale parameter   =          1
                                          Acceptance rate   =      .1777
                                          Efficiency:  min  =     .09582
                                                       avg  =     .09593
Log marginal-likelihood = -29.167913                   max  =     .09604
Equal-tailed
r Mean Std. dev. MCSE Median [95% cred. interval]
ldose 34.48517 2.816542 .090987 34.43137 29.12495 40.06385
_cons -61.09645 5.005658 .161527 -61.0278 -71.01371 -51.59132
Note: Default priors are used for model parameters.

We save the MCMC results and store the estimation results for later model comparison.

. bayes, saving(logit_mcmc)

. estimates store logit

We fit Bayesian binomial regression with a complementary log-log link as follows. This time, we save the MCMC results during estimation.

. bayes, saving(cloglog_mcmc): glm r ldose, family(binomial n) link(cloglog)

Burn-in ...
Simulation ...

file cloglog_mcmc.dta saved.

Model summary
Likelihood:
r ~ glm(xb_r)
Priors:
{r:ldose _cons} ~ normal(0,10000) (1)
(1) Parameters are elements of the linear form xb_r.
Bayesian generalized linear models        MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
Family: binomial n                        Number of obs     =          8
Link: complementary log-log               Scale parameter   =          1
                                          Acceptance rate   =      .2069
                                          Efficiency:  min  =      .1264
                                                       avg  =      .1265
Log marginal-likelihood = -26.111739                   max  =      .1265
Equal-tailed
r Mean Std. dev. MCSE Median [95% cred. interval]
ldose 22.17523 1.750735 .049226 22.15812 18.93784 25.68367
_cons -39.81636 3.152542 .088657 -39.78953 -46.13103 -33.98554
Note: Default priors are used for model parameters.

. estimates store cloglog

We now use bayesstats ic ([BAYES] bayesstats ic) to compare the models.

. bayesstats ic logit cloglog

Bayesian information criteria

             |       DIC    log(ML)   log(BF)
-------------+---------------------------------
       logit |  41.26855  -29.16791         .
     cloglog |  33.51447  -26.11174  3.056174
Note: Marginal likelihood (ML) is computed using Laplace-Metropolis approximation.

The complementary log-log model is preferable because its DIC value is smaller.
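The reported log Bayes factor of about 3.06 for cloglog relative to logit corresponds to a Bayes factor of roughly exp(3.06) ≈ 21 in favor of the complementary log-log model, so the marginal likelihoods point in the same direction as the DIC.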

References

Congdon, P. 2003. Applied Bayesian Modeling. New York: John Wiley & Sons.

Harvey, A. C. 1989. Forecasting, Structural Time Series Models, and the Kalman Filter. Cambridge: Cambridge University Press.

Mortimore, P., P. Sammons, L. Stoll, D. Lewis, and R. Ecob. 1988. School Matters: The Junior Years. Wells, Somerset, UK: Open Books.

Tell me more

Learn more about Stata's Bayesian analysis features.

Learn more about Bayesian multilevel models, Bayesian panel-data models, Bayesian survival models, and Bayesian sample-selection models.

Read more about the bayes prefix and Bayesian analysis in the Bayesian Analysis Reference Manual.