 »  Home »  Products »  Features »  bayes: prefix

## Bayesian regression models using the bayes prefix

Fitting Bayesian regression models can be just as intuitive as performing Bayesian inference—introducing the bayes prefix in Stata. The bayes prefix combines Bayesian features with Stata's intuitive and elegant specification of regression models. It lets you fit Bayesian regression models more easily and fit more models.

You fit linear regression by using

. regress y x1 x2


You can fit Bayesian linear regression by simply using

. bayes: regress y x1 x2


You can also fit a Bayesian survival model by simply using

. bayes: streg x1 x2, distribution(weibull)


You can use the bayes prefix with many more regression models, including logistic, ordered probit, multinomial logistic, Poisson, generalized linear, conditional logistic, zero-inflated, sample-selection, and more. See the full list of supported estimation commands. Multilevel models are among the supported models too! See Bayesian multilevel models for details.

All of Stata's Bayesian features are supported by the bayes prefix. You can select from many prior distributions for model parameters or use default priors or even define your own priors. You can use the default adaptive Metropolis–Hastings sampling, or Gibbs sampling, or a combination of the two sampling methods. And you can use any other feature included in the bayesmh command ([BAYES] bayesmh).

After estimation, you can use Stata's standard Bayesian postestimation tools such as

### Let's see it work

Let's now see the bayes prefix in action.

### Estimation

Consider data on math scores of pupils in the third and fifth years from different schools in Inner London (Mortimore et al. 1988). We want to fit a linear regression of five-year math scores (math5) on three-year math scores (math3). Here we ignore potential school effects; see Bayesian two-level models for analyses that account for them.

We fit a linear regression by typing

. regress math5 math3

Source         SS           df       MS      Number of obs   =       887
F(1, 885)       =    341.40
Model    10960.2737         1  10960.2737   Prob > F        =    0.0000
Residual    28411.6181       885  32.1035233   R-squared       =    0.2784
Total    39371.8918       886  44.4378011   Root MSE        =     5.666

math5   Coefficient  Std. err.      t    P>|t|     [95% conf. interval]

math3     .6081306   .0329126    18.48   0.000     .5435347    .6727265
_cons     30.34656   .1906157   159.20   0.000     29.97245    30.72067



To fit a Bayesian linear regression, we simply prefix the above regress command with bayes:.

. bayes: regress math5 math3

Burn-in ...
Simulation ...

Model summary

Likelihood:
math5 ~ regress(xb_math5,{sigma2})

Priors:
{math5:math3 _cons} ~ normal(0,10000)                                    (1)
{sigma2} ~ igamma(.01,.01)

(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                       MCMC iterations  =     12,500
Random-walk Metropolis—Hastings sampling         Burn-in          =      2,500
MCMC sample size =     10,000
Number of obs    =        887
Acceptance rate  =      .3312
Efficiency:  min =      .1099
avg =      .1529
Log marginal-likelihood = -2817.2335                          max =      .2356

Equal-tailed
Mean   Std. dev.     MCSE     Median  [95% cred. interval]

math5
math3    .6070097   .0323707   .000976   .6060445   .5440594   .6706959
_cons     30.3462   .1903067   .005658   30.34904   29.97555   30.71209

sigma2    32.17492   1.538155   .031688    32.0985    29.3045   35.38031
Note: Default priors are used for model parameters.

The results are similar between the two commands, but the interpretations are different. Recall that Bayesian models assume that all parameters are random. So a 95% credible interval of [0.54,0.67] for the slope coefficient on third-year math scores {math5:math3} implies that the probability that {math5:math3} is between 0.54 and 0.67 is 0.95. Such probabilistic interpretation is not appropriate for the confidence interval from regress, although the confidence interval does suggest a similar plausible range for {math5:math3}.

Also, for each parameter, regress produced just one estimate, whereas bayes: regress produced 10,000 MCMC estimates. MCMC estimates are simulated from a posterior distribution of model parameters, after discarding the first 2,500 estimates for the burn-in period. What is reported in the output are the summaries of the marginal posterior distributions of the parameters such as posterior means and posterior standard deviations.

The results will not always be similar between the two commands. They are similar in this example because the default priors the bayes prefix used are not informative for these data. The priors used are described in the model summary. They are normal with mean 0 and variance of 10,000 for the regression coefficients and inverse-gamma with shape and scale parameters of 0.01 for the error variance. Prior distributions are important for Bayesian models, and we will discuss them later in Custom priors.

We can use any feature of the bayesmh command, which fits general Bayesian models, with the bayes prefix. For example, we can display 90% highest posterior density (HPD) credible intervals instead of the default equal-tailed intervals on replay. We do this with bayes just like we do this with bayesmh:

. bayes, hpd clevel(90)

Model summary

Likelihood:
math5 ~ regress(xb_math5,{sigma2})

Priors:
{math5:math3 _cons} ~ normal(0,10000)                                    (1)
{sigma2} ~ igamma(.01,.01)

(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                       MCMC iterations  =     12,500
Random-walk Metropolis—Hastings sampling         Burn-in          =      2,500
MCMC sample size =     10,000
Number of obs    =        887
Acceptance rate  =      .3312
Efficiency:  min =      .1099
avg =      .1529
Log marginal-likelihood = -2817.2335                          max =      .2356

HPD
Mean   Std. dev.     MCSE     Median  [90% cred. interval]

math5
math3    .6070097   .0323707   .000976   .6060445   .5580554   .6634722
_cons     30.3462   .1903067   .005658   30.34904   30.05275   30.66751

sigma2    32.17492   1.538155   .031688    32.0985    29.6363   34.66541
Note: Default priors are used for model parameters.

### MCMC convergence and hypotheses testing

We can use any of Bayesian postestimation features (except bayespredict) after the bayes prefix.

For example, we can check MCMC convergence for {math5:math3} using bayesgraph diagnostics ([BAYES] bayesgraph).

. bayesgraph diagnostics {math5:math3} Or we can test interval hypotheses using bayestest interval ([BAYES] bayestest interval):

. bayestest interval {math5:math3},
> lower(0.5) upper(0.6)

Interval tests     MCMC sample size =    10,000

prob1 : 0.5 < {math5:math3} < 0.6

Mean    Std. dev.      MCSE

prob1         .42     0.49358   .0160684



The probability that the coefficient {math5:math3} lies between 0.5 and 0.6 is .42.

Also see Multiple chains and Gelman-Rubin convergence diagnostic for investigating convergence using multiple chains.

### Gibbs sampling

The default sampling algorithm used by the bayes prefix with most of the estimation commands is an adaptive Metropolis–Hastings algorithm. For some prior and likelihood model combinations, an alternative Gibbs sampling algorithm may be available. Gibbs sampling is generally slower but more efficient than the Metropolis–Hastings sampling. By "more efficient", we mean that you will get more precise estimates from Gibbs sampling for the same number of iterations.

We can specify the gibbs option with bayes for linear regression to perform Gibbs sampling.

. bayes, gibbs: regress math5 math3

Burn-in ...
Simulation ...

Model summary

Likelihood:
math5 ~ normal(xb_math5,{sigma2})

Priors:
{math5:math3 _cons} ~ normal(0,10000)                                    (1)
{sigma2} ~ igamma(.01,.01)

(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                       MCMC iterations  =     12,500
Gibbs sampling                                   Burn-in          =      2,500
MCMC sample size =     10,000
Number of obs    =        887
Acceptance rate  =          1
Efficiency:  min =          1
avg =          1
Log marginal-likelihood =  -2817.184                          max =          1

Equal-tailed
Mean   Std. dev.     MCSE     Median  [95% cred. interval]

math5
math3    .6085104   .0333499   .000333   .6087819   .5426468   .6731657
_cons    30.34419   .1916673   .001917   30.34441   29.97587   30.72617

sigma2    32.16765   1.551119   .015511   32.10778     29.238   35.29901
Note: Default priors are used for model parameters.

Notice that the minimum and maximum efficiencies reported in the header are equal to 1 for Gibbs sampling and are much higher than those from the adaptive Metropolis–Hastings sampling. Higher efficiencies mean smaller Monte Carlo standard errors (MCSEs) and thus more precise posterior mean estimates. In our example, there is no practical difference between the two sampling procedures.

bayes's gibbs option specifies full Gibbs sampling in which all model parameters are sampled using the Gibbs method. Full Gibbs sampling is not available for all likelihood models and prior combinations. In fact, only bayes: regress and bayes: mvreg support this option. However, Gibbs sampling may be available for some model parameters, in which case you can use a hybrid sampling in which some parameters are sampled using the Gibbs method and the others using the Metropolis–Hastings method; see Gibbs and hybrid MH sampling in [BAYES] bayesmh for details.

### Custom priors

As we mentioned earlier, a prior distribution is an important component of a Bayesian model and may have a large impact on your results. It summarizes your knowledge about a model parameter before you see the data. After the data are observed, the prior distribution is updated with the information from the observed data to form the posterior distribution of the parameter. The posterior distribution is then used for Bayesian inference.

Your science or previous studies will often suggest the appropriate prior distributions for your model parameters. In the absence of such knowledge, noninformative priors are recommended. The choice of a prior distribution becomes especially important with small datasets, which typically contain less information about the model parameters.

For convenience, the bayes prefix provides default priors, but you can also specify your own. The default priors are chosen to be fairly noninformative for moderately scaled model parameters but may become informative for large values of parameters.

The bayes prefix provides convenient options to modify the values of the default priors. For example, bayes: regress uses normal priors with mean 0 and variance of 10,000 for the regression coefficients and an inverse-gamma prior with shape and scale parameters of 0.01 for the error variance. We can change the default values by using the normalprior() and igammaprior() options.

. bayes, normalprior(10) igammaprior(1 2): regress math5 math3

Burn-in ...
Simulation ...

Model summary

Likelihood:
math5 ~ regress(xb_math5,{sigma2})

Priors:
{math5:math3 _cons} ~ normal(0,100)                                      (1)
{sigma2} ~ igamma(1,2)

(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                       MCMC iterations  =     12,500
Random-walk Metropolis—Hastings sampling         Burn-in          =      2,500
MCMC sample size =     10,000
Number of obs    =        887
Acceptance rate  =      .3503
Efficiency:  min =      .1189
avg =      .1471
Log marginal-likelihood = -2815.3081                          max =      .2005

Equal-tailed
Mean   Std. dev.     MCSE     Median  [95% cred. interval]

math5
math3    .6076875    .033088   .000948   .6076282   .5405233    .673638
_cons      30.326   .1931568   .005602   30.32804   29.93212   30.70529

sigma2    32.09694   1.530839   .034185   32.03379   29.27687   35.37723
Note: Default priors are used for model parameters.

In the above, we used a standard deviation of 10 for the normal priors and the shape of 1 and scale of 2 for the inverse-gamma prior.

We can specify custom priors for some of the parameters and leave the priors for other parameters at their defaults. We can also use prior distributions other than the default. For example, below we use a uniform on (-50,50) prior only for the regression coefficients.

. bayes, prior({math5:}, uniform(-50,50)): regress math5 math3

Burn-in ...
Simulation ...

Model summary

Likelihood:
math5 ~ regress(xb_math5,{sigma2})

Priors:
{math5:math3 _cons} ~ uniform(-50,50)                                    (1)
{sigma2} ~ igamma(.01,.01)

(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                       MCMC iterations  =     12,500
Random-walk Metropolis—Hastings sampling         Burn-in          =      2,500
MCMC sample size =     10,000
Number of obs    =        887
Acceptance rate  =      .3492
Efficiency:  min =      .1222
avg =      .1633
Log marginal-likelihood = -2815.3228                          max =       .241

Equal-tailed
Mean   Std. dev.     MCSE     Median  [95% cred. interval]

math5
math3    .6076356   .0328739    .00094   .6056774   .5433171   .6759952
_cons    30.33498   .1908214   .005361   30.33159   29.96478   30.70833

sigma2    32.15793   1.555689   .031691   32.08331     29.308   35.35561
Note: Default priors are used for model parameters.

{math5:} in the above prior() option is a shortcut for referring to all regression coefficients associated with the dependent variable math5: {math5:math3} and {math5:_cons}.

Or we can use custom priors for all model parameters.

. bayes, prior({math5:_cons}, uniform(-50,50))
>        prior({math5:math3}, uniform(-5,5))
>        prior({sigma2}, jeffreys): regress math5 math3

Burn-in ...
Simulation ...

Model summary

Likelihood:
math5 ~ regress(xb_math5,{sigma2})

Priors:
{math5:_cons} ~ uniform(-50,50)                                          (1)
{math5:math3} ~ uniform(-5,5)                                            (1)
{sigma2} ~ jeffreys

(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                       MCMC iterations  =     12,500
Random-walk MetropolisHastings sampling         Burn-in          =      2,500—
MCMC sample size =     10,000
Number of obs    =        887
Acceptance rate  =      .3678
Efficiency:  min =      .1311
avg =      .1617
Log marginal-likelihood = -2808.3156                          max =      .2184

Equal-tailed
Mean   Std. dev.     MCSE     Median  [95% cred. interval]

math5
math3    .6067153   .0333801   .000906    .606437   .5417588   .6723998
_cons    30.34962   .1935328   .005346   30.35526   29.97142   30.72829

sigma2    32.17297   1.546291   .033087    32.1054   29.34673   35.36544


### AR(1) model

The bayes: regress specification is convenient, but we could already use bayesmh to fit a linear regression. What we cannot do when using bayesmh, for example, is fit an autoregressive model. We can use bayes: regress to do that.

Consider quarterly data on coal consumption in the United Kingdom from 1960 to 1986 (for example, Congdon  and Harvey ). The outcome of interest is coal consumption (in millions of tons) in a given year and quarter. Variable lcoal is transformed using log(coal/1000).

Let's fit a Bayesian autoregressive model with one lag, AR(1), to these data.

. bayes: regress lcoal L.lcoal

Burn-in ...
Simulation ...

Model summary

Likelihood:
lcoal ~ regress(xb_lcoal,{sigma2})

Priors:
{lcoal:L.lcoal _cons} ~ normal(0,10000)                                  (1)
{sigma2} ~ igamma(.01,.01)

(1) Parameters are elements of the linear form xb_lcoal.
Bayesian linear regression                       MCMC iterations  =     12,500
Random-walk Metropolis—Hastings sampling         Burn-in          =      2,500
MCMC sample size =     10,000
Number of obs    =        107
Acceptance rate  =      .3285
Efficiency:  min =      .1199
avg =      .1448
Log marginal-likelihood = -75.889709                          max =      .1905

Equal-tailed
Mean   Std. dev.     MCSE     Median  [95% cred. interval]

lcoal
lcoal
L1.    .7143121   .0649968   .001877   .7123857   .5884089   .8436602

_cons   -.6896604   .1561023   .004433  -.6935272  -.9970502  -.3879924

sigma2    .1702592   .0243144   .000557   .1672834   .1299619   .2248287
Note: Default priors are used for model parameters.

We used Stata's time-series lag operator L. to include the first lag of our dependent variable lcoal in the regression model.

We save MCMC estimates and store estimation results from our Bayesian AR(1) model for later comparison with other AR models.

. bayes, saving(lag1_mcmc)

. estimates store lag1


The stationarity assumption of an AR(1) model requires that the first lag coefficient, {lcoal:L.lcoal}, is between -1 and 1. We can use a prior distribution to incorporate this assumption in our Bayesian model. For example, we can specify a uniform on (-1,1) prior for {lcoal:L.lcoal}.

. bayes, prior({lcoal:L.lcoal}, uniform(-1,1)): regress lcoal L.lcoal

Burn-in ...
Simulation ...

Model summary

Likelihood:
lcoal ~ regress(xb_lcoal,{sigma2})

Priors:
{lcoal:L.lcoal} ~ uniform(-1,1)                                          (1)
{lcoal:_cons} ~ normal(0,10000)                                        (1)
{sigma2} ~ igamma(.01,.01)

(1) Parameters are elements of the linear form xb_lcoal.
Bayesian linear regression                       MCMC iterations  =     12,500
Random-walk Metropolis—Hastings sampling         Burn-in          =      2,500
MCMC sample size =     10,000
Number of obs    =        107
Acceptance rate  =      .4379
Efficiency:  min =    .006311
avg =     .06198
Log marginal-likelihood = -70.952743                          max =      .1731

Equal-tailed
Mean   Std. dev.     MCSE     Median  [95% cred. interval]

lcoal
lcoal
L1.    .7111236   .0716918   .008876   .7085994   .5811546   .8647907

_cons   -.6964178   .1747408   .021995  -.7047339  -1.013976  -.3139992

sigma2    .1709232   .0240107   .000577   .1695033   .1296427   .2222473
Note: Default priors are used for some model parameters.

The specified uniform prior has little impact on our results.

### Model comparison of AR(p) models

In the previous section, we fit an AR(1) model to the coal consumption data. Is the AR(1) model sufficient for these data? Do we need a higher-order model such as AR(2) or AR(3)? We can use Bayesian model selection to answer this question.

We will use bayestest model ([BAYES] bayestest model) to compare different AR models using model posterior probabilities. To use bayestest model, we need to fit each model of interest separately and store its estimation results. For brevity, we omit the outputs from the fitted models.

We fit the AR(2) model by including the second lag L2.lcoal of the dependent variable in the regression model. We use bayes's saving() option during estimation to save MCMC estimates in the Stata dataset lag2_mcmc.dta. This dataset is part of Bayesian estimation results, and it must be saved before estimates store can be used. We then use estimates store to store the standard estimation results.

. bayes, saving(lag2_mcmc): regress lcoal L.lcoal L2.lcoal

. estimates store lag2


We repeat the above steps for the AR(3), AR(4), and AR(5) models.

. bayes, saving(lag3_mcmc): regress lcoal L(1/3).lcoal

. estimates store lag3

. bayes, saving(lag4_mcmc): regress lcoal L(1/4).lcoal

. estimates store lag4

. bayes, saving(lag5_mcmc): regress lcoal L(1/5).lcoal

. estimates store lag5


We then use bayestest model to compare the models.

. bayestest model lag1 lag2 lag3 lag4 lag5

Bayesian model tests

log(ML)       P(M)     P(M|y)

lag1    -75.8897     0.2000     0.0000
lag2    -82.5078     0.2000     0.0000
lag3    -59.6688     0.2000     0.0000
lag4    -13.8944     0.2000     0.9990
lag5    -20.8194     0.2000     0.0010
Note: Marginal likelihood (ML) is computed using
Laplace-Metropolis approximation.

Given equal prior probabilities for all five AR models, the AR(4) model has the highest posterior probability of 0.9990.

Given that our data are quarterly, it is not surprising that the fourth lag is so important. It is comforting that our data confirm that importance.

### Choosing autocorrelation lag automatically

Editors note: This section uses uniquely Bayesian thinking to address the problem of model selection. It is a wonderful solution, but be forewarned that several frequentists at StataCorp are still scratching their heads. You can skip the section if you wish.

In the previous section, we used model posterior probabilities to choose the autocorrelation lag. Instead, we can extend our Bayesian autoregressive model to estimate the lag. Let the autocorrelation lag be another parameter in the model. Suppose that it can take any integer value between 1 and 5. Make prior distributions of the lag coefficients depend on the lag. If the estimated lag parameter is greater or equal to p, assign the pth lag coefficient a normal prior with mean 0 and a variance of 100. Otherwise, assign it a normal prior with a mean 0 and a variance of 0.01. In other words, if the estimated lag parameter is equal to p, we shrink all of the lag coefficients corresponding to lags greater than p toward zero.

The arguments of the prior distributions in the prior() option can be specified as expressions. We use this feature to accommodate the above priors for the lag coefficients,

. bayes, prior({lcoal:L1.lcoal}, normal(0, cond({lag}>=1,100,0.01))) ...


where {lag} is the autocorrelation lag parameter.

Our final model specification is as follows, in which {lag} is assigned a discrete prior with equal probabilities for each of the possible values 1, 2, 3, 4, and 5.

. bayes,  prior({lcoal:L1.lcoal}, normal(0, cond({lag}>=1,100,0.01)))
>         prior({lcoal:L2.lcoal}, normal(0, cond({lag}>=2,100,0.01)))
>         prior({lcoal:L3.lcoal}, normal(0, cond({lag}>=3,100,0.01)))
>         prior({lcoal:L4.lcoal}, normal(0, cond({lag}>=4,100,0.01)))
>         prior({lcoal:L5.lcoal}, normal(0, cond({lag}>=5,100,0.01)))
>         prior({lag}, index(0.2,0.2,0.2,0.2,0.2)):
>         regress lcoal L(1/5).lcoal

note: operator L1. is replaced with L. in parameter name L1.lcoal.

Burn-in ...
note: invalid initial state.
Simulation ...

Model summary

Likelihood:
lcoal ~ regress(xb_lcoal,{sigma2})

Priors:
{lcoal:L(1 2 3 4 5).lcoal} ~ normal(0,cond({lag}>=1,100,0.01))           (1)
{lcoal:_cons} ~ normal(0,10000)                             (1)
{sigma2} ~ igamma(.01,.01)

Hyperprior
{lag} ~ index(0.2,0.2,0.2,0.2,0.2)

(1) Parameters are elements of the linear form xb_lcoal.
Bayesian linear regression                       MCMC iterations  =     12,500
Random-walk Metropolis—Hastings sampling         Burn-in          =      2,500
MCMC sample size =     10,000
Number of obs    =        103
Acceptance rate  =        .34
Efficiency:  min =    .002852
avg =     .04431
Log marginal-likelihood = -8.2084752                          max =      .1716

Equal-tailed
Mean   Std. dev.     MCSE     Median  [95% cred. interval]

lcoal
lcoal
L1.    .2062446   .0784492   .011311   .2050062   .0487352   .3605725
L2.   -.0738366   .0588681   .002764  -.0739381  -.1877364   .0391768
L3.     .100462   .0597828   .004398   .1003963  -.0142032   .2216838
L4.    .7994076   .0606384   .006607   .8031808   .6651497    .910174
L5.   -.0729926   .0698683   .009211  -.0708155  -.2074388    .060126

_cons   -.1401982   .0812334   .015212  -.1438271  -.2877263   .0403175

sigma2    .0343128   .0051157   .000123   .0338508   .0256253   .0456132
lag      4.0194   .1379331   .004424          4          4          4
Note: Default priors are used for some model parameters.
Note: There is a high autocorrelation after 500 lags.

The estimated lag is 4, which is consistent with our findings based on posterior probabilities of the considered AR models in the previous section.

### Logistic regression with perfect predictors

Bayesian regression models can be useful in the presence of perfect predictors. Suppose that we want to model the binary outcome disease, the presence of a heart disease, as a function of a number of covariates: age, gender, indicator for fasting blood sugar greater than 120 mg/dl (variable isfbs), and categories for resting electrocardiographic results (variable restecg). This example is described in detail in Logistic regression with perfect predictors of [BAYES] bayes. The data are a sample from Switzerland.

Let's fit the standard logistic model first.

. logit disease restecg isfbs age male

note: restecg != 0 predicts success perfectly;
restecg omitted and 17 obs not used.

note: isfbs != 0 predicts success perfectly;
isfbs omitted and 3 obs not used.

note: male != 1 predicts success perfectly;
male omitted and 2 obs not used.

Iteration 0:   log likelihood = -4.2386144
Iteration 1:   log likelihood = -4.2358116
Iteration 2:   log likelihood = -4.2358076
Iteration 3:   log likelihood = -4.2358076

Logistic regression                                     Number of obs =     26
LR chi2(1)    =   0.01
Prob > chi2   = 0.9403
Log likelihood = -4.2358076                             Pseudo R2     = 0.0007

disease   Coefficient  Std. err.      z    P>|z|     [95% conf. interval]

restecg            0  (omitted)
isfbs            0  (omitted)
age    -.0097846   .1313502    -0.07   0.941    -.2672263    .2476572
male            0  (omitted)
_cons     3.763893   7.423076     0.51   0.612    -10.78507    18.31285



Variables restecg, isfbs, and male are omitted from the model because they predict the success probability perfectly for these data. In other words, the corresponding coefficients cannot be identified based on the observed data. The problem of perfect predictors often arises in binary-outcome models, especially with small datasets as in our example.

Sometimes, we may have reliable external information about the parameters of interest that can help identify them when the data alone cannot. For example, a similar heart study was conducted in Hungary. The estimates of the parameters from that study may be used to form the informative prior distributions for the parameters of the Swiss study.

We may consider a Bayesian logistic regression with the following informative priors.

. bayes, prior({disease:restecg age}, normal(0,10))
>        prior({disease:isfbs male}, normal(1,10))
>        prior({disease:_cons}, normal(-4,10)) nomleinitial:
>        logit disease restecg isfbs age male, asis

Burn-in ...
Simulation ...

Model summary

Likelihood:
disease ~ logit(xb_disease)

Priors:
{disease:restecg age} ~ normal(0,10)                                     (1)
{disease:isfbs male} ~ normal(1,10)                                     (1)
{disease:_cons} ~ normal(-4,10)                                    (1)

(1) Parameters are elements of the linear form xb_disease.
Bayesian logistic regression                     MCMC iterations  =     12,500
Random-walk Metropolis—Hastings sampling         Burn-in          =      2,500
MCMC sample size =     10,000
Number of obs    =         48
Acceptance rate  =      .2121
Efficiency:  min =     .01885
avg =     .04328
Log marginal-likelihood = -11.006071                          max =     .06184

Equal-tailed
disease        Mean   Std. dev.     MCSE     Median  [95% cred. interval]

restecg    1.965122   2.315475   .115615   1.655961  -2.029873   6.789415
isfbs    1.708631   2.726071   .113734   1.607439  -3.306837   7.334592
age    .1258811   .0707431   .003621   .1245266  -.0016807   .2719748
male    .2671381   2.237349   .162967   .3318061  -4.106425   4.609955
_cons   -2.441911   2.750613   .110611  -2.538183  -7.596747   3.185172



Whenever informative priors are used, the sensitivity of the results to the choice of the priors must be explored. For example, it will be useful to consider other heart disease studies to verify the reasonableness of the prior specifications and to use other priors to check the sensitivity of the estimated parameters to the choice of priors.

### GLM

Just like with other models, to fit Bayesian generalized linear models, we can simply prefix Stata's glm command with bayes:.

Example 2 in [R] glm analyzes data from an insecticide experiment. The effect of the dose of insecticide on the number of flour beetles killed is of interest. Variable ldose records the log dose of insecticide; n, the number of flour beetles subjected to each dose; and r, the number killed. Two models are considered: binomial regression with a logit link and with a complementary log-log link. Below we fit Bayesian analogs of these models and compare them using the deviance information criterion (DIC).

We fit Bayesian binomial regression with a logit link by typing

. bayes: glm r ldose, family(binomial n)

Burn-in ...
Simulation ...

Model summary

Likelihood:
r ~ glm(xb_r)

Prior:
{r:ldose _cons} ~ normal(0,10000)                                        (1)

(1) Parameters are elements of the linear form xb_r.
Bayesian generalized linear models               MCMC iterations  =     12,500
Random-walk Metropolis—Hastings sampling         Burn-in          =      2,500
MCMC sample size =     10,000
Family: binomial n                               Number of obs    =          8
Link:   logit                                    Scale parameter  =          1
Acceptance rate  =      .1777
Efficiency:  min =     .09582
avg =     .09593
Log marginal-likelihood = -29.167913                          max =     .09604

Equal-tailed
r        Mean   Std. dev.     MCSE     Median  [95% cred. interval]

ldose    34.48517   2.816542   .090987   34.43137   29.12495   40.06385
_cons   -61.09645   5.005658   .161527   -61.0278  -71.01371  -51.59132
Note: Default priors are used for model parameters.

We save the MCMC results and store the estimation results for later model comparison.

. bayes, saving(logit_mcmc)

. estimates store logit


We fit Bayesian binomial regression with a complementary log-log link as follows. This time, we save the MCMC results during estimation.

. bayes, saving(cloglog_mcmc): glm r ldose, family(binomial n) link(cloglog)

Burn-in ...
Simulation ...

file cloglog_mcmc.dta saved.

Model summary

Likelihood:
r ~ glm(xb_r)

Priors:
{r:ldose _cons} ~ normal(0,10000)                                        (1)

(1) Parameters are elements of the linear form xb_r.
Bayesian generalized linear models               MCMC iterations  =     12,500
Random-walk Metropolis—Hastings sampling         Burn-in          =      2,500
MCMC sample size =     10,000
Family: binomial n                               Number of obs    =          8
Link:   complementary log-log                    Scale parameter  =          1
Acceptance rate  =      .2069
Efficiency:  min =      .1264
avg =      .1265
Log marginal-likelihood = -26.111739                          max =      .1265

Equal-tailed
r        Mean   Std. dev.     MCSE     Median  [95% cred. interval]

ldose    22.17523   1.750735   .049226   22.15812   18.93784   25.68367
_cons   -39.81636   3.152542   .088657  -39.78953  -46.13103  -33.98554
Note: Default priors are used for model parameters.

. estimates store cloglog


We now use bayesstats ic ([BAYES] bayesstats ic) to compare the models.

. bayesstats ic logit cloglog

Bayesian information criteria

DIC    log(ML)    log(BF)

logit    41.26855  -29.16791          .
cloglog    33.51447  -26.11174   3.056174
Note: Marginal likelihood (ML) is computed
using Laplace-Metropolis approximation.


The complementary log-log model is preferable because its DIC value is smaller.

### References

Congdon, P. 2003. Applied Bayesian Modeling. New York: John Wiley & Sons.

Harvey, A. C. 1989. Forecasting, Structural Time Series Models, and the Kalman Filter. Cambridge: Cambridge University Press.

Mortimore, P., P. Sammons, L. Stoll, D. Lewis, and R. Ecob. 1988. School Matters: The Junior Years. Wells, Somerset, UK: Open Books.