Bayesian regression models using the bayes prefix

Highlights

Fitting a Bayesian regression model can be just as intuitive as fitting a classical one, thanks to the bayes prefix in Stata. The bayes prefix combines Bayesian features with Stata's intuitive and elegant specification of regression models. It lets you fit Bayesian regression models more easily and fit more models.

You fit linear regression by using

. regress y x1 x2

You can fit Bayesian linear regression by simply using

. bayes: regress y x1 x2

You can also fit a Bayesian survival model by simply using

. bayes: streg x1 x2, distribution(weibull)

You can use the bayes prefix with many more regression models, including logistic, ordered probit, multinomial logistic, Poisson, generalized linear, conditional logistic, zero-inflated, sample-selection, and more. See the full list of supported estimation commands. Multilevel models are among the supported models too! See Bayesian multilevel models for details.
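For instance, a Bayesian logistic or Poisson regression is fit in exactly the same way. In the sketch below, y, x1, and x2 are placeholder variable names used only for illustration:

. bayes: logit y x1 x2

. bayes: poisson y x1 x2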

All of Stata's Bayesian features are supported by the bayes prefix. You can select from many prior distributions for model parameters, use the default priors, or even define your own priors. You can use the default adaptive Metropolis–Hastings sampling, Gibbs sampling, or a combination of the two sampling methods. And you can use any other feature included in the bayesmh command ([BAYES] bayesmh).

After estimation, you can use Stata's standard Bayesian postestimation tools, such as bayesgraph for graphical convergence diagnostics, bayesstats summary for additional posterior summaries, bayestest interval and bayestest model for interval hypothesis testing and model comparison, and bayesstats ic for information criteria and Bayes factors.

Let's see it work

Let's now see the bayes prefix in action.

Linear regression

Estimation

Consider data on math scores of pupils in the third and fifth years from different schools in Inner London (Mortimore et al. 1988). We want to fit a linear regression of five-year math scores (math5) on three-year math scores (math3). Here we ignore potential school effects; see Bayesian two-level models for analyses that account for them.

We fit a linear regression by typing

. regress math5 math3

      Source |       SS           df       MS      Number of obs   =       887
-------------+----------------------------------   F(1, 885)       =    341.40
       Model |  10960.2737         1  10960.2737   Prob > F        =    0.0000
    Residual |  28411.6181       885  32.1035233   R-squared       =    0.2784
-------------+----------------------------------   Adj R-squared   =    0.2776
       Total |  39371.8918       886  44.4378011   Root MSE        =     5.666

------------------------------------------------------------------------------
       math5 | Coefficient  Std. err.      t    P>|t|     [95% conf. interval]
-------------+----------------------------------------------------------------
       math3 |   .6081306   .0329126    18.48   0.000     .5435347    .6727265
       _cons |   30.34656   .1906157   159.20   0.000     29.97245    30.72067
------------------------------------------------------------------------------

To fit a Bayesian linear regression, we simply prefix the above regress command with bayes:.

. bayes: regress math5 math3

Burn-in ...
Simulation ...

Model summary
Likelihood:
math5 ~ regress(xb_math5,{sigma2})
Priors:
{math5:math3 _cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =      .3312
                                          Efficiency:  min  =      .1099
                                                       avg  =      .1529
Log marginal-likelihood = -2817.2335                   max  =      .2356
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
math5
math3 .6070097 .0323707 .000976 .6060445 .5440594 .6706959
_cons 30.3462 .1903067 .005658 30.34904 29.97555 30.71209
sigma2 32.17492 1.538155 .031688 32.0985 29.3045 35.38031
Note: Default priors are used for model parameters.

The results are similar between the two commands, but the interpretations are different. Recall that Bayesian models assume that all parameters are random. So a 95% credible interval of [0.54,0.67] for the slope coefficient on third-year math scores {math5:math3} implies that the probability that {math5:math3} is between 0.54 and 0.67 is 0.95. Such a probabilistic interpretation is not appropriate for the confidence interval from regress, although the confidence interval does suggest a similar plausible range for {math5:math3}.

Also, for each parameter, regress produced just one estimate, whereas bayes: regress produced 10,000 MCMC estimates. MCMC estimates are simulated from the posterior distribution of the model parameters, after discarding the first 2,500 estimates for the burn-in period. What is reported in the output are summaries of the marginal posterior distributions of the parameters, such as posterior means and posterior standard deviations.

The results will not always be similar between the two commands. They are similar in this example because the default priors the bayes prefix used are not informative for these data. The priors used are described in the model summary. They are normal with mean 0 and variance of 10,000 for the regression coefficients and inverse-gamma with shape and scale parameters of 0.01 for the error variance. Prior distributions are important for Bayesian models, and we will discuss them later in Custom priors.

We can use any feature of the bayesmh command, which fits general Bayesian models, with the bayes prefix. For example, we can display 90% highest posterior density (HPD) credible intervals instead of the default equal-tailed intervals when we replay the results. We do this with bayes just as we would with bayesmh:

. bayes, hpd clevel(90)

Model summary
Likelihood:
math5 ~ regress(xb_math5,{sigma2})
Priors:
{math5:math3 _cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =      .3312
                                          Efficiency:  min  =      .1099
                                                       avg  =      .1529
Log marginal-likelihood = -2817.2335                   max  =      .2356
HPD
Mean Std. dev. MCSE Median [90% cred. interval]
math5
math3 .6070097 .0323707 .000976 .6060445 .5580554 .6634722
_cons 30.3462 .1903067 .005658 30.34904 30.05275 30.66751
sigma2 32.17492 1.538155 .031688 32.0985 29.6363 34.66541
Note: Default priors are used for model parameters.

MCMC convergence and hypothesis testing

We can use any of the Bayesian postestimation features (except bayespredict) after the bayes prefix.

For example, we can check MCMC convergence for {math5:math3} using bayesgraph diagnostics ([BAYES] bayesgraph).

. bayesgraph diagnostics {math5:math3}

Or we can test interval hypotheses using bayestest interval ([BAYES] bayestest interval):

. bayestest interval {math5:math3}, lower(0.5) upper(0.6)

Interval tests     MCMC sample size =    10,000

       prob1 : 0.5 < {math5:math3} < 0.6

             |      Mean   Std. dev.      MCSE
-------------+---------------------------------
       prob1 |       .42    0.49358   .0160684

The probability that the coefficient {math5:math3} lies between 0.5 and 0.6 is .42.

Also see Multiple chains and Gelman–Rubin convergence diagnostic for investigating convergence using multiple chains.
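For example, here is a minimal sketch of running multiple chains and computing the Gelman–Rubin convergence statistic for our linear model, assuming your Stata version supports the nchains() option and the bayesstats grubin command (the seed value is arbitrary):

. bayes, nchains(3) rseed(16): regress math5 math3

. bayesstats grubin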

Gibbs sampling

The default sampling algorithm used by the bayes prefix with most of the estimation commands is an adaptive Metropolis–Hastings algorithm. For some prior and likelihood model combinations, an alternative Gibbs sampling algorithm may be available. Gibbs sampling is generally slower but more efficient than the Metropolis–Hastings sampling. By "more efficient", we mean that you will get more precise estimates from Gibbs sampling for the same number of iterations.

We can specify the gibbs option with bayes for linear regression to perform Gibbs sampling.

. bayes, gibbs: regress math5 math3

Burn-in ...
Simulation ...

Model summary
Likelihood:
math5 ~ normal(xb_math5,{sigma2})
Priors:
{math5:math3 _cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Gibbs sampling                            Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =          1
                                          Efficiency:  min  =          1
                                                       avg  =          1
Log marginal-likelihood =  -2817.184                   max  =          1
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
math5
math3 .6085104 .0333499 .000333 .6087819 .5426468 .6731657
_cons 30.34419 .1916673 .001917 30.34441 29.97587 30.72617
sigma2 32.16765 1.551119 .015511 32.10778 29.238 35.29901
Note: Default priors are used for model parameters.

Notice that the minimum and maximum efficiencies reported in the header are equal to 1 for Gibbs sampling and are much higher than those from the adaptive Metropolis–Hastings sampling. Higher efficiencies mean smaller Monte Carlo standard errors (MCSEs) and thus more precise posterior mean estimates. In our example, there is no practical difference between the two sampling procedures.
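To make the relationship between efficiency and MCSE concrete: MCSE equals the posterior standard deviation divided by the square root of the effective sample size, which is the efficiency times the MCMC sample size. For {math5:math3}, Metropolis–Hastings gives .0323707/sqrt(.1099 × 10,000) ≈ .00098, whereas Gibbs sampling gives .0333499/sqrt(1 × 10,000) ≈ .00033, matching the MCSE columns reported above.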

bayes's gibbs option specifies full Gibbs sampling in which all model parameters are sampled using the Gibbs method. Full Gibbs sampling is not available for all likelihood models and prior combinations. In fact, only bayes: regress and bayes: mvreg support this option. However, Gibbs sampling may be available for some model parameters, in which case you can use a hybrid sampling in which some parameters are sampled using the Gibbs method and the others using the Metropolis–Hastings method; see Gibbs and hybrid MH sampling in [BAYES] bayesmh for details.
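For instance, a hybrid scheme for the linear regression above could sample the error variance with the Gibbs method while the regression coefficients are updated with Metropolis–Hastings. A minimal sketch, assuming the block() option is accepted by the bayes prefix the same way it is by bayesmh:

. bayes, block({sigma2}, gibbs): regress math5 math3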

Custom priors

As we mentioned earlier, a prior distribution is an important component of a Bayesian model and may have a large impact on your results. It summarizes your knowledge about a model parameter before you see the data. After the data are observed, the prior distribution is updated with the information from the observed data to form the posterior distribution of the parameter. The posterior distribution is then used for Bayesian inference.

Your science or previous studies will often suggest the appropriate prior distributions for your model parameters. In the absence of such knowledge, noninformative priors are recommended. The choice of a prior distribution becomes especially important with small datasets, which typically contain less information about the model parameters.

For convenience, the bayes prefix provides default priors, but you can also specify your own. The default priors are chosen to be fairly noninformative for moderately scaled model parameters but may become informative for large values of parameters.

The bayes prefix provides convenient options to modify the values of the default priors. For example, bayes: regress uses normal priors with mean 0 and variance of 10,000 for the regression coefficients and an inverse-gamma prior with shape and scale parameters of 0.01 for the error variance. We can change the default values by using the normalprior() and igammaprior() options.

. bayes, normalprior(10) igammaprior(1 2): regress math5 math3

Burn-in ...
Simulation ...

Model summary
Likelihood:
math5 ~ regress(xb_math5,{sigma2})
Priors:
{math5:math3 _cons} ~ normal(0,100) (1)
{sigma2} ~ igamma(1,2)
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =      .3503
                                          Efficiency:  min  =      .1189
                                                       avg  =      .1471
Log marginal-likelihood = -2815.3081                   max  =      .2005
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
math5
math3 .6076875 .033088 .000948 .6076282 .5405233 .673638
_cons 30.326 .1931568 .005602 30.32804 29.93212 30.70529
sigma2 32.09694 1.530839 .034185 32.03379 29.27687 35.37723
Note: Default priors are used for model parameters.

In the above, we used a standard deviation of 10 for the normal priors and the shape of 1 and scale of 2 for the inverse-gamma prior.

We can specify custom priors for some of the parameters and leave the priors for other parameters at their defaults. We can also use prior distributions other than the default. For example, below we use a uniform on (-50,50) prior only for the regression coefficients.

. bayes, prior({math5:}, uniform(-50,50)): regress math5 math3

Burn-in ...
Simulation ...

Model summary
Likelihood:
math5 ~ regress(xb_math5,{sigma2})
Priors:
{math5:math3 _cons} ~ uniform(-50,50) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =      .3492
                                          Efficiency:  min  =      .1222
                                                       avg  =      .1633
Log marginal-likelihood = -2815.3228                   max  =       .241
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
math5
math3 .6076356 .0328739 .00094 .6056774 .5433171 .6759952
_cons 30.33498 .1908214 .005361 30.33159 29.96478 30.70833
sigma2 32.15793 1.555689 .031691 32.08331 29.308 35.35561
Note: Default priors are used for some model parameters.

{math5:} in the above prior() option is a shortcut for referring to all regression coefficients associated with the dependent variable math5: {math5:math3} and {math5:_cons}.
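Equivalently, we could have listed the coefficients explicitly in the prior() option:

. bayes, prior({math5:math3 _cons}, uniform(-50,50)): regress math5 math3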

Or we can use custom priors for all model parameters.

. bayes, prior({math5:_cons}, uniform(-50,50)) prior({math5:math3}, 
     uniform(-5,5)) prior({sigma2}, jeffreys): regress math5 math3

Burn-in ...
Simulation ...

Model summary
Likelihood:
math5 ~ regress(xb_math5,{sigma2})
Priors:
{math5:_cons} ~ uniform(-50,50) (1)
{math5:math3} ~ uniform(-5,5) (1)
{sigma2} ~ jeffreys
(1) Parameters are elements of the linear form xb_math5.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        887
                                          Acceptance rate   =      .3678
                                          Efficiency:  min  =      .1311
                                                       avg  =      .1617
Log marginal-likelihood = -2808.3156                   max  =      .2184
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
math5
math3 .6067153 .0333801 .000906 .606437 .5417588 .6723998
_cons 30.34962 .1935328 .005346 30.35526 29.97142 30.72829
sigma2 32.17297 1.546291 .033087 32.1054 29.34673 35.36544

Autoregressive models

AR(1) model

The bayes: regress specification is convenient, but we could already use bayesmh to fit a linear regression. What we cannot do with bayesmh, for example, is fit an autoregressive model. With bayes: regress, we can.

Consider quarterly data on coal consumption in the United Kingdom from 1960 to 1986 (see, for example, Congdon [2003] and Harvey [1989]). The outcome of interest is coal consumption (in millions of tons) in a given year and quarter. Variable lcoal is the log-transformed consumption, log(coal/1000).

Let's fit a Bayesian autoregressive model with one lag, AR(1), to these data.

. bayes: regress lcoal L.lcoal

Burn-in ...
Simulation ...

Model summary
Likelihood:
lcoal ~ regress(xb_lcoal,{sigma2})
Priors:
{lcoal:L.lcoal _cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_lcoal.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        107
                                          Acceptance rate   =      .3285
                                          Efficiency:  min  =      .1199
                                                       avg  =      .1448
Log marginal-likelihood = -75.889709                   max  =      .1905
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
lcoal
lcoal
L1. .7143121 .0649968 .001877 .7123857 .5884089 .8436602
_cons -.6896604 .1561023 .004433 -.6935272 -.9970502 -.3879924
sigma2 .1702592 .0243144 .000557 .1672834 .1299619 .2248287
Note: Default priors are used for model parameters.

We used Stata's time-series lag operator L. to include the first lag of our dependent variable lcoal in the regression model.
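Time-series operators such as L. require the data to be declared as time-series data beforehand. A minimal sketch, where t stands in for the dataset's quarterly time variable (the name is hypothetical):

. tsset t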

We save MCMC estimates and store estimation results from our Bayesian AR(1) model for later comparison with other AR models.

. bayes, saving(lag1_mcmc)

. estimates store lag1

The stationarity assumption of an AR(1) model requires that the first lag coefficient, {lcoal:L.lcoal}, is between -1 and 1. We can use a prior distribution to incorporate this assumption in our Bayesian model. For example, we can specify a uniform on (-1,1) prior for {lcoal:L.lcoal}.

. bayes, prior({lcoal:L.lcoal}, uniform(-1,1)): regress lcoal L.lcoal

Burn-in ...
Simulation ...

Model summary
Likelihood:
lcoal ~ regress(xb_lcoal,{sigma2})
Priors:
{lcoal:L.lcoal} ~ uniform(-1,1) (1)
{lcoal:_cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_lcoal.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        107
                                          Acceptance rate   =      .4379
                                          Efficiency:  min  =    .006311
                                                       avg  =     .06198
Log marginal-likelihood = -70.952743                   max  =      .1731
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
lcoal
lcoal
L1. .7111236 .0716918 .008876 .7085994 .5811546 .8647907
_cons -.6964178 .1747408 .021995 -.7047339 -1.013976 -.3139992
sigma2 .1709232 .0240107 .000577 .1695033 .1296427 .2222473
Note: Default priors are used for some model parameters.

The specified uniform prior has little impact on our results.

Model comparison of AR(p) models

In the previous section, we fit an AR(1) model to the coal consumption data. Is the AR(1) model sufficient for these data? Do we need a higher-order model such as AR(2) or AR(3)? We can use Bayesian model selection to answer this question.

We will use bayestest model ([BAYES] bayestest model) to compare different AR models using model posterior probabilities. To use bayestest model, we need to fit each model of interest separately and store its estimation results. For brevity, we omit the outputs from the fitted models.

We fit the AR(2) model by including the second lag L2.lcoal of the dependent variable in the regression model. We use bayes's saving() option during estimation to save MCMC estimates in the Stata dataset lag2_mcmc.dta. This dataset is part of Bayesian estimation results, and it must be saved before estimates store can be used. We then use estimates store to store the standard estimation results.

. bayes, saving(lag2_mcmc): regress lcoal L.lcoal L2.lcoal

. estimates store lag2

We repeat the above steps for the AR(3), AR(4), and AR(5) models.

. bayes, saving(lag3_mcmc): regress lcoal L(1/3).lcoal

. estimates store lag3

. bayes, saving(lag4_mcmc): regress lcoal L(1/4).lcoal

. estimates store lag4

. bayes, saving(lag5_mcmc): regress lcoal L(1/5).lcoal

. estimates store lag5

We then use bayestest model to compare the models.

. bayestest model lag1 lag2 lag3 lag4 lag5

Bayesian model tests

             |    log(ML)      P(M)     P(M|y)
-------------+---------------------------------
        lag1 |   -75.8897    0.2000     0.0000
        lag2 |   -82.5078    0.2000     0.0000
        lag3 |   -59.6688    0.2000     0.0000
        lag4 |   -13.8944    0.2000     0.9990
        lag5 |   -20.8194    0.2000     0.0010
Note: Marginal likelihood (ML) is computed using Laplace-Metropolis approximation.

Given equal prior probabilities for all five AR models, the AR(4) model has the highest posterior probability of 0.9990.

Given that our data are quarterly, it is not surprising that the fourth lag is so important. It is comforting that our data confirm that importance.

Choosing autocorrelation lag automatically

Editor's note: This section uses uniquely Bayesian thinking to address the problem of model selection. It is a wonderful solution, but be forewarned that several frequentists at StataCorp are still scratching their heads. You can skip this section if you wish.

In the previous section, we used model posterior probabilities to choose the autocorrelation lag. Instead, we can extend our Bayesian autoregressive model to estimate the lag itself. Let the autocorrelation lag be another parameter in the model, and suppose that it can take any integer value between 1 and 5. We make the prior distributions of the lag coefficients depend on this lag parameter. If the lag parameter is greater than or equal to p, we assign the pth lag coefficient a normal prior with mean 0 and variance 100. Otherwise, we assign it a normal prior with mean 0 and variance 0.01. In other words, if the lag parameter equals p, we shrink all the lag coefficients corresponding to lags greater than p toward zero.

The arguments of the prior distributions in the prior() option can be specified as expressions. We use this feature to accommodate the above priors for the lag coefficients,

. bayes, prior({lcoal:L1.lcoal}, normal(0, cond({lag}>=1,100,0.01))) ...

where {lag} is the autocorrelation lag parameter.

Our final model specification is as follows, in which {lag} is assigned a discrete prior with equal probabilities for each of the possible values 1, 2, 3, 4, and 5.

. bayes,  prior({lcoal:L1.lcoal}, normal(0, cond({lag}>=1,100,0.01)))
     prior({lcoal:L2.lcoal}, normal(0, cond({lag}>=2,100,0.01)))
     prior({lcoal:L3.lcoal}, normal(0, cond({lag}>=3,100,0.01)))
     prior({lcoal:L4.lcoal}, normal(0, cond({lag}>=4,100,0.01)))
     prior({lcoal:L5.lcoal}, normal(0, cond({lag}>=5,100,0.01)))
     prior({lag}, index(0.2,0.2,0.2,0.2,0.2)):
     regress lcoal L(1/5).lcoal

note: operator L1. is replaced with L. in parameter name L1.lcoal.
  
Burn-in ...
note: invalid initial state.
Simulation ...

Model summary
Likelihood:
lcoal ~ regress(xb_lcoal,{sigma2})
Priors:
{lcoal:L(1 2 3 4 5).lcoal} ~ normal(0,cond({lag}>=1,100,0.01)) (1)
{lcoal:_cons} ~ normal(0,10000) (1)
{sigma2} ~ igamma(.01,.01)
Hyperprior
{lag} ~ index(0.2,0.2,0.2,0.2,0.2)
(1) Parameters are elements of the linear form xb_lcoal.
Bayesian linear regression                MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =        103
                                          Acceptance rate   =        .34
                                          Efficiency:  min  =    .002852
                                                       avg  =     .04431
Log marginal-likelihood = -8.2084752                   max  =      .1716
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
lcoal
lcoal
L1. .2062446 .0784492 .011311 .2050062 .0487352 .3605725
L2. -.0738366 .0588681 .002764 -.0739381 -.1877364 .0391768
L3. .100462 .0597828 .004398 .1003963 -.0142032 .2216838
L4. .7994076 .0606384 .006607 .8031808 .6651497 .910174
L5. -.0729926 .0698683 .009211 -.0708155 -.2074388 .060126
_cons -.1401982 .0812334 .015212 -.1438271 -.2877263 .0403175
sigma2 .0343128 .0051157 .000123 .0338508 .0256253 .0456132
lag 4.0194 .1379331 .004424 4 4 4
Note: Default priors are used for some model parameters.
Note: There is a high autocorrelation after 500 lags.

The estimated lag is 4, which is consistent with our findings based on posterior probabilities of the considered AR models in the previous section.

Logistic regression with perfect predictors

Bayesian regression models can be useful in the presence of perfect predictors. Suppose that we want to model the binary outcome disease, the presence of a heart disease, as a function of a number of covariates: age, gender, indicator for fasting blood sugar greater than 120 mg/dl (variable isfbs), and categories for resting electrocardiographic results (variable restecg). This example is described in detail in Logistic regression with perfect predictors of [BAYES] bayes. The data are a sample from Switzerland.

Let's fit the standard logistic model first.

. logit disease restecg isfbs age male

note: restecg != 0 predicts success perfectly;
      restecg omitted and 17 obs not used.

note: isfbs != 0 predicts success perfectly;
      isfbs omitted and 3 obs not used.

note: male != 1 predicts success perfectly;
      male omitted and 2 obs not used.

Iteration 0:  Log likelihood = -4.2386144  
Iteration 1:  Log likelihood = -4.2358116  
Iteration 2:  Log likelihood = -4.2358076  
Iteration 3:  Log likelihood = -4.2358076  

Logistic regression                                     Number of obs =     26
                                                        LR chi2(1)    =   0.01
                                                        Prob > chi2   = 0.9403
Log likelihood = -4.2358076                             Pseudo R2     = 0.0007

disease Coefficient Std. err. z P>|z| [95% conf. interval]
restecg 0 (omitted)
isfbs 0 (omitted)
age -.0097846 .1313502 -0.07 0.941 -.2672263 .2476572
male 0 (omitted)
_cons 3.763893 7.423076 0.51 0.612 -10.78507 18.31285

Variables restecg, isfbs, and male are omitted from the model because they predict the success probability perfectly for these data. In other words, the corresponding coefficients cannot be identified based on the observed data. The problem of perfect predictors often arises in binary-outcome models, especially with small datasets as in our example.

Sometimes, we may have reliable external information about the parameters of interest that can help identify them when the data alone cannot. For example, a similar heart study was conducted in Hungary. The estimates of the parameters from that study may be used to form the informative prior distributions for the parameters of the Swiss study.

We may consider a Bayesian logistic regression with the following informative priors.

. bayes, prior({disease:restecg age}, normal(0,10))
     prior({disease:isfbs male}, normal(1,10))
     prior({disease:_cons}, normal(-4,10)) nomleinitial:
     logit disease restecg isfbs age male, asis

Burn-in ...
Simulation ...

Model summary
Likelihood:
disease ~ logit(xb_disease)
Priors:
{disease:restecg age} ~ normal(0,10) (1)
{disease:isfbs male} ~ normal(1,10) (1)
{disease:_cons} ~ normal(-4,10) (1)
(1) Parameters are elements of the linear form xb_disease.
Bayesian logistic regression              MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
                                          Number of obs     =         48
                                          Acceptance rate   =      .2121
                                          Efficiency:  min  =     .01885
                                                       avg  =     .04328
Log marginal-likelihood = -11.006071                   max  =     .06184
Equal-tailed
disease Mean Std. dev. MCSE Median [95% cred. interval]
restecg 1.965122 2.315475 .115615 1.655961 -2.029873 6.789415
isfbs 1.708631 2.726071 .113734 1.607439 -3.306837 7.334592
age .1258811 .0707431 .003621 .1245266 -.0016807 .2719748
male .2671381 2.237349 .162967 .3318061 -4.106425 4.609955
_cons -2.441911 2.750613 .110611 -2.538183 -7.596747 3.185172

Whenever informative priors are used, the sensitivity of the results to the choice of priors must be explored. For example, it would be useful to consider other heart disease studies to verify the reasonableness of the prior specifications and to use other priors to check the sensitivity of the estimated parameters to the choice of priors.
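For instance, one simple sensitivity check is to refit the model with flatter priors and compare the posterior summaries. The prior variances below are illustrative choices, not values taken from the Hungarian study:

. bayes, prior({disease:restecg age}, normal(0,100))
     prior({disease:isfbs male}, normal(1,100))
     prior({disease:_cons}, normal(-4,100)) nomleinitial:
     logit disease restecg isfbs age male, asis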

GLM

Just as with other models, to fit Bayesian generalized linear models, we simply prefix Stata's glm command with bayes:.

Example 2 in [R] glm analyzes data from an insecticide experiment. The effect of the dose of insecticide on the number of flour beetles killed is of interest. Variable ldose records the log dose of insecticide; n, the number of flour beetles subjected to each dose; and r, the number killed. Two models are considered: binomial regression with a logit link and with a complementary log-log link. Below we fit Bayesian analogs of these models and compare them using the deviance information criterion (DIC).

We fit Bayesian binomial regression with a logit link by typing

. bayes: glm r ldose, family(binomial n)

Burn-in ...
Simulation ...

Model summary
Likelihood:
r ~ glm(xb_r)
Prior:
{r:ldose _cons} ~ normal(0,10000) (1)
(1) Parameters are elements of the linear form xb_r.
Bayesian generalized linear models        MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
Family: binomial n                        Number of obs     =          8
Link: logit                               Scale parameter   =          1
                                          Acceptance rate   =      .1777
                                          Efficiency:  min  =     .09582
                                                       avg  =     .09593
Log marginal-likelihood = -29.167913                   max  =     .09604
Equal-tailed
r Mean Std. dev. MCSE Median [95% cred. interval]
ldose 34.48517 2.816542 .090987 34.43137 29.12495 40.06385
_cons -61.09645 5.005658 .161527 -61.0278 -71.01371 -51.59132
Note: Default priors are used for model parameters.

We save the MCMC results and store the estimation results for later model comparison.

. bayes, saving(logit_mcmc)

. estimates store logit

We fit Bayesian binomial regression with a complementary log-log link as follows. This time, we save the MCMC results during estimation.

. bayes, saving(cloglog_mcmc): glm r ldose, family(binomial n) link(cloglog)

Burn-in ...
Simulation ...

file cloglog_mcmc.dta saved.

Model summary
Likelihood:
r ~ glm(xb_r)
Priors:
{r:ldose _cons} ~ normal(0,10000) (1)
(1) Parameters are elements of the linear form xb_r.
Bayesian generalized linear models        MCMC iterations   =     12,500
Random-walk Metropolis–Hastings sampling  Burn-in           =      2,500
                                          MCMC sample size  =     10,000
Family: binomial n                        Number of obs     =          8
Link: complementary log-log               Scale parameter   =          1
                                          Acceptance rate   =      .2069
                                          Efficiency:  min  =      .1264
                                                       avg  =      .1265
Log marginal-likelihood = -26.111739                   max  =      .1265
Equal-tailed
r Mean Std. dev. MCSE Median [95% cred. interval]
ldose 22.17523 1.750735 .049226 22.15812 18.93784 25.68367
_cons -39.81636 3.152542 .088657 -39.78953 -46.13103 -33.98554
Note: Default priors are used for model parameters.

. estimates store cloglog

We now use bayesstats ic ([BAYES] bayesstats ic) to compare the models.

. bayesstats ic logit cloglog

Bayesian information criteria

             |       DIC    log(ML)   log(BF)
-------------+---------------------------------
       logit |  41.26855  -29.16791         .
     cloglog |  33.51447  -26.11174  3.056174
Note: Marginal likelihood (ML) is computed using Laplace-Metropolis approximation.

The complementary log-log model is preferable because its DIC value is smaller.
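The reported log Bayes factor of about 3.06 for cloglog relative to logit corresponds to a Bayes factor of roughly exp(3.06) ≈ 21 in favor of the complementary log-log model, so the marginal likelihoods point in the same direction as the DIC.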

References

Congdon, P. 2003. Applied Bayesian Modeling. New York: John Wiley & Sons.

Harvey, A. C. 1989. Forecasting, Structural Time Series Models, and the Kalman Filter. Cambridge: Cambridge University Press.

Mortimore, P., P. Sammons, L. Stoll, D. Lewis, and R. Ecob. 1988. School Matters: The Junior Years. Wells, Somerset, UK: Open Books.

Tell me more

Learn more about Stata's Bayesian analysis features.

Learn more about Bayesian multilevel models, Bayesian panel-data models, Bayesian survival models, and Bayesian sample-selection models.

Read more about the bayes prefix and Bayesian analysis in the Bayesian Analysis Reference Manual.