
Highlights

  • Outcomes: continuous, censored, binary, ordinal, count, GLM, survival

  • Random intercepts and coefficients

  • Nested and crossed effects

  • Multiple levels of hierarchy

  • Random-effects covariance structures

  • Full Bayesian-features support

Multilevel models are regression models that incorporate group-specific effects. Groups may represent different levels of hierarchy such as hospitals, doctors nested within hospitals, and patients nested within doctors nested within hospitals. Group-specific effects are assumed to vary randomly across groups according to some a priori distribution, commonly a normal distribution. This assumption makes multilevel models natural candidates for Bayesian analysis. Bayesian multilevel models additionally assume that other model parameters such as regression coefficients and variance components—variances of group-specific effects—are also random.
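For instance, a simple two-level random-intercept model for observation i in group j can be written as

    y_ij = x_ij'β + u_j + ε_ij,    u_j ~ N(0,σ²_u),    ε_ij ~ N(0,σ²_ε)

In the Bayesian version, priors are additionally placed on β, σ²_u, and σ²_ε; as we will see below, Stata's defaults are normal priors for coefficients and inverse-gamma priors for variances.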

Why use Bayesian multilevel models? In addition to the standard reasons for Bayesian analysis, Bayesian multilevel modeling is often used when the number of groups is small or when there are many hierarchical levels. Bayesian information criteria such as the deviance information criterion (DIC) are also popular for comparing multilevel models. And when the comparison of groups is of primary interest, Bayesian multilevel modeling can provide entire posterior distributions of the group-specific effects.

You can fit Bayesian multilevel models in Stata and you can do this easily—just prefix your multilevel command with bayes:

. bayes: mixed y x1 x2 || id:

Of course, when we say "easily", we refer to the model specification and not the model formulation. Just like any other modeling task, Bayesian multilevel modeling requires careful consideration.

Continuous, censored, binary, ordinal, count, GLM, and survival outcomes are supported; see the full list of supported multilevel commands. All multilevel features such as multiple levels of hierarchy, nested and crossed random effects, random intercepts and coefficients, and random-effects covariance structures are available. All Bayesian features as provided by the [BAYES] bayesmh command are supported when you use the bayes prefix with multilevel commands; read about general features of the bayes prefix.
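For example, the standard MCMC options of the bayes prefix, such as the random-number seed, burn-in length, and MCMC sample size, carry over to multilevel commands. Here is a quick sketch (the option values are purely illustrative):

. bayes, rseed(17) burnin(5000) mcmcsize(20000): mixed y x1 x2 || id: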

Also see more about Bayesian multilevel modeling.

Let's see it work

Now, let's look at several examples.

Simple two-level models

Random intercepts

Consider data on math scores of pupils in the third and fifth years from different schools in Inner London (Mortimore et al. 1988). We want to investigate a school effect on math scores.

We can use Stata's mixed command to fit a two-level linear model of fifth-year math scores (math5) on third-year math scores (math3) with random intercepts for schools.

. mixed math5 math3 || school:

Performing EM optimization ...

Performing gradient-based optimization:
Iteration 0:   log likelihood = -2767.8923
Iteration 1:   log likelihood = -2767.8923

Computing standard errors ...

Mixed-effects ML regression                     Number of obs     =        887
Group variable: school                          Number of groups  =         48
                                                Obs per group:
                                                              min =          5
                                                              avg =       18.5
                                                              max =         62
                                                Wald chi2(1)      =     347.92
Log likelihood = -2767.8923                     Prob > chi2       =     0.0000

------------------------------------------------------------------------------
       math5 | Coefficient  Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
       math3 |   .6088066   .0326392    18.65   0.000     .5448349    .6727783
       _cons |   30.36495   .3491544    86.97   0.000     29.68062    31.04928
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects parameters  |   Estimate   Std. err.     [95% conf. interval]
-----------------------------+------------------------------------------------
school: Identity             |
                  var(_cons) |   4.026853   1.189895      2.256545    7.186004
-----------------------------+------------------------------------------------
               var(Residual) |   28.12721    1.37289       25.5611    30.95094
------------------------------------------------------------------------------
LR test vs. linear model: chibar2(01) = 56.38         Prob >= chibar2 = 0.0000

The likelihood-ratio test at the bottom and the estimate of the school variance component suggest statistically significant variability between schools in the math5 scores after adjusting for the math3 scores.

To fit the corresponding Bayesian model, you can simply prefix the above mixed command with bayes:.

. bayes: mixed math5 math3 || school:

But here, we will first use bayes's melabel option to obtain output similar to that of mixed for easier comparison of the results.

. bayes, melabel: mixed math5 math3 || school:

Bayesian multilevel regression                   MCMC iterations  =     12,500
Metropolis–Hastings and Gibbs sampling           Burn-in          =      2,500
                                                 MCMC sample size =     10,000
Group variable: school                           Number of groups =         48
                                                 Obs per group:
                                                              min =          5
                                                              avg =       18.5
                                                              max =         62
                                                 Number of obs    =        887
                                                 Acceptance rate  =      .8091
                                                 Efficiency:  min =     .03366
                                                              avg =      .3331
Log marginal-likelihood                                       max =      .6298

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       math3 |  .6087689   .0326552   .000436   .6087444   .5450837   .6729982
       _cons |  30.39202   .3597873    .01961   30.38687   29.67802   31.10252
-------------+----------------------------------------------------------------
school       |
  var(_cons) |  4.272626   1.299061   .039697   4.122282   2.247659   7.220809
-------------+----------------------------------------------------------------
var(Residual)|  28.23014    1.37812   .017365   28.18347   25.63394   31.04375
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.

The reported estimates of posterior means and posterior standard deviations for model parameters are similar to the corresponding maximum-likelihood estimates and standard errors reported by mixed.

Unlike mixed, which provides one estimate for each model parameter, bayes: mixed produces, for each parameter, a sample of 10,000 Markov chain Monte Carlo (MCMC) draws from the simulated posterior distribution of the parameters. The reported posterior means and posterior standard deviations are summaries of the corresponding marginal posterior distributions.
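Because all inference rests on this MCMC sample, it is prudent to check that the chain has converged before interpreting the results. For example, one might inspect trace, autocorrelation, and density plots for a parameter (the choice of parameter here is just an illustration):

. bayesgraph diagnostics {math5:math3}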

Let's now see the output from bayes: mixed without the melabel option. The output is lengthy, so we will describe it in pieces.

. bayes

Multilevel structure
------------------------------------------------------------------------------
school
    {U0}: random intercepts
------------------------------------------------------------------------------

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ normal(xb_math5,{e.math5:sigma2})

Priors:
  {math5:math3 _cons} ~ normal(0,10000)                                    (1)
                 {U0} ~ normal(0,{U0:sigma2})                              (1)
     {e.math5:sigma2} ~ igamma(.01,.01)

Hyperprior:
  {U0:sigma2} ~ igamma(.01,.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5.

Bayesian multilevel regression                   MCMC iterations  =     12,500
Metropolis–Hastings and Gibbs sampling           Burn-in          =      2,500
                                                 MCMC sample size =     10,000
Group variable: school                           Number of groups =         48
                                                 Obs per group:
                                                              min =          5
                                                              avg =       18.5
                                                              max =         62
                                                 Number of obs    =        887
                                                 Acceptance rate  =      .8091
                                                 Efficiency:  min =     .03366
                                                              avg =      .3331
Log marginal-likelihood                                       max =      .6298

The header now includes additional information about the fitted Bayesian model. We can see, for example, that parameter {U0} represents random intercepts in the model, that regression coefficients {math5:math3} and {math5:_cons} are assigned default normal priors with zero means and variances of 10,000, and that the variance component for schools, {U0:sigma2}, is assigned the default inverse-gamma prior with 0.01 for both the shape and scale parameters.

In the output table, the results are the same, but the parameter labels are different.

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       math3 |  .6087689   .0326552   .000436   .6087444   .5450837   .6729982
       _cons |  30.39202   .3597873    .01961   30.38687   29.67802   31.10252
-------------+----------------------------------------------------------------
school       |
   U0:sigma2 |  4.272626   1.299061   .039697   4.122282   2.247659   7.220809
-------------+----------------------------------------------------------------
e.math5      |
      sigma2 |  28.23014    1.37812   .017365   28.18347   25.63394   31.04375
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.

By default, bayes: mixed displays results using parameter names as you would use when referring to these parameters in bayes's options or during postestimation. For example, you would use {U0:sigma2} to refer to the variance component for schools and {e.math5:sigma2} to refer to the error variance.
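For instance, here is a small sketch of postestimation usage, summarizing the two variance components by referring to them with these names:

. bayesstats summary {U0:sigma2} {e.math5:sigma2}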

There is still one part of the output missing—the estimates of random intercepts {U0}. In a Bayesian multilevel model, random effects are model parameters just like regression coefficients and variance components. bayes does not report them by default because there are often too many of them. But you can display them during or after estimation. Here we replay the estimation, adding the showreffects() option to display the estimates of the first 12 random intercepts.

. bayes, showreffects({U0[1/12]})

Multilevel structure
------------------------------------------------------------------------------
school
    {U0}: random intercepts
------------------------------------------------------------------------------

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ normal(xb_math5,{e.math5:sigma2})

Priors:
  {math5:math3 _cons} ~ normal(0,10000)                                    (1)
                 {U0} ~ normal(0,{U0:sigma2})                              (1)
     {e.math5:sigma2} ~ igamma(.01,.01)

Hyperprior:
  {U0:sigma2} ~ igamma(.01,.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5.

Bayesian multilevel regression                   MCMC iterations  =     12,500
Metropolis–Hastings and Gibbs sampling           Burn-in          =      2,500
                                                 MCMC sample size =     10,000
Group variable: school                           Number of groups =         48
                                                 Obs per group:
                                                              min =          5
                                                              avg =       18.5
                                                              max =         62
                                                 Number of obs    =        887
                                                 Acceptance rate  =      .8091
                                                 Efficiency:  min =     .03366
                                                              avg =      .1749
Log marginal-likelihood                                       max =      .6298
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       math3 |  .6087689   .0326552   .000436   .6087444   .5450837   .6729982
       _cons |  30.39202   .3597873    .01961   30.38687   29.67802   31.10252
-------------+----------------------------------------------------------------
U0[school]   |
           1 | -2.685824   .9776969   .031227  -2.672364  -4.633162  -.7837494
           2 |   .015465   1.290535    .03201   .0041493  -2.560203   2.556316
           3 |  1.049006   1.401383   .033731   1.021202  -1.534088    3.84523
           4 | -2.123055   .9921679   .028859  -2.144939  -4.069283  -.1507593
           5 | -.1504003   .9650027   .033881  -.1468966  -2.093015   1.721503
           6 |  .5833945   1.192379   .032408   .5918357  -1.660335   3.049718
           7 |  1.490231   1.332917   .033846   1.481793  -1.095757   4.272903
           8 |  .4198105   .9783772   .031891   .4579817  -1.496317   2.403908
           9 | -1.996105    1.02632   .035372  -2.001467  -4.037044  -.0296276
          10 |  .6736806   1.249238   .031114    .660939   -1.70319   3.179273
          11 | -.5650109   .9926453   .031783  -.5839293  -2.646413   1.300388
          12 | -.3620733   1.090265   .033474  -.3203626  -2.550097   1.717532
-------------+----------------------------------------------------------------
school       |
   U0:sigma2 |  4.272626   1.299061   .039697   4.122282   2.247659   7.220809
-------------+----------------------------------------------------------------
e.math5      |
      sigma2 |  28.23014    1.37812   .017365   28.18347   25.63394   31.04375
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.

We could have used showreffects to display all 48.
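That is, specifying showreffects without an argument should display all of them:

. bayes, showreffects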

To compare schools, we can plot posterior distributions (our prior normal distributions updated based on the observed data) of random intercepts. We plot histograms for the same first 12 random intercepts.

. bayesgraph histogram {U0[1/12]}, byparm

[Graph: histograms of the posterior distributions of random intercepts U0[1] through U0[12]]

We save the MCMC results and store the estimation results from our Bayesian random-intercept model for later model comparison.

. bayes, saving(ri_mcmc)

. estimates store ri

Custom priors

Our initial example used the default priors for model parameters. Just to show you how easy it is to specify custom priors, we now specify a uniform(-50,50) prior for the regression coefficients:

. bayes, prior({math5:}, uniform(-50,50)): mixed math5 math3 || school:
note: Gibbs sampling is used for variance components.

Burn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done
Simulation 10000 .........1000.........2000.........3000.........4000.........5
> 000.........6000.........7000.........8000.........9000.........10000 done

Multilevel structure
------------------------------------------------------------------------------
school
    {U0}: random intercepts
------------------------------------------------------------------------------

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ normal(xb_math5,{e.math5:sigma2})

Priors:
  {math5:math3 _cons} ~ uniform(-50,50)                                    (1)
                 {U0} ~ normal(0,{U0:sigma2})                              (1)
     {e.math5:sigma2} ~ igamma(.01,.01)

Hyperprior:
  {U0:sigma2} ~ igamma(.01,.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5.

Bayesian multilevel regression                   MCMC iterations  =     12,500
Metropolis–Hastings and Gibbs sampling           Burn-in          =      2,500
                                                 MCMC sample size =     10,000
Group variable: school                           Number of groups =         48
                                                 Obs per group:
                                                              min =          5
                                                              avg =       18.5
                                                              max =         62
                                                 Number of obs    =        887
                                                 Acceptance rate  =      .5931
                                                 Efficiency:  min =     .02217
                                                              avg =      .2153
Log marginal-likelihood                                       max =      .6877
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       math3 |  .6094181   .0319517   .001432   .6085484   .5460873   .6732493
       _cons |  30.36818   .3290651   .022103   30.38259   29.73806    31.0131
-------------+----------------------------------------------------------------
school       |
   U0:sigma2 |  4.261459   1.282453   .040219   4.084322   2.238583   7.218895
-------------+----------------------------------------------------------------
e.math5      |
      sigma2 |  28.24094   1.374732   .016577   28.20275   25.68069   31.01401
------------------------------------------------------------------------------
Note: Default priors are used for some model parameters.

{math5:} in the above is a shortcut for referring to all regression coefficients associated with the dependent variable math5: {math5:math3} and {math5:_cons}.
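For instance, the following fully spelled-out specification should be equivalent to the shortcut above:

. bayes, prior({math5:math3 _cons}, uniform(-50,50)): mixed math5 math3 || school: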

Similarly, we can use different priors for each regression coefficient, but we omit the output for brevity.

. bayes, prior({math5:_cons}, uniform(-50,50)) prior({math5:math3}, uniform(-5,5)):
     mixed math5 math3 || school:

Random-effects covariance structures

Random coefficients

Let's extend our simple random-intercept model to include a random coefficient.

Following mixed's specification, we include math3 in the random-effects equation for the school level. We also store our Bayesian estimation results for later comparison. This time, we save MCMC results during estimation.

. bayes, saving(rc_mcmc): mixed math5 math3 || school: math3

. estimates store rc

Here is abbreviated output from bayes: mixed, including a random coefficient for math3.

Multilevel structure
------------------------------------------------------------------------------
school
    {U0}: random intercepts
    {U1}: random coefficients for math3
------------------------------------------------------------------------------

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ normal(xb_math5,{e.math5:sigma2})

Priors:
  {math5:math3 _cons} ~ normal(0,10000)                                    (1)
                 {U0} ~ normal(0,{U0:sigma2})                              (1)
                 {U1} ~ normal(0,{U1:sigma2})                              (1)
     {e.math5:sigma2} ~ igamma(.01,.01)

Hyperpriors:
  {U0:sigma2} ~ igamma(.01,.01)
  {U1:sigma2} ~ igamma(.01,.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5.

(some output omitted)
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       math3 |  .6143538   .0454835   .001655   .6137192   .5257402   .7036098
       _cons |  30.38813   .3577296   .019669    30.3826   29.71581   31.10304
-------------+----------------------------------------------------------------
school       |
   U0:sigma2 |  4.551927   1.368582   .041578   4.361247   2.420075   7.722063
   U1:sigma2 |  .0398006   .0194373   .001271   .0363514   .0131232   .0881936
-------------+----------------------------------------------------------------
e.math5      |
      sigma2 |  27.19758   1.354024   .021967   27.15869   24.71813   30.05862
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.

mixed assumes independence between random intercepts and random coefficients, and so does bayes: mixed, for consistency. We can relax this assumption by specifying an unstructured covariance structure, as follows. We save the MCMC results and store the estimation results from this model as well.

. bayes, saving(rcun_mcmc): mixed math5 math3 || school: math3,
     covariance(unstructured)

. estimates store rcun

Here is abbreviated output from bayes: mixed specifying the unstructured covariance.

Multilevel structure
------------------------------------------------------------------------------
school
    {U0}: random intercepts
    {U1}: random coefficients for math3
------------------------------------------------------------------------------

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ normal(xb_math5,{e.math5:sigma2})

Priors:
  {math5:math3 _cons} ~ normal(0,10000)                                    (1)
              {U0 U1} ~ mvnormal(2,{U:Sigma,m})                            (1)
     {e.math5:sigma2} ~ igamma(.01,.01)

Hyperprior:
  {U:Sigma,m} ~ iwishart(2,3,I(2))
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5.

(some output omitted)
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       math3 |  .6234197   .0570746   .002699   .6228624   .5144913   .7365849
       _cons |  30.34691   .3658515   .021356   30.34399   29.62991   31.07312
-------------+----------------------------------------------------------------
school       |
 U:Sigma_1_1 |  4.527905   1.363492   .046275   4.345457   2.391319   7.765521
 U:Sigma_2_1 |  -.322247   .1510543   .004913  -.3055407  -.6683891  -.0679181
 U:Sigma_2_2 |  .0983104   .0280508   .000728   .0941222   .0556011   .1649121
-------------+----------------------------------------------------------------
e.math5      |
      sigma2 |   26.8091    1.34032   .018382   26.76549   24.27881   29.53601
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.

Model comparison using DIC

We now use DIC to compare the three models: random-intercept model, random-coefficient model with independent covariance structure, and random-coefficient model with unstructured covariance structure.

. bayesstats ic ri rc rcun, diconly

Deviance information criterion

------------------------
             |       DIC
-------------+----------
          ri |  5514.514
          rc |  5500.837
        rcun |  5494.349
------------------------

DIC is the smallest for the random-coefficient model with an unstructured random-effects covariance, so this model is preferable.
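Recall that DIC equals the posterior mean deviance plus the effective number of parameters, pD, so lower values reward goodness of fit while penalizing model complexity.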

Custom priors for matrices

To demonstrate that you can specify custom priors not only for scalar parameters but also for matrix parameters, we specify a custom inverse-Wishart prior for the random-effects covariance matrix {U:Sigma,m}, where the ,m suffix is short for ,matrix.

. matrix S = (5,-0.5\-0.5,1)

. bayes, iwishartprior(10 S): mixed math5 math3 || school: math3,
     covariance(unstructured)
note: Gibbs sampling is used for regression coefficients and variance
      components.

Burn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done
Simulation 10000 .........1000.........2000.........3000.........4000.........5
> 000.........6000.........7000.........8000.........9000.........10000 done

Multilevel structure
------------------------------------------------------------------------------
school
    {U0}: random intercepts
    {U1}: random coefficients for math3
------------------------------------------------------------------------------

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ normal(xb_math5,{e.math5:sigma2})

Priors:
  {math5:math3 _cons} ~ normal(0,10000)                                    (1)
              {U0 U1} ~ mvnormal(2,{U:Sigma,m})                            (1)
     {e.math5:sigma2} ~ igamma(.01,.01)

Hyperprior:
  {U:Sigma,m} ~ iwishart(2,10,S)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5.

(some output omitted)
------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       math3 |  .6130199   .0537473    .00282    .613916   .5058735   .7180286
       _cons |   30.3789   .3223274   .016546    30.3816   29.74903   31.02091
-------------+----------------------------------------------------------------
school       |
 U:Sigma_1_1 |  3.482914   1.104742   .048864   3.344148   1.770735     6.0136
 U:Sigma_2_1 | -.2712029   .1169666   .004214  -.2596221  -.5337747  -.0745626
 U:Sigma_2_2 |  .0775669   .0210763   .000651    .074876   .0443026   .1264642
-------------+----------------------------------------------------------------
e.math5      |
      sigma2 |  26.94206   1.342571   .022106   26.90405    24.4033   29.66083
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.

The iwishartprior() option overrides the parameters of the default inverse-Wishart prior distribution that bayes: uses for unstructured random-effects covariance matrices. By default, this prior has 3 degrees of freedom and an identity scale matrix; in the example above, we instead used 10 degrees of freedom and the scale matrix S.

Three-level models

Exponential survival model

Consider survival data that record durations (in months) of employment of individuals from tstart to tend. Variable failure contains 1 when the end date corresponds to the end of employment, and 0 otherwise. A two-level model would account for the variability between individuals, who are identified by the id variable. To demonstrate a three-level model, let's also account for the variability between birth years of individuals, which may help explain some of the between-individual variability.

Here is how we would proceed with the standard multilevel analysis for these data.

. stset tend, origin(tstart) failure(failure)

. mestreg education njobs prestige i.female || birthyear: || id:,
     distribution(exponential)

We model the time to the end of employment as a function of the education level, the number of jobs held previously, the prestige of the current job, and gender.

In our Bayesian analysis, we will compare how well the two survival models, exponential and lognormal, fit these data.

First, we fit a Bayesian three-level exponential model by simply prefixing the above mestreg command.

. bayes: mestreg education njobs prestige i.female || birthyear: || id:,
     distribution(exponential)

The output is lengthy, so we describe it in parts.

The multilevel summary provides the names of parameters, {U0} and {UU0}, for random intercepts at the third and second levels of hierarchy, respectively. (Observations compose the first level.)

Multilevel structure
------------------------------------------------------------------------------
birthyear
    {U0}: random intercepts
birthyear>id
    {UU0}: random intercepts
------------------------------------------------------------------------------

The model summary describes the likelihood model and prior distributions used. For example, both variance components, {U0:sigma2} and {UU0:sigma2}, are assigned inverse-gamma prior distributions with scale and shape parameters of 0.01.

Model summary
------------------------------------------------------------------------------
Likelihood:
  _t ~ mestreg(xb__t)

Priors:
  {_t:education njobs prestige 1.female _cons} ~ normal(0,10000)           (1)
                                         {U0} ~ normal(0,{U0:sigma2})      (1)
                                        {UU0} ~ normal(0,{UU0:sigma2})     (1)

Hyperpriors:
  {U0:sigma2} ~ igamma(.01,.01)
  {UU0:sigma2} ~ igamma(.01,.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb__t.

The header information now includes a group summary for each hierarchical level.

Bayesian multilevel exponential PH regression    MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000

------------------------------------------------------------------------------
                |      No. of       Observations per group
 Group variable |      groups    Minimum    Average    Maximum
----------------+-------------------------------------------------------------
      birthyear |          12          3       50.0         99
             id |         201          1        3.0          9
------------------------------------------------------------------------------

                                                 Number of obs    =        600
                                                 Acceptance rate  =      .3045
                                                 Efficiency:  min =    .008093
                                                              avg =     .01554
Log marginal-likelihood                                       max =     .02482

Just like mestreg, bayes: mestreg by default reports hazard ratios for the exponential survival model. You can use the nohr option to instead obtain coefficients. You can specify this option with bayes or mestreg during estimation or with bayes on replay.
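For instance, here is a sketch of the estimation-time specification with the bayes prefix:

. bayes, nohr: mestreg education njobs prestige i.female || birthyear: || id:,
     distribution(exponential)

Below, we instead specify it on replay: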

. bayes, nohr

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
_t           |
   education |  .1008933   .0338232   .003186   .0995609   .0383122   .1732759
       njobs |  -.248267   .0477605   .003938  -.2493944  -.3383752  -.1606085
    prestige | -.0296164    .006582   .000466  -.0294993  -.0428695  -.0171793
    1.female |  .4593769   .1527497   .025605   .4589502   .1445377   .7329368
       _cons | -4.381721   .3651954   .045888  -4.356305  -5.129819  -3.735914
-------------+----------------------------------------------------------------
birthyear    |
   U0:sigma2 |  .1177088   .1230839   .014551   .0834062   .0147882   .4957971
-------------+----------------------------------------------------------------
birthyear>id |
  UU0:sigma2 |  .5227624     .12143   .009875   .5153562   .3075033    .784219
------------------------------------------------------------------------------

We save the MCMC results and store the estimation results from our Bayesian exponential survival model for later model comparison. We also specify the remargl option to compute the log marginal-likelihood, which we explain below.

. bayes, remargl saving(streg_exp_mcmc)

. estimates store streg_exp

Log marginal-likelihood

Previously, we used bayes: mestreg to fit a Bayesian exponential survival model to employment duration data. If you look closely at the header output from bayes: mestreg,

Bayesian multilevel exponential PH regression    MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        600
                                                 Acceptance rate  =      .3045
                                                 Efficiency:  min =    .008093
                                                              avg =     .01554
Log marginal-likelihood                                       max =     .02482

you will notice that no value is reported for the log marginal-likelihood (LML). This is intentional. As we mentioned earlier, Bayesian multilevel models treat random effects as parameters and thus may contain many model parameters. For models with many parameters or high-dimensional models, the computation of LML can be time consuming, and its accuracy may become unacceptably low. For these reasons, the bayes prefix does not compute the LML for multilevel models by default. You can specify the remargl option during estimation or on replay to compute it.

Here is a subset of the relevant output after typing

. bayes, remargl

Bayesian multilevel exponential PH regression    MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        600
                                                 Acceptance rate  =      .3045
                                                 Efficiency:  min =    .008093
                                                              avg =     .01554
Log marginal-likelihood = -2530.4131                          max =     .02482

Notice that the LML value is now reported in the header output.

Why do we need the value of LML? Some of the Bayesian summaries used for model comparison such as Bayes factors and model posterior probabilities are based on LML. In this example, we want to demonstrate the use of model posterior probabilities to compare Bayesian models, and so we needed to compute LML.

Lognormal survival model

We now fit a Bayesian three-level lognormal model by specifying the corresponding distribution in the distribution() option. We also specify the remargl and saving() options during estimation this time.

. bayes, remargl saving(streg_lnorm_mcmc):
     mestreg education njobs prestige i.female || birthyear: || id:,
     distribution(lognormal)

Bayesian multilevel lognormal AFT regression     MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        600
                                                 Acceptance rate  =      .3714
                                                 Efficiency:  min =    .004755
                                                              avg =     .01522
Log marginal-likelihood = -2496.4991                          max =     .04602

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
_t           |
   education | -.0951555   .0314752   .002934  -.0947659  -.1563553  -.0348788
       njobs |  .1733276   .0423128   .004265   .1745013   .0871177   .2545806
    prestige |  .0272722   .0056387   .000263   .0272781   .0160048   .0384117
    1.female | -.3406851   .1323801   .018735  -.3383823  -.5965842  -.0912631
       _cons |  3.822971   .4130557   .052204   3.817862   3.027644   4.635152
     lnsigma |  .0723268   .0419609   .002715   .0722478  -.0100963   .1569296
-------------+----------------------------------------------------------------
birthyear    |
   U0:sigma2 |  .1669752   .1566106   .013008    .120028   .0252243   .6019641
-------------+----------------------------------------------------------------
birthyear>id |
  UU0:sigma2 |  .3061048   .1067159   .015475   .2919266   .1357847   .5485671
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.
Note: Adaptation tolerance is not met in at least one of the blocks.

. estimates store streg_lnorm

Model comparison using posterior probabilities

We now compare models using model posterior probabilities.

. bayestest model streg_exp streg_lnorm

Bayesian model tests

------------------------------------------------
             |    log(ML)       P(M)       P(M|y)
-------------+----------------------------------
   streg_exp |  -2.53e+03     0.5000      0.0000
 streg_lnorm |  -2.50e+03     0.5000      1.0000
------------------------------------------------
Note: Marginal likelihood (ML) is computed using
      Laplace-Metropolis approximation.

The posterior probability for the lognormal model is essentially 1, so it is preferable to the exponential model for these data.

Crossed-effects logistic model

In example 5 of [ME] melogit, we fit a crossed-effects model to the data from a study measuring students' attainment scores in primary and secondary schools from Fife, Scotland. The binary outcome of interest is whether the score is greater than 6. The model includes sex as the covariate and the effects of primary and secondary schools as crossed random effects.

. melogit attain_gt_6 sex || _all: R.sid || pid:

We fit the corresponding Bayesian crossed-effects model by simply prefixing the above melogit command with bayes:

. bayes: melogit attain_gt_6 sex || _all: R.sid || pid:

The output is lengthy, so as before, we describe it in parts.

Multilevel structure
------------------------------------------------------------------------------
sid
    {U0}: random intercepts
pid
    {V0}: random intercepts
------------------------------------------------------------------------------
We have two sets of random intercepts, {U0} and {V0}, at the secondary sid and primary pid levels, respectively.

Model summary
------------------------------------------------------------------------------
Likelihood:
  attain_gt_6 ~ melogit(xb_attain_gt_6)

Priors:
  {attain_gt_6:sex _cons} ~ normal(0,10000)                                (1)
                     {U0} ~ normal(0,{U0:sigma2})                          (1)
                     {V0} ~ normal(0,{V0:sigma2})                          (1)

Hyperpriors:
  {U0:sigma2} ~ igamma(.01,.01)
  {V0:sigma2} ~ igamma(.01,.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_attain_gt_6.

The likelihood model is a multilevel logistic model. The prior distributions are normal for regression coefficients and random intercepts and are inverse-gamma for the variance components.

------------------------------------------------------------------------------
                |      No. of       Observations per group
 Group variable |      groups    Minimum    Average    Maximum
----------------+-------------------------------------------------------------
           _all |           1      3,435    3,435.0      3,435
            pid |         148          1       23.2         72
------------------------------------------------------------------------------

Family : Bernoulli                               Number of obs    =      3,435
Link   : logit                                   Acceptance rate  =      .3086
                                                 Efficiency:  min =     .01048
                                                              avg =      .0245
Log marginal-likelihood                                       max =     .03698

The header information includes the MCMC simulation summary as well as other details about the fitted Bayesian model.

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
attain_gt_6  |
         sex |  .2898934   .0731912   .004749   .2879629   .1568701   .4429235
       _cons | -.6502969    .109456   .010693  -.6497501  -.8509903  -.4300285
-------------+----------------------------------------------------------------
sid          |
   U0:sigma2 |  .1484609   .0826556   .005049   .1336719   .0241995   .3532284
-------------+----------------------------------------------------------------
pid          |
   V0:sigma2 |  .4777177   .0954019   .004961      .4701   .3228973   .6864048
------------------------------------------------------------------------------
Note: Default priors are used for model parameters.

The results suggest that both primary and secondary schools contribute to the variation in whether the attainment score is greater than 6 after adjusting for sex in the model.

Instead of coefficient estimates, we can obtain estimates of odds ratios.

. bayes, or

------------------------------------------------------------------------------
             |                                                Equal-tailed
             | Odds ratio  Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
attain_gt_6  |
         sex |  1.339886   .0988964    .00647   1.333708   1.169844   1.557253
       _cons |  .5250545   .0584524   .005902   .5221763   .4269919   .6504906
-------------+----------------------------------------------------------------
sid          |
   U0:sigma2 |  .1484609   .0826556   .005049   .1336719   .0241995   .3532284
-------------+----------------------------------------------------------------
pid          |
   V0:sigma2 |  .4777177   .0954019   .004961      .4701   .3228973   .6864048
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation to odds ratios.
Note: _cons estimates baseline odds (conditional on zero random effects).
Note: Default priors are used for model parameters.

We could have also specified the or option during estimation either with the bayes prefix

. bayes, or: melogit attain_gt_6 sex || _all: R.sid || pid:

or with melogit

. bayes: melogit attain_gt_6 sex || _all: R.sid || pid:, or

Reference

Mortimore, P., P. Sammons, L. Stoll, D. Lewis, and R. Ecob. 1988. School Matters: The Junior Years. Wells, Somerset, UK: Open Books.

Tell me more

Learn more about Bayesian multilevel modeling.

Learn more about the general features of the bayes prefix.

Learn more about Stata's Bayesian analysis features.

Read more about the bayes prefix and Bayesian analysis in the Stata Bayesian Analysis Reference Manual.