Home  /  Products  /  Features  /  Bayesian analysis: Multiple Markov chains

<-  See Stata's other features

Highlights

  • nchains() option for simulating multiple chains with bayes: and bayesmh

  • Use default or specify your own initial values for each chain

  • Posterior summaries, graphical diagnostics, and more using multiple chains

  • Gelman–Rubin convergence diagnostics

Markov chain Monte Carlo (MCMC) is used for Bayesian inference. Has the MCMC converged? Has it fully explored the target posterior distribution? Or do you need more simulations? You need to answer these questions for proper inference. Stata's nchains() can help you with that.

Type

. bayes, nchains(4): regress y x1 x2

and four chains will be produced, in this case for Bayesian regression of y on x1 and x2. The chains will be compared and the maximum Gelman–Rubin convergence diagnostic reported. And the chains will be combined to produce a more accurate final result. The intent is that you check the reported diagnostic before interpreting the results.

nchains() may be used with the bayes: or bayesmh command.

For convergence diagnostic purposes, it is important that different chains have dispersed initial values for the model parameters. Default initial values are provided, but they may not be sufficient for all cases. You can use initall() and init#() to specify your own initial values for all chains or for specific chains. See Convergence diagnostics using multiple chains for details.

All existing Bayesian postestimation commands such as bayesstats, bayestest, and bayesgraph provide support for multiple chains via the chains() and sepchains options.

Also see the bayesstats grubin command, which computes Gelman–Rubin convergence diagnostics for all model parameters, and How to run multiple chains in parallel?

Let's see it work

Consider the diabetes data (Efron et al. 2004) on 442 diabetes patients that record a measure of disease progression as an outcome y. To demonstrate multiple chains, we consider a small subset of variables of interest: age, sex, body mass index, and mean arterial pressure.

Multiple chains serve multiple purposes. More commonly, they are used to explore MCMC convergence. But they are also useful for obtaining more precise estimates of parameters. And you can run them in parallel to save time.

Using multiple chains to explore convergence visually

Let's fit a Bayesian linear regression and produce three multiple chains.

. bayes, nchains(3) rseed(16): regress y age sex bmi map

Chain 1
  Burn-in ...
  Simulation ...

Chain 2
  Burn-in ...
  Simulation ...

Chain 3
  Burn-in ...
  Simulation ...

Model summary
Likelihood: y ~ regress(xb_y,{sigma2}) Priors: {y:age sex bmi map _cons} ~ normal(0,10000) (1) {sigma2} ~ igamma(.01,.01)
(1) Parameters are elements of the linear form xb_y. Bayesian linear regression Number of chains = 3 Random-walk Metropolis-Hastings sampling Per MCMC chain: Iterations = 12,500 Burn-in = 2,500 Sample size = 10,000 Number of obs = 442 Avg acceptance rate = .3274 Avg efficiency: min = .04375 avg = .07228 max = .1859 Avg log marginal-likelihood = -2478.936 Max Gelman-Rubin Rc = 1.002
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
y
age 64.63216 53.6199 1.48006 64.57528 -40.40117 170.1121
sex -56.67161 52.83794 1.25481 -55.63422 -158.4018 45.66125
bmi 591.5728 55.3663 1.49183 592.1313 483.4726 699.7989
map 345.2107 56.15044 1.45502 344.3014 233.006 455.7745
_cons 152.063 2.876962 .074766 152.1028 146.3381 157.6296
sigma2 3722.503 259.0622 3.46912 3709.148 3245.06 4254.987
Note: Default priors are used for model parameters. Note: Default initial values are used for multiple chains.

We demonstrate the use of multiple chains to explore MCMC convergence in detail in Gelman–Rubin convergence diagnostic. Here we will just point out that the maximum Gelman–Rubin diagnostic, reported as Max Gelman-Rubin Rc in the header, is 1.002, which is less than 1.1 and thus does not indicate convergence problems for our model.

We can also explore convergence visually by looking at a trace plot, for instance, for the bmi coefficient.

. bayesgraph trace {y:bmi}

With multiple chains, bayesgraph automatically plots multiple chains on one graph. The trace plot does not indicate any convergence problems. All chains look similar and oscillate around similar point estimates.

We can explore more diagnostics.

. bayesgraph diagnostics {y:bmi}

The autocorrelation and distribution plots look similar for all chains. The autocorrelation dies off fairly quickly in all three chains.

In this demonstration, we are satisfied with our visual exploration of MCMC convergence. In actual analysis, we would want to perform the above steps for all model parameters and explore the convergence more formally.

Posterior summaries using multiple chains

After verifying MCMC convergence, as in Using multiple chains to explore convergence visually and Using multiple chains to explore convergence formally, we can perform inference.

In the presence of multiple chains, posterior summaries are computed by using all chains, three in our example:

. bayesstats summary

Posterior summary statistics                      Number of chains =         3
                                                  MCMC sample size =    30,000

Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
y
age 64.63216 53.6199 1.48006 64.57528 -40.40117 170.1121
sex -56.67161 52.83794 1.25481 -55.63422 -158.4018 45.66125
bmi 591.5728 55.3663 1.49183 592.1313 483.4726 699.7989
map 345.2107 56.15044 1.45502 344.3014 233.006 455.7745
_cons 152.063 2.876962 .074766 152.1028 146.3381 157.6296
sigma2 3722.503 259.0622 3.46912 3709.148 3245.06 4254.987

Using all chains provides more precise estimates of model parameters. For instance, we can compare the results above with the results from only the first chain:

. bayesstats summary, chains(1)

Posterior summary statistics                      Number of chains =         1
Chain 1                                           MCMC sample size =    10,000

Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
y
age 65.38379 54.08876 2.18726 65.78475 -42.80366 169.0617
sex -57.99493 54.04443 2.46374 -57.67259 -158.3685 47.19167
bmi 590.8162 55.57828 2.90659 590.5812 483.4726 700.9136
map 346.9589 56.02036 2.80642 344.5968 238.6808 460.3592
_cons 152.1605 2.880091 .136456 152.2867 146.5691 157.6759
sigma2 3726.441 262.7137 6.34486 3720.178 3249.068 4262.679

The posterior mean and standard deviations are similar, but the MCSEs are lower for the results using all three chains because there are more MCMC estimates and they are less correlated.

We can even compare the results from all chains.

. bayesstats summary, sepchains

Posterior summary statistics

Chain 1                                           MCMC sample size =    10,000

Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
y
age 65.38379 54.08876 2.18726 65.78475 -42.80366 169.0617
sex -57.99493 54.04443 2.46374 -57.67259 -158.3685 47.19167
bmi 590.8162 55.57828 2.90659 590.5812 483.4726 700.9136
map 346.9589 56.02036 2.80642 344.5968 238.6808 460.3592
_cons 152.1605 2.880091 .136456 152.2867 146.5691 157.6759
sigma2 3726.441 262.7137 6.34486 3720.178 3249.068 4262.679
Chain 2 MCMC sample size = 10,000
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
y
age 61.57452 52.64732 4.16571 60.93262 -41.98606 165.0094
sex -56.0638 53.11474 1.99601 -53.1213 -155.3794 45.02055
bmi 592.0209 55.81173 2.41218 591.3277 482.1642 699.0798
map 345.6178 57.0509 2.537 344.935 235.805 460.1346
_cons 152.1002 2.82684 .13415 152.0675 146.5058 157.6649
sigma2 3720.897 257.873 5.96029 3706.499 3240.245 4253.801
Chain 3 MCMC sample size = 10,000
Equal-tailed
Mean Std. dev. MCSE Median [95% cred. interval]
y
age 66.93817 53.90694 2.31714 67.29965 -35.94106 174.1906
sex -55.95611 51.28798 2.12267 -54.78232 -159.7941 45.54302
bmi 591.8815 54.69921 2.50606 593.381 484.8148 698.1792
map 343.0555 55.26911 2.28489 343.8618 224.2158 449.4169
_cons 151.9282 2.916109 .119936 151.953 146.1003 157.5059
sigma2 3720.172 256.5298 5.75014 3703.189 3244.123 4247.216

Multiple chains may be particularly useful for parameters with low sampling efficiencies.

Just like bayesstats summary, all other postestimation commands recognize the presence of multiple chains. For instance, suppose we would like to test whether the coefficient for bmi is larger than 600. We can use bayestest interval to compute the posterior probability that {y:bmi}>600.

. bayestest interval {y:bmi}, lower(600)

Interval tests     Number of chains =         3
                   MCMC sample size =    30,000

       prob1 : {y:bmi} > 600

Mean Std. dev. MCSE
prob1 .4436 0.49684 .0124317

The estimated posterior probability is about 44%.

Reference

Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani. 2004. Least angle regression. The Annals of Statistics 32: 407–499.

Tell me more

See Gelman–Rubin convergence diagnostic.

Learn about Bayesian lasso modeling.

Read more about Multiple chains and Bayesian analysis in the Stata Bayesian Analysis Reference Manual.