 »  Home »  Products »  Stata 16 »  Bayesian analysis: Gelman-Rubin convergence diagnostic

# Bayesian analysis: Gelman-Rubin convergence diagnostic

## Highlights

• Compute using multiple chains
• Maximum diagnostic automatically reported by bayes: and bayesmh
• Compute diagnostics for each parameter
• Compute diagnostics for functions of parameters
See all features

Establishing convergence of Markov chain Monte Carlo (MCMC) is one of the most important steps of Bayesian analysis. Multiple chains are often used to check MCMC convergence. The Gelman–Rubin convergence diagnostic provides a numerical convergence summary based on multiple chains. Stata's new command bayesstats grubin computes this diagnostic for all model parameters. Also, when you simulate multiple chains using bayes: or bayesmh, these commands automatically report the maximum of the convergence diagnostics across all model parameters.

After using bayes, nchains(): or bayesmh, nchains() to simulate multiple chains, look at the maximum Gelman–Rubin diagnostic reported by the commands in the header. When it suggests nonconvergence, you can look at the individual diagnostics to discover which parameters have convergence problems. Type

. bayesstats grubin


Sometimes nonconvergence arises because you did not run enough simulations. Other times, it arises because of inherent problems with the model's specification. If you run more simulations without convergence and the same parameters keep being identified as problematic, you need to think about respecifying your model.

## Let's see it work

Consider the diabetes data Efron et al. (2004) on 442 diabetes patients that record a measure of disease progression as an outcome y and 10 baseline covariates: age, sex, body mass index, mean arterial pressure, and six blood serum measurements. We consider the version of these data in which the covariates are standardized. This dataset is commonly used to demonstrate various model-building techniques, including Bayesian lasso. Here we use it to demonstrate the bayesstats grubin command.

## Example of nonconvergence

We use bayes: regress to fit the Bayesian linear regression of y on all variables in the dataset and use option nchains(3) to generate three chains.

. bayes, nchains(3) rseed(16): regress y age sex bmi map tc ldl hdl tch ltg glu

Chain 1
Burn-in ...
Simulation ...

Chain 2
Burn-in ...
Simulation ...

Chain 3
Burn-in ...
Simulation ...

Model summary

Likelihood:
y ~ regress(xb_y,{sigma2})

Priors:
{y:age sex bmi map tc ldl hdl tch ltg glu _cons} ~ normal(0,10000)       (1)
{sigma2} ~ igamma(.01,.01)

(1) Parameters are elements of the linear form xb_y.

Bayesian linear regression                    Number of chains    =          3
Random-walk Metropolis-Hastings sampling      Per MCMC chain:
Iterations      =     12,500
Burn-in         =      2,500
Sample size     =     10,000
Number of obs       =        442
Avg acceptance rate =       .321
Avg efficiency: min =    .003771
avg =     .01886
max =      .1142
Avg log marginal-likelihood = -2457.6885      Max Gelman-Rubin Rc =      4.543

Equal-tailed
Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]

y
age    81.50951   87.57016   4.06945    71.7336  -50.84804   234.4335
sex   -167.0296    71.9136   6.76093   -167.887   -289.098  -42.73281
bmi    352.3836   99.94654   6.13795    360.123    193.378   505.5432
map    286.0075   78.16619   5.23075   275.4511   174.2242   426.8137
tc   -273.1433   164.7453    14.618  -295.8386  -501.8144  -16.60783
ld1     165.232    178.513   8.84052    231.598  -125.8265   352.7025
hd1   -94.36373   114.5865   7.16308  -106.0647  -263.5826   99.65676
tch     109.925   162.0595   13.0143   134.8212  -155.8764   332.5971
ltg     483.243   102.1253   6.00414   495.4546   307.5461   627.0424
glu    42.76467    117.208   6.29098   65.24344  -146.8923    198.615
_cons    152.4957   2.780968   .103842   152.4606   147.3315   157.9954

sigma2     3110.56   221.9696   3.79219   3100.067   2707.435   3566.951

Note: Default priors are used for model parameters.
Note: Default initial values are used for multiple chains.
Note: There is a high autocorrelation after 500 lags in at least one of the
chains.


The maximum Gelman–Rubin diagnostic across all model parameters is labeled as Max Gelman–Rubin Rc in the header and is 4.543. Gelman and Rubin (1992) and Brooks and Gelman (1998) suggest that diagnostic Rc values greater than 1.2 for any of the model parameters should indicate nonconvergence. In practice, a more stringent rule of Rc < 1.1 is often used to declare convergence.

In our example, the maximum Rc of 4.543 is larger than 1.1, so our MCMC did not converge. The presence of the note about high autocorrelation is also concerning, although it may not necessarily indicate nonconvergence.

We can explore Gelman–Rubin diagnostics for all parameters to identify the problematic ones. We use bayesstats grubin for that. We additionally sort the results from largest to smallest to find the most problematic parameters.

. bayesstats grubin, sort

Gelman-Rubin convergence diagnostic

Number of chains     =            3
MCMC size, per chain =       10,000
Max Gelman-Rubin Rc  =     4.542823

Rc

y
ldl    4.542823
tch    3.376646
tc    3.184339
glu    3.089546
bmi    2.543104
hdl    2.447089
map    2.421061
age     2.40928
ltg    2.389624
sex    1.680013
_cons    1.082658

sigma2    1.023543

Convergence rule: Rc < 1.1


The coefficient for ldl has the maximum Rc value. All parameters except the last two exceed 1.1. Our model certainly has convergence issues.

Let's look at graphical diagnostics for the ldl coefficient.

. bayesgraph diagnostics {y:ldl}


There is a clear separation in the trace plots of the chains, and the distributions of all three chains are different.

So what do we do now? The answer is model- and data-specific. Sometimes we may simply need to increase our burn-in period, option burnin(). Other times, especially in the presence of highly correlated model parameters, we may need to improve efficiency of our sampling procedure by blocking parameters or using more efficient samplers; see Specifying MCMC sampling procedure. When the data provide little information about some of the model parameters, we may need to use more informative priors or entirely modify our model. See Convergence of MCMC for other suggestions.

We provide the solution for our model in the next section.

## Example of convergence

In the above Example of nonconvergence, we fit a Bayesian linear regression to the diabetes data and encountered problems with MCMC convergence using our default model specification.

From the bayes: regress output above, there appears to be a large variation in the magnitudes of the estimated coefficients. We may try placing parameters with similar magnitudes in separate blocks to improve simulation efficiency.

Also, many variables in the diabetes dataset are correlated, and some are highly correlated. This induces high correlations between some of the model parameters. (For instance, you can use bayesgraph matrix to explore pairwise correlations between model parameters.) The default adaptive Metropolis–Hastings used by bayes: and bayesmh may suffer from low efficiency in the presence of many highly correlated model parameters. In our example, the default priors used by bayes: regress—normal for coefficients and inverse gamma for the variance—allow us to use a more efficient Gibbs sampling.

. bayes, nchains(3) rseed(16) gibbs: regress y age sex bmi map tc ldl hdl tch ltg glu

Chain 1
Burn-in ...
Simulation ...

Chain 2
Burn-in ...
Simulation ...

Chain 3
Burn-in ...
Simulation ...

Model summary

Likelihood:
y ~ normal(xb_y,{sigma2})

Priors:
{y:age sex bmi map tc ldl hdl tch ltg glu _cons} ~ normal(0,10000)       (1)
{sigma2} ~ igamma(.01,.01)

(1) Parameters are elements of the linear form xb_y.

Bayesian linear regression                    Number of chains    =          3
Gibbs sampling                                Per MCMC chain:
Iterations      =     12,500
Burn-in         =      2,500
Sample size     =     10,000
Number of obs       =        442
Avg acceptance rate =          1
Avg efficiency: min =      .9041
avg =      .9853
max =          1
Avg log marginal-likelihood = -2435.5507      Max Gelman-Rubin Rc =          1

Equal-tailed
Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]

y
age    13.27542   51.20421   .295628   13.62079   -86.6546    112.654
sex    -163.025   51.94371   .299897  -163.0511  -265.2759  -62.49451
bmi     428.655    54.8511   .325351   428.6502   321.9517   536.9177
map     269.312   53.95068   .311484   269.9104   162.9158    375.273
tc   -33.02954   75.55542    .43803  -32.85623  -182.0179   113.7366
ld1   -72.58636   71.89936   .415111  -72.42117  -213.1838   68.32667
hd1    -185.885   65.39935   .377583  -185.9721  -313.8126  -57.03217
tch    121.3566   74.05157   .427537   121.5397  -24.13796   267.8116
ltg    370.8014   61.66557   .357777   371.1397   248.9354   492.0922
glu    103.5797   54.67812   .317191   103.3001  -3.523386   211.1194
_cons    152.0611   2.620844   .015131   152.0733   146.9208   157.1492

sigma2    3019.896   210.3033   1.27697   3010.666   2634.993   3462.634

Note: Default priors are used for model parameters.
Note: Default initial values are used for multiple chains.


The maximum Gelman–Rubin diagnostic is now 1 (or very close to one) and is less than 1.1, which indicates that no convergence issues are detected.

We can look at the graphical diagnostics for the ldl coefficient again.

. bayesgraph diagnostics {y:ldl}


There is a perfect overlap between the trace plots and the distributions of the three chains. The autocorrelation is essentially nonexistent, as is expected with Gibbs sampling, in all three chains.

Brooks, S. P., and A. Gelman. 1998. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7: 434–455.

Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani. 2004. Least angle regression. The Annals of Statistics 32: 407–499.

Gelman, A., and D. B. Rubin. 1992. Inference from iterative simulation using multiple sequences. Statistical Science 7: 457–472.