Bayesian analysis: Gelman–Rubin convergence diagnostic

Order

Watch video demo

<- See Stata's other features

Highlights

Compute using multiple chains
Maximum diagnostic automatically reported by bayes: and bayesmh
Compute diagnostics for each parameter
Compute diagnostics for functions of parameters
See more Bayesian analysis features

Establishing convergence of Markov chain Monte Carlo (MCMC) is one of the most important steps of Bayesian analysis. Multiple chains are often used to check MCMC convergence. The Gelman–Rubin convergence diagnostic provides a numerical convergence summary based on multiple chains. Stata's bayesstats grubin command computes this diagnostic for all model parameters. Also, when you simulate multiple chains using bayes: or bayesmh, these commands automatically report the maximum of the convergence diagnostics across all model parameters.

After using bayes, nchains(): or bayesmh, nchains() to simulate multiple chains, look at the maximum Gelman–Rubin diagnostic reported by the commands in the header. When it suggests nonconvergence, you can look at the individual diagnostics to discover which parameters have convergence problems. Type

. bayesstats grubin

Sometimes nonconvergence arises because you did not run enough simulations. Other times, it arises because of inherent problems with the model's specification. If you run more simulations without convergence and the same parameters keep being identified as problematic, you need to think about respecifying your model.

Let's see it work

Consider the diabetes data Efron et al. (2004) on 442 diabetes patients that record a measure of disease progression as an outcome y and 10 baseline covariates: age, sex, body mass index, mean arterial pressure, and six blood serum measurements. We consider the version of these data in which the covariates are standardized. This dataset is commonly used to demonstrate various model-building techniques, including Bayesian lasso. Here we use it to demonstrate the bayesstats grubin command.

Example of nonconvergence

We use bayes: regress to fit the Bayesian linear regression of y on all variables in the dataset and use option nchains(3) to generate three chains.

. bayes, nchains(3) rseed(16): regress y age sex bmi map tc ldl hdl tch ltg glu

Chain 1
  Burn-in ...
  Simulation ...

Chain 2
  Burn-in ...
  Simulation ...

Chain 3
  Burn-in ...
  Simulation ...

Model summary
                                                                              
Likelihood:
  y ~ regress(xb_y,{sigma2})

Priors:
  {y:age sex bmi map tc ldl hdl tch ltg glu _cons} ~ normal(0,10000)       (1)
                                          {sigma2} ~ igamma(.01,.01)
                                                                              
(1) Parameters are elements of the linear form xb_y.

Bayesian linear regression                    Number of chains    =          3
Random-walk Metropolis-Hastings sampling      Per MCMC chain:
                                                  Iterations      =     12,500
                                                  Burn-in         =      2,500
                                                  Sample size     =     10,000
                                              Number of obs       =        442
                                              Avg acceptance rate =       .321
                                              Avg efficiency: min =    .003771
                                                              avg =     .01886
                                                              max =      .1142
Avg log marginal-likelihood = -2457.6885      Max Gelman-Rubin Rc =      4.543



                                                              Equal-tailed    
                    Mean   Std. dev.     MCSE     Median  [95% cred. interval]

y                                                                             
         age    81.50951   87.57016   4.06945    71.7336  -50.84804   234.4335
         sex   -167.0296    71.9136   6.76093   -167.887   -289.098  -42.73281
         bmi    352.3836   99.94654   6.13795    360.123    193.378   505.5432
         map    286.0075   78.16619   5.23075   275.4511   174.2242   426.8137
          tc   -273.1433   164.7453    14.618  -295.8386  -501.8144  -16.60783
         ld1     165.232    178.513   8.84052    231.598  -125.8265   352.7025
         hd1   -94.36373   114.5865   7.16308  -106.0647  -263.5826   99.65676
         tch     109.925   162.0595   13.0143   134.8212  -155.8764   332.5971
         ltg     483.243   102.1253   6.00414   495.4546   307.5461   627.0424
         glu    42.76467    117.208   6.29098   65.24344  -146.8923    198.615
       _cons    152.4957   2.780968   .103842   152.4606   147.3315   157.9954

      sigma2     3110.56   221.9696   3.79219   3100.067   2707.435   3566.951


Note: Default priors are used for model parameters.
Note: Default initial values are used for multiple chains.
Note: There is a high autocorrelation after 500 lags in at least one of the
      chains.

The maximum Gelman–Rubin diagnostic across all model parameters is labeled as Max Gelman–Rubin Rc in the header and is 4.543. Gelman and Rubin (1992) and Brooks and Gelman (1998) suggest that diagnostic Rc values greater than 1.2 for any of the model parameters should indicate nonconvergence. In practice, a more stringent rule of Rc < 1.1 is often used to declare convergence.

In our example, the maximum Rc of 4.543 is larger than 1.1, so our MCMC did not converge. The presence of the note about high autocorrelation is also concerning, although it may not necessarily indicate nonconvergence.

We can explore Gelman–Rubin diagnostics for all parameters to identify the problematic ones. We use bayesstats grubin for that. We additionally sort the results from largest to smallest to find the most problematic parameters.

. bayesstats grubin, sort

Gelman-Rubin convergence diagnostic

Number of chains     =            3
MCMC size, per chain =       10,000
Max Gelman-Rubin Rc  =     4.542823



         Rc

y                       
         ldl    4.542823
         tch    3.376646
          tc    3.184339
         glu    3.089546
         bmi    2.543104
         hdl    2.447089
         map    2.421061
         age     2.40928
         ltg    2.389624
         sex    1.680013
       _cons    1.082658

      sigma2    1.023543


Convergence rule: Rc < 1.1

The coefficient for ldl has the maximum Rc value. All parameters except the last two exceed 1.1. Our model certainly has convergence issues.

Let's look at graphical diagnostics for the ldl coefficient.

. bayesgraph diagnostics {y:ldl}

There is a clear separation in the trace plots of the chains, and the distributions of all three chains are different.

So what do we do now? The answer is model- and data-specific. Sometimes we may simply need to increase our burn-in period, option burnin(). Other times, especially in the presence of highly correlated model parameters, we may need to improve efficiency of our sampling procedure by blocking parameters or using more efficient samplers; see Specifying MCMC sampling procedure. When the data provide little information about some of the model parameters, we may need to use more informative priors or entirely modify our model. See Convergence of MCMC for other suggestions.

We provide the solution for our model in the next section.

Example of convergence

In the above Example of nonconvergence, we fit a Bayesian linear regression to the diabetes data and encountered problems with MCMC convergence using our default model specification.

From the bayes: regress output above, there appears to be a large variation in the magnitudes of the estimated coefficients. We may try placing parameters with similar magnitudes in separate blocks to improve simulation efficiency.

Also, many variables in the diabetes dataset are correlated, and some are highly correlated. This induces high correlations between some of the model parameters. (For instance, you can use bayesgraph matrix to explore pairwise correlations between model parameters.) The default adaptive Metropolis–Hastings used by bayes: and bayesmh may suffer from low efficiency in the presence of many highly correlated model parameters. In our example, the default priors used by bayes: regress—normal for coefficients and inverse gamma for the variance—allow us to use a more efficient Gibbs sampling.

. bayes, nchains(3) rseed(16) gibbs: regress y age sex bmi map tc ldl hdl
     tch ltg glu

Chain 1
  Burn-in ...
  Simulation ...

Chain 2
  Burn-in ...
  Simulation ...

Chain 3
  Burn-in ...
  Simulation ...

Model summary
                                                                              
Likelihood:
  y ~ normal(xb_y,{sigma2})

Priors:
  {y:age sex bmi map tc ldl hdl tch ltg glu _cons} ~ normal(0,10000)       (1)
                                          {sigma2} ~ igamma(.01,.01)
                                                                              
(1) Parameters are elements of the linear form xb_y.

Bayesian linear regression                    Number of chains    =          3
Gibbs sampling                                Per MCMC chain:
                                                  Iterations      =     12,500
                                                  Burn-in         =      2,500
                                                  Sample size     =     10,000
                                              Number of obs       =        442
                                              Avg acceptance rate =          1
                                              Avg efficiency: min =      .9041
                                                              avg =      .9853
                                                              max =          1
Avg log marginal-likelihood = -2435.5507      Max Gelman-Rubin Rc =          1



                                                              Equal-tailed
                    Mean   Std. dev.     MCSE     Median  [95% cred. interval]

y                                                                             
         age    13.27542   51.20421   .295628   13.62079   -86.6546    112.654
         sex    -163.025   51.94371   .299897  -163.0511  -265.2759  -62.49451
         bmi     428.655    54.8511   .325351   428.6502   321.9517   536.9177
         map     269.312   53.95068   .311484   269.9104   162.9158    375.273
          tc   -33.02954   75.55542    .43803  -32.85623  -182.0179   113.7366
         ld1   -72.58636   71.89936   .415111  -72.42117  -213.1838   68.32667
         hd1    -185.885   65.39935   .377583  -185.9721  -313.8126  -57.03217
         tch    121.3566   74.05157   .427537   121.5397  -24.13796   267.8116
         ltg    370.8014   61.66557   .357777   371.1397   248.9354   492.0922
         glu    103.5797   54.67812   .317191   103.3001  -3.523386   211.1194
       _cons    152.0611   2.620844   .015131   152.0733   146.9208   157.1492

      sigma2    3019.896   210.3033   1.27697   3010.666   2634.993   3462.634


Note: Default priors are used for model parameters.
Note: Default initial values are used for multiple chains.

The maximum Gelman–Rubin diagnostic is now 1 (or very close to one) and is less than 1.1, which indicates that no convergence issues are detected.

We can look at the graphical diagnostics for the ldl coefficient again.

. bayesgraph diagnostics {y:ldl}

There is a perfect overlap between the trace plots and the distributions of the three chains. The autocorrelation is essentially nonexistent, as is expected with Gibbs sampling, in all three chains.

References

Brooks, S. P., and A. Gelman. 1998. General methods for monitoring convergence of iterative simulations. Journal of Computational and Graphical Statistics 7: 434–455.

Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani. 2004. Least angle regression. The Annals of Statistics 32: 407–499.

Gelman, A., and D. B. Rubin. 1992. Inference from iterative simulation using multiple sequences. Statistical Science 7: 457–472.

Source. McSourcey Pants. Even here, remember the last source should have the class "last".

Tell me more

Learn more about multiple chains, Bayesian predictions, and Bayesian lasso.

Learn more about Stata's Bayesian analysis features.

Read more about Gelman–Rubin convergence diagnostic and Bayesian analysis in the Bayesian Analysis Reference Manual.

Products

New in Stata 19

Why Stata

All features

Disciplines

Stata/MP

StataNow

Order Stata

Purchase

Order Stata

Bookstore

Stata Press

Stata Journal

Gift Shop

Learn

Free webinars

NetCourses

Classroom and web training

Organizational training

Video tutorials

Third-party courses

Web resources

Teaching with Stata

Support

Training

Video tutorials

FAQs

Statalist: The Stata Forum

Resources

Technical support

Customer service

Alerts

Company

News and events

Customer service

Careers

We use cookies

We use cookies to ensure that we give you the best experience on our website—to enhance site navigation, to analyze usage, and to assist in our marketing efforts. By continuing to use our site, you consent to the storing of cookies on your device and agree to delivery of content, including web fonts and JavaScript, from third party web services.

Cookie Settings

Privacy policy

Last updated: 16 November 2022

StataCorp LLC (StataCorp) strives to provide our users with exceptional products and services. To do so, we must collect personal information from you. This information is necessary to conduct business with our existing and potential customers. We collect and use this information only where we may legally do so. This policy explains what personal information we collect, how we use it, and what rights you have to that information.

Required cookies

Advertising cookies

Required cookies

These cookies are essential for our website to function and do not store any personally identifiable information. These cookies cannot be disabled.
Advertising and performance cookies

This website uses cookies to provide you with a better user experience. A cookie is a small piece of data our website stores on a site visitor's hard drive and accesses each time you visit so we can improve your access to our site, better understand how you use our site, and serve you content that may be of interest to you. For instance, we store a cookie when you log in to our shopping cart so that we can maintain your shopping cart should you not complete checkout. These cookies do not directly store your personal information, but they do support the ability to uniquely identify your internet browser and device.

Please note: Clearing your browser cookies at any time will undo preferences saved here. The option selected here will apply only to the device you are currently using.

Accept Cookies


		Equal-tailed
		Mean Std. dev. MCSE Median [95% cred. interval]

y
age		81.50951 87.57016 4.06945 71.7336 -50.84804 234.4335
sex		-167.0296 71.9136 6.76093 -167.887 -289.098 -42.73281
bmi		352.3836 99.94654 6.13795 360.123 193.378 505.5432
map		286.0075 78.16619 5.23075 275.4511 174.2242 426.8137
tc		-273.1433 164.7453 14.618 -295.8386 -501.8144 -16.60783
ld1		165.232 178.513 8.84052 231.598 -125.8265 352.7025
hd1		-94.36373 114.5865 7.16308 -106.0647 -263.5826 99.65676
tch		109.925 162.0595 13.0143 134.8212 -155.8764 332.5971
ltg		483.243 102.1253 6.00414 495.4546 307.5461 627.0424
glu		42.76467 117.208 6.29098 65.24344 -146.8923 198.615
_cons		152.4957 2.780968 .103842 152.4606 147.3315 157.9954

sigma2		3110.56 221.9696 3.79219 3100.067 2707.435 3566.951


		Rc

y
ldl		4.542823
tch		3.376646
tc		3.184339
glu		3.089546
bmi		2.543104
hdl		2.447089
map		2.421061
age		2.40928
ltg		2.389624
sex		1.680013
_cons		1.082658

sigma2		1.023543


		Equal-tailed
		Mean Std. dev. MCSE Median [95% cred. interval]

y
age		13.27542 51.20421 .295628 13.62079 -86.6546 112.654
sex		-163.025 51.94371 .299897 -163.0511 -265.2759 -62.49451
bmi		428.655 54.8511 .325351 428.6502 321.9517 536.9177
map		269.312 53.95068 .311484 269.9104 162.9158 375.273
tc		-33.02954 75.55542 .43803 -32.85623 -182.0179 113.7366
ld1		-72.58636 71.89936 .415111 -72.42117 -213.1838 68.32667
hd1		-185.885 65.39935 .377583 -185.9721 -313.8126 -57.03217
tch		121.3566 74.05157 .427537 121.5397 -24.13796 267.8116
ltg		370.8014 61.66557 .357777 371.1397 248.9354 492.0922
glu		103.5797 54.67812 .317191 103.3001 -3.523386 211.1194
_cons		152.0611 2.620844 .015131 152.0733 146.9208 157.1492

sigma2		3019.896 210.3033 1.27697 3010.666 2634.993 3462.634