Programming your own Bayesian models


Highlights

Write your own programs to calculate the likelihood function and choose built-in priors
Write your own programs to calculate the posterior density directly
Use built-in adaptive MH sampling to simulate the marginal posterior distributions
Efficiently sample random effects (new)
Support for predictions (new)

Stata's bayesmh provides a variety of built-in Bayesian models for you to choose from; see the full list of available likelihood models and prior distributions. Cannot find the model you need? bayesmh also provides facilities for you to program your own Bayesian models.

Adding Bayesian models is easy. Simply write a Stata program that computes a posterior density following bayesmh's convention and specify the name of the program with the command. bayesmh will do the rest: it will simulate the marginal posterior distributions for all parameters and provide posterior summaries. After the simulation, your subsequent analyses are exactly the same as if you used one of the built-in models.

You can either write a program that computes the overall log likelihood and choose priors from the list of built-in prior distributions, or write a program that computes the entire log posterior.

Let's see it work

Log-likelihood evaluators

As a simple example, let's write a program that computes a log likelihood for a logistic regression model.

. sysuse auto
(1978 automobile data)

program logitll
     version 19
     args lnfj xb
     // compute log likelihood
     quietly replace `lnfj' = ln(invlogit( `xb')) if $MH_y == 1 & $MH_touse
     quietly replace `lnfj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
end

We called our program logitll. The program has two arguments: a temporary variable lnfj for storing the observation-level log-likelihood values over the estimation sample and a temporary variable xb that contains the linear predictor evaluated at the current values of the parameters. The parameters are the regression coefficients in our example.
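In formula terms, the two replace statements in logitll fill lnfj with the per-observation terms of the logistic log likelihood. The notation below (y_j for the binary outcome, x_j'β for the linear predictor, Λ for the inverse logit) is introduced here for illustration and does not appear in bayesmh's own output:

```latex
\ln L(\beta) \;=\; \sum_{j \in \text{sample}}
  \Bigl[\, y_j \,\ln \Lambda(x_j'\beta) \;+\; (1 - y_j)\,\ln \Lambda(-x_j'\beta) \Bigr],
\qquad
\Lambda(z) \;=\; \frac{e^{z}}{1 + e^{z}}
```

The second term uses the identity 1 - Λ(z) = Λ(-z), which is why the program evaluates invlogit(-xb) for observations with a zero outcome.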

Global macro MH_y contains the name of the binary dependent variable, MH_touse identifies the estimation sample, and MH_n contains the total number of observations.
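One way to convince yourself that the formula in logitll is right, before handing it to bayesmh, is to evaluate the same expression at the maximum likelihood estimates and compare the total with the log likelihood reported by logit. The following is a minimal sketch under that assumption; the names xb, lnfj, and ll_manual are ordinary illustrative variables and scalars, not the temporary objects that bayesmh passes to an evaluator.

```stata
sysuse auto, clear

// fit the model by maximum likelihood and keep the linear predictor
quietly logit foreign mpg
predict double xb if e(sample), xb

// same per-observation formula as in logitll
generate double lnfj = .
quietly replace lnfj = ln(invlogit( xb)) if foreign == 1 & e(sample)
quietly replace lnfj = ln(invlogit(-xb)) if foreign == 0 & e(sample)

// sum over the estimation sample and compare with logit's e(ll)
quietly summarize lnfj
scalar ll_manual = r(sum)
display "manual log likelihood = " ll_manual
display "logit's e(ll)         = " e(ll)
assert reldif(ll_manual, e(ll)) < 1e-8
```

If the assert passes silently, the per-observation expression sums to the same log likelihood that logit maximizes.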

We can now use our program with bayesmh to fit a Bayesian logistic regression model.

Suppose that we want to model whether a car is foreign or domestic as a function of car mileage. We specify the name of our log-likelihood evaluator in option llevaluator() and specify one of the built-in priors in option prior(). We use the flat prior for both coefficients, mpg and _cons.

. bayesmh foreign mpg, llevaluator(logitll) prior({foreign:}, flat) rseed(12345)

Burn-in ...
Simulation ...

Model summary

 
Likelihood:
  foreign ~ logitll(xb_foreign) 

Prior:
  {foreign:mpg _cons} ~ 1 (flat)                                           (1)
 
(1) Parameters are elements of the linear form xb_foreign.

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =      .2158
                                                 Efficiency:  min =     .08875
                                                              avg =      .0897
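Log marginal-likelihood = -41.607547                          max =     .09066

------------------------------------------------------------------------------
             |                                               Equal-tailed
     foreign |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
         mpg |  .1694426    .056152   .001865   .1684498   .0644589   .2865277
       _cons | -4.604192   1.287691   .043225  -4.590255  -7.330972   -2.11855
------------------------------------------------------------------------------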

The output is almost identical to that produced for built-in models. The only differences are in the title and model summary. The generic 'Bayesian regression' title is used, which can be changed via option title(). The model summary displays the name of the program used to compute the likelihood of this model. The results are the same as the ones that would have been obtained using the built-in likelihood(logit) model (given the same random-number seed and initial values).
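For comparison, the built-in counterpart mentioned above can be fit as sketched here. According to the text, it should reproduce the logitll results when the random-number seed and initial values match; no output is shown because it was not part of the original example.

```stata
sysuse auto, clear

// built-in logistic likelihood with the same flat prior and seed
bayesmh foreign mpg, likelihood(logit) prior({foreign:}, flat) rseed(12345)
```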

We can choose a different built-in prior, for example, a normal prior with zero mean and variance of 25, again for both coefficients.

. bayesmh foreign mpg, llevaluator(logitll) prior({foreign:}, normal(0,25)) rseed(12345)

Burn-in ...
Simulation ...

Model summary

 
Likelihood:
  foreign ~ logitll(xb_foreign)

Prior:
  {foreign:mpg _cons} ~ normal(0,25)                                       (1)
 
(1) Parameters are elements of the linear form xb_foreign.

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =      .2527
                                                 Efficiency:  min =     .09674
                                                              avg =     .09714
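Log marginal-likelihood = -47.09598                           max =     .09753

------------------------------------------------------------------------------
             |                                               Equal-tailed
     foreign |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
         mpg |  .1582694   .0521761   .001671   .1562921   .0653649   .2642418
       _cons | -4.338317   1.188011   .038196  -4.279862  -6.778805  -2.230852
------------------------------------------------------------------------------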

What if you need to use a prior that is not supported? For simple prior distributions, you can specify the expression for the prior density or log density in the corresponding prior()'s suboptions density() and logdensity(). For other prior distributions, you can incorporate them directly into the computation of the posterior density; see Log-posterior evaluators.
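As a rough illustration of the logdensity() suboption, the sketch below writes the normal prior with variance 25 as an explicit log-density expression, one prior() per coefficient. It assumes that the expression may refer to the parameter by its full name in braces (for example, {foreign:mpg}), which is my reading of the substitutable-expression syntax and should be checked against the bayesmh documentation. If that assumption holds, the specification describes the same prior as normal(0,25).

```stata
sysuse auto, clear

// log-likelihood evaluator from the example above
capture program drop logitll
program logitll
     version 19
     args lnfj xb
     quietly replace `lnfj' = ln(invlogit( `xb')) if $MH_y == 1 & $MH_touse
     quietly replace `lnfj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
end

// user-specified log prior densities (assumed syntax; see lead-in)
bayesmh foreign mpg, llevaluator(logitll)                                ///
     prior({foreign:mpg},   logdensity(lnnormalden({foreign:mpg},0,5)))   ///
     prior({foreign:_cons}, logdensity(lnnormalden({foreign:_cons},0,5))) ///
     rseed(12345)
```

Note that lnnormalden() takes a standard deviation, so 5 here corresponds to the variance of 25 used in the built-in normal(0,25) prior.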

Log-posterior evaluators

In Log-likelihood evaluators, we created the logitll program to compute the log likelihood for a logistic model. Under the flat prior, a prior with the density of 1, the log posterior equals the log likelihood. So, assuming the flat prior, we can easily write a log-posterior evaluator by extending the logitll program.
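Written out, with β for the coefficient vector and C for a constant that does not depend on β (both symbols are introduced here for illustration), the relationship is:

```latex
\ln p(\beta \mid y) \;=\; \ln L(\beta) \;+\; \ln p(\beta) \;+\; C,
\qquad
p(\beta) = 1 \;\Longrightarrow\; \ln p(\beta) = \ln 1 = 0
```

So under the flat prior, a log-posterior evaluator only needs to return the log-likelihood terms together with a log prior of zero.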

program logit_flat
     version 19
     args lnfj lnp xb
     // compute log likelihood
     quietly replace `lnfj' = ln(invlogit( `xb')) if $MH_y == 1 & $MH_touse
     quietly replace `lnfj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
     // compute log prior
     scalar `lnp' = 0
end

The program now has three arguments: a temporary variable lnfj for storing the observation-level log-likelihood values over the estimation sample, a temporary scalar lnp for storing the log-prior value, and a temporary variable xb that contains the linear predictor.

We specify the name of the log-posterior evaluator in option evaluator(). We do not specify the prior() option, because the prior information is already incorporated in the computation of the log-posterior density.

. bayesmh foreign mpg, evaluator(logit_flat) rseed(12345)

Burn-in ...
Simulation ...

Model summary

 
Posterior: 
  foreign ~ logit_flat(xb_foreign)
 

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =      .2158
                                                 Efficiency:  min =     .08875
                                                              avg =      .0897
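Log marginal-likelihood = -41.607547                          max =     .09066

------------------------------------------------------------------------------
             |                                               Equal-tailed
     foreign |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
         mpg |  .1694426    .056152   .001865   .1684498   .0644589   .2865277
       _cons | -4.604192   1.287691   .043225  -4.590255  -7.330972   -2.11855
------------------------------------------------------------------------------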

As expected, we obtain results identical to those from the log-likelihood evaluator with the flat prior (given the same random-number seed).

As a demonstration, let's now write a log-posterior evaluator for the normal-prior model from Log-likelihood evaluators.

program logit_normal
     version 19
     args lnfj lnp xb
     // compute log likelihood
     quietly replace `lnfj' = ln(invlogit( `xb')) if $MH_y == 1 & $MH_touse
     quietly replace `lnfj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
     // compute log prior
     scalar `lnp' = lnnormalden($MH_b[1,1],0,5) + lnnormalden($MH_b[1,2],0,5)
end

We write a new program called logit_normal, which is a straightforward extension of the logitll program. We compute the log likelihood as we did before and also compute and return the log prior lnp, which is the sum of the log prior densities of the two parameters. The log posterior is then computed automatically as the sum of the overall log likelihood and the log prior. Because lnnormalden() takes a standard deviation as its third argument, the prior variance of 25 is entered as 5. Global macro MH_b contains the name of a temporary vector of coefficients; $MH_b[1,1] contains the current value of the first parameter, mpg, and $MH_b[1,2] contains the current value of the second parameter, _cons.
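The third argument of lnnormalden() is easy to get wrong, so here is a small check that it is a standard deviation rather than a variance. The value 0.15 and the scalar names are arbitrary choices for illustration; the manual expression is the log of a normal density with mean 0 and standard deviation 5.

```stata
// arbitrary trial value for a coefficient
scalar b_try = 0.15

// built-in log density: arguments are value, mean, standard deviation
scalar lp_builtin = lnnormalden(b_try, 0, 5)

// the same log density written out by hand with sd = 5 (variance = 25)
scalar lp_manual = -0.5*ln(2*_pi) - ln(5) - 0.5*(b_try/5)^2

display "lnnormalden() = " lp_builtin
display "by hand       = " lp_manual
assert reldif(lp_builtin, lp_manual) < 1e-12
```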

We specify the name of the new evaluator in option evaluator().

. bayesmh foreign mpg, evaluator(logit_normal) rseed(12345)

Burn-in ...
Simulation ...

Model summary

 
Posterior:
  foreign ~ logit_normal(xb_foreign)
 

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =      .2527
                                                 Efficiency:  min =     .09674
                                                              avg =     .09714
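Log marginal-likelihood = -47.09598                           max =     .09753

------------------------------------------------------------------------------
             |                                               Equal-tailed
     foreign |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
         mpg |  .1582694   .0521761   .001671   .1562921   .0653649   .2642418
       _cons | -4.338317   1.188011   .038196  -4.279862  -6.778805  -2.230852
------------------------------------------------------------------------------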

The results are identical to those from the normal-prior model in Log-likelihood evaluators.
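Because a model fit through an evaluator is analyzed like any built-in model, the usual Bayesian postestimation commands should apply afterward. The sketch below refits the logit_normal model and then runs a few of them. The particular choice of commands is illustrative and was not part of the original example.

```stata
sysuse auto, clear

// log-posterior evaluator from the example above
capture program drop logit_normal
program logit_normal
     version 19
     args lnfj lnp xb
     quietly replace `lnfj' = ln(invlogit( `xb')) if $MH_y == 1 & $MH_touse
     quietly replace `lnfj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
     scalar `lnp' = lnnormalden($MH_b[1,1],0,5) + lnnormalden($MH_b[1,2],0,5)
end

quietly bayesmh foreign mpg, evaluator(logit_normal) rseed(12345)

// posterior summaries, effective sample sizes, and graphical diagnostics
bayesstats summary
bayesstats ess
bayesgraph diagnostics {foreign:mpg}
```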
