
Bayesian analysis was introduced in Stata 14.

Programming your own Bayesian models

What's this about?

Stata's bayesmh provides a variety of built-in Bayesian models for you to choose from; see the full list of available likelihood models and prior distributions. Can't find the model you need? bayesmh also provides facilities for you to program your own Bayesian models.

Adding Bayesian models is easy. Simply write a Stata program that computes a posterior density following bayesmh's convention and specify the name of the program with the command. bayesmh will do the rest: it will simulate the marginal posterior distributions for all parameters and provide posterior summaries. After the simulation, your subsequent analyses are exactly the same as if you used one of the built-in models.
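
For example, once bayesmh finishes the simulation, Stata's standard Bayesian postestimation commands apply directly to your model. A brief sketch (the particular diagnostics are our choices for illustration, and the parameter name assumes the logistic example fit below):

. bayesgraph diagnostics {foreign:mpg}     // trace, histogram, autocorrelation
. bayesstats ess                           // effective sample sizes
. bayestest interval {foreign:mpg}, lower(.1) upper(.3)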

You can write a program that computes the overall log likelihood and choose priors from a list of built-in prior distributions, or you can write a program that computes the entire log posterior.

Let's see it work

Log-likelihood evaluators

As a simple example, let's write a program that computes a log likelihood for a logistic regression model.

program logitll
        version 14.1
        args lnf xb
        // lnf: scalar to hold the overall log likelihood (returned)
        // xb:  variable holding the linear predictor evaluated at the
        //      current values of the parameters
        // compute the observation-level log likelihood
        tempvar lnfj
        quietly generate double `lnfj' = ln(invlogit( `xb')) ///
                                                     if $MH_y == 1 & $MH_touse
        quietly replace `lnfj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
        // sum over the estimation sample
        quietly summarize `lnfj', meanonly
        scalar `lnf' = r(sum)
        // return missing if the log likelihood could not be computed
        // in all observations
        if r(N) < $MH_n {
                scalar `lnf' = .
                exit
        }
end

We called our program logitll. The program has two arguments: a temporary scalar lnf for storing the value of the overall log likelihood and a temporary variable xb that contains the linear predictor evaluated at the current values of the parameters. The parameters are regression coefficients in our example.

Global macro MH_y contains the name of the binary dependent variable, MH_touse identifies the estimation sample, and MH_n contains the total number of observations.

We created an intermediate temporary variable lnfj to contain the observation-specific log likelihood. The sum of this variable over the estimation sample is the overall log likelihood, which we stored in lnf. If the log-likelihood computation evaluates to missing in at least one observation, the overall log likelihood lnf is set to missing.

We can now use our program with bayesmh to fit a Bayesian logistic regression model.

Suppose that we want to model whether a car is foreign or domestic as a function of car mileage. We specify the name of our log-likelihood evaluator in option llevaluator() and specify one of the built-in priors in option prior(). We use the flat prior for both coefficients, mpg and _cons.

. bayesmh foreign mpg, llevaluator(logitll) prior({foreign:}, flat)

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  foreign ~ logitll(xb_foreign)

Prior:
  {foreign:mpg _cons} ~ 1 (flat)                                           (1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_foreign.

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =      .2714
                                                 Efficiency:  min =     .09526
                                                              avg =      .0956
Log marginal likelihood = -41.616161                          max =     .09594

------------------------------------------------------------------------------
             |                                                Equal-tailed
     foreign |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
         mpg |  .1706509   .0553838   .001794   .1689529   .0700874   .2867964
       _cons |  -4.63653   1.274421   .041146  -4.564754  -7.247727  -2.325101
------------------------------------------------------------------------------

The output is almost identical to that produced for built-in models. The only differences are in the title and the model summary. The generic title 'Bayesian regression' is used; it can be changed via option title(). The model summary displays the name of the program used to compute the likelihood of this model. The results are the same as those that would have been obtained using the built-in likelihood(logit) model (given the same random-number seed and initial values).
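
For instance, to supply a more descriptive title and make the run reproducible, you might type something like the following sketch (the title text and the seed value 14 are our arbitrary choices):

. bayesmh foreign mpg, llevaluator(logitll) prior({foreign:}, flat) ///
        title(Bayesian logistic regression) rseed(14)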

We can choose a different built-in prior, for example, a normal prior with zero mean and variance of 25, again for both coefficients.

. bayesmh foreign mpg, llevaluator(logitll) prior({foreign:}, normal(0,25))

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  foreign ~ logitll(xb_foreign)

Prior:
  {foreign:mpg _cons} ~ normal(0,25)                                       (1)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_foreign.

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =      .2125
                                                 Efficiency:  min =     .09213
                                                              avg =      .0953
Log marginal likelihood = -47.146104                          max =     .09847

------------------------------------------------------------------------------
             |                                                Equal-tailed
     foreign |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
         mpg |  .1584296   .0498191   .001641   .1549375   .0650025   .2631174
       _cons | -4.337211   1.160714    .03699  -4.319106  -6.783982  -2.203245
------------------------------------------------------------------------------

What if you need to use a prior that is not supported? For simple prior distributions, you can specify an expression for the prior density or log density in prior()'s suboptions density() and logdensity(). For other prior distributions, you can incorporate them directly into the computation of the posterior density; see Log-posterior evaluators below.
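
For example, here is a minimal sketch that places a Laplace (double-exponential) prior with scale 1 on each coefficient via logdensity(); the choice of density, and writing one prior() statement per coefficient, are our illustrative assumptions:

. bayesmh foreign mpg, llevaluator(logitll)                               ///
        prior({foreign:mpg},   logdensity(-ln(2) - abs({foreign:mpg})))   ///
        prior({foreign:_cons}, logdensity(-ln(2) - abs({foreign:_cons})))

The log of the Laplace(0,1) density is -ln(2) - |x|, so each expression above is the log prior density evaluated at the current value of the corresponding parameter.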

Log-posterior evaluators

In Log-likelihood evaluators, we created the logitll program to compute the log likelihood for a logistic model. Under the flat prior, a prior with density equal to 1, the log posterior equals the log likelihood. So, assuming the flat prior, we can use our log-likelihood evaluator as the log-posterior evaluator as well.

We specify the name of the log-posterior evaluator in option evaluator(). We do not specify the prior() option, because the prior information is already incorporated in the computation of the log-posterior density.

. bayesmh foreign mpg, evaluator(logitll)

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Posterior:
  foreign ~ logitll(xb_foreign)
------------------------------------------------------------------------------

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =      .2714
                                                 Efficiency:  min =     .09526
                                                              avg =      .0956
Log marginal likelihood = -41.616161                          max =     .09594

------------------------------------------------------------------------------
             |                                                Equal-tailed
     foreign |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
         mpg |  .1706509   .0553838   .001794   .1689529   .0700874   .2867964
       _cons |  -4.63653   1.274421   .041146  -4.564754  -7.247727  -2.325101
------------------------------------------------------------------------------

As expected, we obtain results identical to the log-likelihood evaluator with the flat prior (given the same random-number seed).

As a demonstration, let's now write a log-posterior evaluator for the normal-prior model from Log-likelihood evaluators.

program logit_normal
        version 14.1
        args lnp xb
        // lnp: scalar to hold the overall log posterior (returned)
        // xb:  variable holding the linear predictor evaluated at the
        //      current values of the parameters
        // compute the observation-level log likelihood
        tempvar lnfj
        quietly generate double `lnfj' = ln(invlogit( `xb')) ///
                                                     if $MH_y == 1 & $MH_touse
        quietly replace `lnfj' = ln(invlogit(-`xb')) if $MH_y == 0 & $MH_touse
        quietly summarize `lnfj', meanonly
        tempname lnf
        scalar `lnf' = r(sum)
        // return missing if the log likelihood could not be computed
        // in all observations
        if r(N) < $MH_n {
                scalar `lnp' = .
                exit
        }
        // compute log posterior: add the normal(0,25) log prior
        // densities (standard deviation 5) of the two coefficients
        scalar `lnp' = `lnf' + lnnormalden($MH_b[1,1],0,5) + ///
                               lnnormalden($MH_b[1,2],0,5)
end

We write a new program called logit_normal, which is a straightforward extension of the logitll program. We compute the log likelihood as we did before, but we now return the overall log posterior lnp instead of the overall log likelihood lnf. The log posterior lnp is computed as the sum of the log likelihood and the log prior distributions of the two parameters. Global macro MH_b contains the name of a temporary vector of coefficients; $MH_b[1,1] contains the current value of the first parameter, mpg; and $MH_b[1,2] contains the current value of the second parameter, _cons.
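
With many coefficients, hard-coding one lnnormalden() term per parameter becomes tedious. Here is a minimal sketch of a more general variant of the last step that loops over all columns of the coefficient vector; this generalization is our illustration, not part of the original example:

        // hypothetical generalization: the same normal(0,25) prior
        // for every coefficient in the row vector $MH_b
        scalar `lnp' = `lnf'
        local k = colsof($MH_b)
        forvalues j = 1/`k' {
                scalar `lnp' = `lnp' + lnnormalden($MH_b[1,`j'],0,5)
        }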

We specify the name of the new evaluator in option evaluator().

. bayesmh foreign mpg, evaluator(logit_normal)

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Posterior:
  foreign ~ logit_normal(xb_foreign)
------------------------------------------------------------------------------

Bayesian regression                              MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =         74
                                                 Acceptance rate  =      .2125
                                                 Efficiency:  min =     .09213
                                                              avg =      .0953
Log marginal likelihood = -47.146104                          max =     .09847

------------------------------------------------------------------------------
             |                                                Equal-tailed
     foreign |      Mean   Std. Dev.     MCSE     Median  [95% Cred. Interval]
-------------+----------------------------------------------------------------
         mpg |  .1584296   .0498191   .001641   .1549375   .0650025   .2631174
       _cons | -4.337211   1.160714    .03699  -4.319106  -6.783982  -2.203245
------------------------------------------------------------------------------

The results are identical to the normal-prior model in Log-likelihood evaluators.
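
If you want to double-check such an evaluator, you can compare it against the corresponding built-in model; a sketch, assuming both runs use the same rseed() and initial values so the chains match (the seed value 14 is an arbitrary choice):

. bayesmh foreign mpg, likelihood(logit) prior({foreign:}, normal(0,25)) ///
        rseed(14)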

Tell me more

Read the overview from the Stata News.

View a complete list of Bayesian analysis features.
