
Highlights

  • Asymmetric Laplace distribution (ALD) to model skewed outcomes

  • Flexible prior specifications

  • Bayesian quantile estimation via ALD

  • Simultaneous quantile estimation by using multiple equations

  • Include random effects at multiple levels of hierarchy

  • Model quantiles using nonlinear functions of predictors

  • Full support of Bayesian postestimation features

  • See more Bayesian analysis features

The bayesmh command now includes an asymmetric Laplace distribution (ALD) as a new likelihood function. You can use ALD to model nonnormal outcomes with pronounced skewness and kurtosis. You can also use it to fit Bayesian quantile regression models (Yu and Moyeed 2001). For a Bayesian univariate quantile regression, see the new bayes: qreg command. With bayesmh, you can fit Bayesian simultaneous, multilevel, and nonlinear quantile regression models. These features are part of StataNow™.
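
For reference, the ALD density that underlies this likelihood (our summary of Yu and Moyeed 2001, written in LaTeX notation; it is not reproduced on this page) with location \mu, scale \sigma, and quantile \tau is

    f(y \mid \mu, \sigma, \tau)
        = \frac{\tau(1-\tau)}{\sigma}
          \exp\left\{-\rho_\tau\!\left(\frac{y-\mu}{\sigma}\right)\right\},
    \qquad
    \rho_\tau(u) = u\left(\tau - I(u<0)\right)

where \rho_\tau(\cdot) is the check function of quantile regression. Maximizing this likelihood in \mu is equivalent to minimizing the check-function loss, which is why a posterior built on ALD targets the \tau-th conditional quantile.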

Let's see it work

Random-effects quantile regression

Consider the mathscores dataset, which contains math scores of pupils in the third year (math3) and fifth year (math5) from different schools (school) in Inner London (Mortimore et al. 1988).

. webuse mathscores

. describe

Contains data from https://www.stata-press.com/data/r18/mathscores.dta
 Observations:           887
    Variables:             3                  9 May 2022 23:31
-------------------------------------------------------------------------------
Variable      Storage   Display    Value
    name         type    format    label      Variable label
-------------------------------------------------------------------------------
school          float   %9.0g                 School ID
math3           float   %9.0g                 Year 3 math score
math5           float   %9.0g                 Year 5 math score
-------------------------------------------------------------------------------
Sorted by:

We are interested in fitting several Bayesian quantile regressions of math5 on math3 for different quantiles of math5. We also want to include a random intercept at the school level in the model to account for the between-school variability of the math scores.

Let's first fit a Bayesian random-intercept model for the median. We specify the asymlaplaceq({scale},0.5) likelihood option with a random scale parameter {scale} and a quantile value of 0.5. We use fairly uninformative priors for the coefficients (normal(0,10000)), scale (igamma(0.01,0.01)), and random-intercept variance (igamma(0.01,0.01)). We use blocking to improve the Markov chain Monte Carlo (MCMC) sampling and specify a random seed for reproducibility.

. bayesmh math5 math3 U[school], likelihood(asymlaplaceq({scale},0.5))
     prior({math5:}, normal(0,10000)) block({math5:}) 
     prior({scale},  igamma(0.01,0.01)) block({scale}) 
     prior({var_U},  igamma(0.01,0.01)) rseed(19)

Burn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done
Simulation 10000 .........1000.........2000.........3000.........4000.........50
> 00.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(xb_math5,{scale},0.5)

Priors:
  {math5:math3 _cons} ~ normal(0,10000)                                    (1)
          {U[school]} ~ normal(0,{var_U})                                  (1)
              {scale} ~ igamma(0.01,0.01)

Hyperprior:
  {var_U} ~ igamma(0.01,0.01)
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_math5.

Bayesian asymm. Laplace (quantile) regression    MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        887
                                                 Acceptance rate  =      .3555
                                                 Efficiency:  min =     .01064
                                                              avg =     .08481
Log marginal-likelihood                                       max =      .2189

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       math3 |  .5771888   .0287285   .001293   .5769971   .5198219   .6337824
       _cons |  31.14144   .3642311   .035305    31.1465    30.4084   31.86673
-------------+----------------------------------------------------------------
       scale |  1.983294   .0683036    .00146   1.981383   1.853755   2.120894
       var_U |  3.679771   1.057536   .043052   3.561201   2.072422   6.134616
------------------------------------------------------------------------------

The posterior mean estimate for the math3 coefficient is 0.58, and the variance of the random intercept is 3.68. If we were to fit a Bayesian two-level linear regression that models the conditional mean using the same prior distributions, we would find that its estimates are similar to the estimates obtained from the median regression. A question of interest might be whether the regression coefficients change across quantiles. We can use the estimates from the median regression as a baseline for comparing models for other quantiles.
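
The mean-regression comparison is not shown on this page; a minimal sketch of such a fit, assuming a normal likelihood with an error-variance parameter that we name {var_e} for illustration, might look like this:

. bayesmh math5 math3 U[school], likelihood(normal({var_e}))
     prior({math5:}, normal(0,10000)) block({math5:})
     prior({var_e},  igamma(0.01,0.01)) block({var_e})
     prior({var_U},  igamma(0.01,0.01)) rseed(19)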

Random-effects simultaneous quantile regression

Let's use bayesmh to fit two Bayesian quantile regressions simultaneously using a two-equation specification. For example, let's consider the 0.25 and 0.75 quantiles. We label the two regression equations math5_q25 and math5_q75 to avoid duplicate equation names for the same outcome variable. As in the previous model, we use uninformative priors for the coefficients and variance parameters. We also use a common scale parameter {scale} because the outcome variable is the same in both equations. We increase the burn-in length to 5,000 to improve the adaptation of the MCMC sampler.

. bayesmh
     (math5_q25:math5 math3 U1[school], likelihood(asymlaplaceq({scale},0.25)))
     (math5_q75:math5 math3 U2[school], likelihood(asymlaplaceq({scale},0.75))), 
     prior({math5_q25:}, normal(0,10000)) block({math5_q25:}) 
     prior({math5_q75:}, normal(0,10000)) block({math5_q75:})
     prior({scale},      igamma(0.01,0.01)) 
     prior({var_U1},     igamma(0.01,0.01)) block({var_U1})
     prior({var_U2},     igamma(0.01,0.01)) block({var_U2})
     burnin(5000) rseed(19)

Burn-in 5000 aaaaaaaaa1000aaaaaaaaa2000aaaaaaaaa3000aaaaaaaaa4000aaaaaaaaa5000 
> done
Simulation 10000 .........1000.........2000.........3000.........4000.........50
> 00.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq(math5 math3 U1[school],{scale},0.25)
  math5 ~ asymlaplaceq(math5 math3 U2[school],{scale},0.75)

Prior:
  {scale} ~ igamma(0.01,0.01)

Hyperpriors:
  {math5_q25:math3 U1 _cons} ~ normal(0,10000)
  {math5_q75:math3 U2 _cons} ~ normal(0,10000)
             {var_U1 var_U2} ~ igamma(0.01,0.01)
                {U1[school]} ~ normal(0,{var_U1})
                {U2[school]} ~ normal(0,{var_U2})
------------------------------------------------------------------------------

Bayesian asymm. Laplace (quantile) regression    MCMC iterations  =     15,000
Random-walk Metropolis–Hastings sampling         Burn-in          =      5,000
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        887
                                                 Acceptance rate  =       .339
                                                 Efficiency:  min =     .01429
                                                              avg =     .06357
Log marginal-likelihood                                       max =      .2034

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5_q25    |
       math3 |   .781409   .0331454   .001575   .7818769   .7174074    .844201
          U1 |         1          0         0          1          1          1
       _cons |  27.11279   .4114259    .03442   27.09728   26.33384   27.93378
-------------+----------------------------------------------------------------
math5_q75    |
       math3 |  .4021812   .0285305     .0013   .4029019    .344931   .4566547
          U2 |         1          0         0          1          1          1
       _cons |  34.21954   .2754823   .018921   34.22476   33.66447   34.76017
-------------+----------------------------------------------------------------
       scale |  1.557948   .0373122   .000827   1.557052   1.486446   1.632176
      var_U1 |  7.793937   2.041174   .076138   7.560785   4.519766   12.38803
      var_U2 |  2.256973   .7545756   .036895   2.153597   1.128305   4.057043
------------------------------------------------------------------------------

The posterior mean estimates of the math3 coefficient for the 0.25 and 0.75 quantiles are 0.78 and 0.40, respectively, which are above and below the earlier median estimate of 0.58. It appears that the regression slope decreases as the quantile value increases. In particular, we can conclude that math3 is a stronger predictor of the 25th percentile of math5 scores than of the 75th percentile of math5 scores.
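
One way to make this comparison more formal (our suggestion, not part of the original walkthrough; slopediff is a label we introduce) is to summarize the posterior of the difference between the two slopes, which is possible because both equations were fit in a single bayesmh run:

. bayesstats summary (slopediff: {math5_q25:math3} - {math5_q75:math3})

A 95% credible interval for slopediff that excludes 0 would support the conclusion that the slope decreases as the quantile increases.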

Modeling outcomes using asymmetric Laplace distribution

Finally, we use ALD to examine the skewness (asymmetry) of the math5 variable. For this purpose, we use asymlaplaceq() with a random quantile parameter {tau}. Negative skewness is indicated by {tau} greater than 0.5, positive skewness is indicated by {tau} less than 0.5, and a {tau} of 0.5 indicates no skewness, that is, perfect symmetry. We apply a flat uniform(0,1) prior for {tau}. We specify a regression model for math5 with only a constant term, which corresponds to the location parameter of ALD, and a scale parameter {scale} as in the previous examples.

. bayesmh math5, likelihood(asymlaplaceq({scale}, {tau}))
     prior({math5:}, normal(0,10000)) 
     prior({scale},  igamma(0.01,0.01))
     prior({tau},    uniform(0,1))
     block({math5:} {scale} {tau}, split)
     initial({tau} 0.5) rseed(19)

Burn-in ...
Simulation ...

Model summary
------------------------------------------------------------------------------
Likelihood:
  math5 ~ asymlaplaceq({math5:_cons},{scale},{tau})

Priors:
  {math5:_cons} ~ normal(0,10000)
        {scale} ~ igamma(0.01,0.01)
          {tau} ~ uniform(0,1)
------------------------------------------------------------------------------

Bayesian asymm. Laplace (quantile) regression    MCMC iterations  =     12,500
Random-walk Metropolis–Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        887
                                                 Acceptance rate  =       .424
                                                 Efficiency:  min =     .03815
                                                              avg =     .05242
Log marginal-likelihood = -2846.2182                          max =     .07833

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
math5        |
       _cons |  37.00486    .045309   .001619   37.00222    36.9161   37.10699
-------------+----------------------------------------------------------------
       scale |  1.073886   .0655504   .003246   1.071063   .9511403   1.211991
         tau |  .8602184   .0085491   .000438   .8601294    .843211   .8767714
------------------------------------------------------------------------------

The estimated 95% credible interval of [0.84, 0.88] suggests that {tau} is greater than 0.5, indicating strong negative skewness. A more formal test for negative skewness can be conducted by using the bayestest interval command.

. bayestest interval {tau}, lower(0.5)

Interval tests     MCMC sample size =    10,000

       prob1 : {tau} > 0.5
 
-----------------------------------------------
             |      Mean    Std. dev.      MCSE
-------------+---------------------------------
       prob1 |         1      0.00000         0
-----------------------------------------------

Indeed, the posterior probability of {tau} greater than 0.5 is 1.
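
As an informal check (our addition, not in the original example), the sample skewness of math5 can be inspected directly; a negative sample skewness is consistent with {tau} greater than 0.5:

. summarize math5, detail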

References

Mortimore, P., P. Sammons, L. Stoll, D. Lewis, and R. Ecob. 1988. School Matters: The Junior Years. Wells, Somerset, UK: Open Books.

Yu, K., and R. A. Moyeed. 2001. Bayesian quantile regression. Statistics & Probability Letters 54: 437–447.

Tell me more

Learn more about Stata's Bayesian analysis features.

Read more about Bayesian analysis in the Stata Bayesian Analysis Reference Manual; see [BAYES] bayesmh.

Also see Bayesian quantile regression.

View all the new features in Stata 18.
