Laplace prior for specifying L1 penalty for coefficients

Gamma prior for the penalty hyperparameter

Bayesian criteria for selecting important predictors

Stata has a suite of commands for performing
lasso-based prediction
and inference.
But did you know that you can fit Bayesian lasso linear models by using
**bayes: regress**? And that
inference can be performed in the usual Bayesian way, just as with any
other model?

Lasso uses L1-constrained least squares to estimate its coefficients. This involves selecting a value for the penalty parameter, which is often done using cross-validation.

The Bayesian approach is different. Constraints are imposed through prior distributions. The Laplace prior provides constraints that have the same analytic form as the L1 penalty used in lasso. The Laplace prior introduces the penalty parameter as one more model parameter that needs to be estimated from the data. One advantage of the Bayesian approach is that the reported credible intervals can be used to obtain proper inference for the parameters of interest. Within the frequentist approach, special methods are needed to obtain proper inference; see Lasso for inference.

To fit a lasso-style model, use the **bayes:** prefix or the **bayesmh** command with Laplace
priors. No variable selection is performed automatically, but Bayesian
analysis offers various ways to select variables. One is to use **bayesstats summary** to
display a table of the posterior probabilities that each coefficient is
different from 0 and select variables based on them.

Let's look at an example. Consider the diabetes data (Efron et
al. 2004) on 442 diabetes patients that record a measure of disease
progression (one year after baseline) as an outcome **y** and 10
baseline covariates: age, sex, body mass index, mean arterial pressure, and
6 blood serum measurements. We consider the version of these data in which
the covariates are standardized to have zero mean and the sum of squares
across all observations of one.

We would like to fit a linear regression model to **y** and determine which
variables are important for predicting **y**. Efron et al.
(2004) applied the least-angle-regression variable-selection method to these
data. Park and Casella (2008) introduced Bayesian lasso
and used it to analyze these same data. Below, we follow their approach to
demonstrate how this can be done using **bayes: regress**.

For comparison, let's first use the traditional **lasso** command to select the important
predictors.

.lasso linear y age sex bmi map tc ldl hdl tch ltg glu, nolog rseed(16)Lasso linear model No. of obs = 442 No. of covariates = 10 Selection: Cross-validation No. of CV folds = 10

No. of Out-of- CV mean | ||||

nonzero sample prediction | ||||

ID | Description lambda coef. R-squared error | |||

1 | first lambda 45.16003 0 0.0008 5934.909 | |||

47 | lambda before .6254151 8 0.4896 3026.453 | |||

* 48 | selected lambda .5698549 8 0.4896 3026.42 | |||

49 | lambda after .5192306 8 0.4896 3026.567 | |||

83 | last lambda .0219595 10 0.4891 3029.735 | |||

The estimated penalty parameter **lambda** for the selected model is
0.57. We can see the selected coefficients and their penalized estimates
by typing

.lassocoef, display(coef, penalized)

active | ||||

sex | -213.4102 | |||

bmi | 524.8137 | |||

map | 306.6641 | |||

tc | -154.2327 | |||

hdl | -184.5127 | |||

tch | 58.66197 | |||

ltg | 523.1158 | |||

glu | 60.12939 | |||

_cons | 152.1335 | |||

We now use **bayes: regress**
to fit a Bayesian linear lasso model as described by Park and
Casella (2008).

Let's look at the command specification first.

.bayes, prior({y:age sex bmi map tc ldl hdl tch ltg glu}, laplace(0, (sqrt({sigma2}/{lam2})))) prior({sigma2}, jeffreys) prior({y:_cons}, normal(0, 1e6)) prior({lam2=1}, gamma(1, 1/1.78)) block({y:} {sigma2} {lam2}, split) rseed(16) dots: : regress y age sex bmi map tc ldl hdl tch ltg glu

For the coefficients, we use the Laplace prior with zero mean and the scale
parameter that depends on the variance **{sigma2}** and squared
penalization hyperparameter **{lam2}**. The authors suggest using the
gamma prior with shape 1 and rate 1.78 (or scale 1/1.78) as the prior for
**{lam2}**. For the intercept **{y:_cons}** and variance, we use vague
priors: **normal(0, 1e6)** and **jeffreys**. To improve sampling
efficiency for these data, we sample all parameters in separate blocks.

Let's now run the model.

.bayes, prior({y:age sex bmi map tc ldl hdl tch ltg glu}, laplace(0, (sqrt({sigma2}/{lam2})))) prior({sigma2}, jeffreys) prior({y:_cons}, normal(0, 1e6)) prior({lam2=1}, gamma(1, 1/1.78)) block({y:} {sigma2} {lam2}, split) rseed(16) dots : regress y age sex bmi map tc ldl hdl tch ltg gluBurn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done Simulation 10000 .........1000.........2000.........3000.........4000.........5 > 000.........6000.........7000.........8000.........9000.........10000 done Model summary

Equal-tailed | ||||

Mean Std. dev. MCSE Median [95% cred. interval] | ||||

y | ||||

age | -2.478525 52.97851 1.26623 -2.593401 -108.3303 104.2442 | |||

sex | -209.4461 61.21979 1.70006 -211.1479 -330.2515 -85.52568 | |||

bmi | 522.1367 66.76557 1.8115 520.6348 393.4224 656.4993 | |||

map | 304.1617 65.26244 1.77912 306.1749 175.0365 432.3554 | |||

tc | -172.2847 157.5097 12.7739 -159.4816 -523.0447 110.226 | |||

ldl | 1.304382 128.3598 9.96343 -7.796492 -251.2571 298.4382 | |||

hdl | -158.8146 112.6562 6.82563 -158.1347 -378.4126 48.93263 | |||

tch | 91.27437 111.8483 6.06667 86.32462 -114.675 319.0824 | |||

ltg | 515.5167 94.06607 5.83902 509.9952 342.9893 715.739 | |||

glu | 67.94583 62.86024 1.69235 66.11433 -51.1174 197.7894 | |||

_cons | 152.0964 2.545592 .053095 152.0963 146.9166 157.1342 | |||

sigma2 | 2961.246 207.0183 4.79372 2949.282 2587.023 3389.206 | |||

lam2 | .0889046 .055257 .001899 .0769573 .020454 .229755 | |||

The coefficient estimates above are somewhat similar to the penalized estimates
of the variables selected by **lasso**.

We can compare the estimates of the penalized parameters:

.bayesstats summary (lambda:sqrt({lam2}))Posterior summary statistics MCMC sample size = 10,000 lambda : sqrt({lam2})

Equal-tailed | ||||

Mean Std. dev. MCSE Median [95% cred. interval] | ||||

lambda | .285446 .0861738 .003292 .2774119 .1430174 .4793277 | |||

Bayesian lasso estimated **lambda** to be 0.29, which is smaller than
**lasso**'s 0.57, but this estimate is not directly comparable because
**lasso** standardizes covariates to have the scale of 1 during
the computation.

Although Bayesian lasso does not automatically select variables, we can use
some of Bayesian postestimation tools to help us select the variables. For
instance, we can use **bayesstats summary** to compute
the posterior probabilities that each coefficient is less than 0. If the
probability is close to 0.5, then the coefficient estimate is centered around
0, and the corresponding variable should not be selected in the model.

.bayesstats summary (age:{y:age}<0) (sex:{y:sex}<0) (bmi:{y:bmi}<0) (map:{y:map}<0) (tc:{y:tc}<0) (ldl:{y:ldl}<0) (hdl:{y:hdl}<0) (tch:{y:tch}<0) (ltg:{y:ltg}<0) (glu:{y:glu}<0)Posterior summary statistics MCMC sample size = 10,000 age : {y:age}<0 sex : {y:sex}<0 bmi : {y:bmi}<0 map : {y:map}<0 tc : {y:tc}<0 ldl : {y:ldl}<0 hdl : {y:hdl}<0 tch : {y:tch}<0 ltg : {y:ltg}<0 glu : {y:glu}<0

Equal-tailed | ||||

Mean Std. dev. MCSE Median [95% cred. interval] | ||||

age | .5277 .4992571 .011353 1 0 1 | |||

sex | .9997 .0173188 .000224 1 1 1 | |||

bmi | 0 0 0 0 0 0 | |||

map | 0 0 0 0 0 0 | |||

tc | .8815 .3232154 .018971 1 0 1 | |||

ldl | .5301 .4991181 .029122 1 0 1 | |||

hdl | .9226 .2672384 .01198 1 0 1 | |||

tch | .1992 .3994187 .016507 0 0 1 | |||

ltg | .1992 .3994187 .016507 0 0 1 | |||

glu | .1417 .3487596 .008577 0 0 1 | |||

There are two variables, **age** and **ldl**, for which the probability
that their coefficients are less than 0 is close to 0.5. So, we would drop
these two variables from the model. Thus, we arrive at the same set of
variables as selected earlier by **lasso**.

Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani 2004.
Least angle regression. *The Annals of Statistics* 32: 407–499.

Park, T.,and G. Casella. 2008. Bayesian lasso. *Journal of the American Statistical Association* 103: 681–686.

See all Bayesian features and the *Stata Bayesian Analysis Reference Manual*.