

Highlights

  • Laplace prior for specifying L1 penalty for coefficients

  • Gamma prior for the penalty hyperparameter

  • Bayesian criteria for selecting important predictors

Stata has a suite of commands for performing lasso-based prediction and inference. But did you know that you can fit Bayesian lasso linear models by using bayes: regress? And that inference can be performed in the usual Bayesian way, just as with any other model?

Lasso uses L1-constrained least squares to estimate its coefficients. This involves selecting a value for the penalty parameter, which is often done using cross-validation.

The Bayesian approach is different. Constraints are imposed through prior distributions. The Laplace prior provides constraints that have the same analytic form as the L1 penalty used in lasso. The Laplace prior introduces the penalty parameter as one more model parameter that needs to be estimated from the data. One advantage of the Bayesian approach is that the reported credible intervals can be used to obtain proper inference for the parameters of interest. Within the frequentist approach, special methods are needed to obtain proper inference; see Lasso for inference.
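
The correspondence between the L1 penalty and the Laplace prior can be written out explicitly. The following sketch is the standard argument along the lines of Park and Casella (2008); it is not part of Stata's output. The lasso estimate solves

    \hat{\beta}_{\text{lasso}} = \arg\min_{\beta}\; (y - X\beta)'(y - X\beta) + \lambda \sum_{j=1}^{p} |\beta_j|,

whereas the conditional Laplace prior used below is

    \pi(\beta \mid \sigma^2) = \prod_{j=1}^{p} \frac{\lambda}{2\sigma} \exp\!\left(-\frac{\lambda\,|\beta_j|}{\sigma}\right).

For fixed \sigma^2, minimizing the negative log posterior in \beta is equivalent to minimizing (y - X\beta)'(y - X\beta) + 2\sigma\lambda \sum_j |\beta_j|, so the conditional posterior mode of \beta is a lasso solution with penalty 2\sigma\lambda. The Laplace scale \sigma/\lambda = \sqrt{\sigma^2/\lambda^2} is exactly the expression supplied to the laplace() prior in the commands below.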

To fit a lasso-style model, use the bayes: prefix or the bayesmh command with Laplace priors. No variable selection is performed automatically, but Bayesian analysis offers various ways to select variables. One is to use bayesstats summary to display a table of the posterior probabilities that, say, each coefficient is less than 0 and to select variables based on these probabilities.
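
In schematic form, with a generic outcome y and covariates x1-x3, the specification looks as follows (a sketch only; the gamma(1, 1) hyperprior is illustrative, and the default priors for {y:_cons} and {sigma2} are left in place, whereas the worked example below specifies them explicitly):

. bayes, prior({y:x1 x2 x3}, laplace(0, (sqrt({sigma2}/{lam2}))))
     prior({lam2=1}, gamma(1, 1))
     : regress y x1 x2 x3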

Let's look at an example. Consider the diabetes data (Efron et al. 2004) on 442 diabetes patients that record a measure of disease progression one year after baseline as the outcome y and 10 baseline covariates: age, sex, body mass index, mean arterial pressure, and 6 blood serum measurements. We consider the version of these data in which the covariates are standardized to have mean zero and unit sum of squares across all observations.

We would like to fit a linear regression model to y and determine which variables are important for predicting y. Efron et al. (2004) applied the least-angle-regression variable-selection method to these data. Park and Casella (2008) introduced Bayesian lasso and used it to analyze these same data. Below, we follow their approach to demonstrate how this can be done using bayes: regress.

For comparison, let's first use the traditional lasso command to select the important predictors.

. lasso linear y age sex bmi map tc ldl hdl tch ltg glu, nolog rseed(16)

Lasso linear model                          No. of obs        =        442
                                            No. of covariates =         10
Selection: Cross-validation                 No. of CV folds   =         10

-------------------------------------------------------------------------
         |                                No. of   Out-of-      CV mean
         |                               nonzero    sample   prediction
      ID |      Description     lambda     coef.  R-squared       error
---------+---------------------------------------------------------------
       1 |     first lambda   45.16003        0      0.0008     5934.909
      47 |    lambda before   .6254151        8      0.4896     3026.453
    * 48 |  selected lambda   .5698549        8      0.4896      3026.42
      49 |     lambda after   .5192306        8      0.4896     3026.567
      83 |      last lambda   .0219595       10      0.4891     3029.735
-------------------------------------------------------------------------
* lambda selected by cross-validation.

The estimated penalty parameter lambda for the selected model is 0.57. We can see the selected coefficients and their penalized estimates by typing

. lassocoef, display(coef, penalized)

------------------------
             |    active
-------------+----------
         sex | -213.4102
         bmi |  524.8137
         map |  306.6641
          tc | -154.2327
         hdl | -184.5127
         tch |  58.66197
         ltg |  523.1158
         glu |  60.12939
       _cons |  152.1335
------------------------

We now use bayes: regress to fit a Bayesian linear lasso model as described by Park and Casella (2008).

Let's look at the command specification first.

. bayes, prior({y:age sex bmi map tc ldl hdl tch ltg glu}, laplace(0, (sqrt({sigma2}/{lam2}))))
     prior({sigma2},  jeffreys)
     prior({y:_cons}, normal(0, 1e6))
     prior({lam2=1},  gamma(1, 1/1.78))
     block({y:} {sigma2} {lam2}, split)
     rseed(16) dots
     : regress y age sex bmi map tc ldl hdl tch ltg glu

For the coefficients, we use the Laplace prior with mean zero and a scale parameter that depends on the variance {sigma2} and the squared penalization hyperparameter {lam2}. The authors suggest using the gamma prior with shape 1 and rate 1.78 (that is, scale 1/1.78) as the prior for {lam2}. For the intercept {y:_cons} and the variance, we use vague priors: normal(0, 1e6) and jeffreys. To improve sampling efficiency for these data, we sample all parameters in separate blocks.
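
As mentioned earlier, the same model can be fit with bayesmh directly instead of the bayes: prefix. Below is a sketch of an equivalent bayesmh specification with the same priors; it is not run on this page, and its results may differ slightly from those of the bayes: run below because of sampling differences.

. bayesmh y age sex bmi map tc ldl hdl tch ltg glu, likelihood(normal({sigma2}))
     prior({y:age sex bmi map tc ldl hdl tch ltg glu}, laplace(0, (sqrt({sigma2}/{lam2}))))
     prior({sigma2},  jeffreys)
     prior({y:_cons}, normal(0, 1e6))
     prior({lam2},    gamma(1, 1/1.78))
     initial({lam2} 1)
     block({y:} {sigma2} {lam2}, split)
     rseed(16) dots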

Let's now run the model.

. bayes, prior({y:age sex bmi map tc ldl hdl tch ltg glu}, laplace(0, (sqrt({sigma2}/{lam2}))))
     prior({sigma2},  jeffreys)
     prior({y:_cons}, normal(0, 1e6))
     prior({lam2=1},  gamma(1, 1/1.78))
     block({y:} {sigma2} {lam2}, split)
     rseed(16) dots
     : regress y age sex bmi map tc ldl hdl tch ltg glu

Burn-in 2500 aaaaaaaaa1000aaaaaaaaa2000aaaaa done
Simulation 10000 .........1000.........2000.........3000.........4000.........5
> 000.........6000.........7000.........8000.........9000.........10000 done

Model summary
------------------------------------------------------------------------------
Likelihood:
  y ~ regress(xb_y,{sigma2})

Priors:
  {y:age sex bmi map tc ldl hdl tch ltg glu} ~ laplace(0,<expr1>)          (1)
                                   {y:_cons} ~ normal(0,1e6)               (1)
                                    {sigma2} ~ jeffreys

Hyperprior:
  {lam2} ~ gamma(1,1/1.78)

Expression:
  expr1 : sqrt({sigma2}/{lam2})
------------------------------------------------------------------------------
(1) Parameters are elements of the linear form xb_y.

Bayesian linear regression                       MCMC iterations  =     12,500
Random-walk Metropolis-Hastings sampling         Burn-in          =      2,500
                                                 MCMC sample size =     10,000
                                                 Number of obs    =        442
                                                 Acceptance rate  =      .4379
                                                 Efficiency:  min =      .0152
                                                              avg =      .1025
Log marginal-likelihood = -2415.7171                          max =      .2299

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
y            |
         age | -2.478525   52.97851    1.26623  -2.593401  -108.3303   104.2442
         sex | -209.4461   61.21979    1.70006  -211.1479  -330.2515  -85.52568
         bmi |  522.1367   66.76557     1.8115   520.6348   393.4224   656.4993
         map |  304.1617   65.26244    1.77912   306.1749   175.0365   432.3554
          tc | -172.2847   157.5097    12.7739  -159.4816  -523.0447    110.226
         ldl |  1.304382   128.3598    9.96343  -7.796492  -251.2571   298.4382
         hdl | -158.8146   112.6562    6.82563  -158.1347  -378.4126   48.93263
         tch |  91.27437   111.8483    6.06667   86.32462   -114.675   319.0824
         ltg |  515.5167   94.06607    5.83902   509.9952   342.9893    715.739
         glu |  67.94583   62.86024    1.69235   66.11433   -51.1174   197.7894
       _cons |  152.0964   2.545592    .053095   152.0963   146.9166   157.1342
-------------+----------------------------------------------------------------
      sigma2 |  2961.246   207.0183    4.79372   2949.282   2587.023   3389.206
        lam2 |  .0889046    .055257    .001899   .0769573    .020454    .229755
------------------------------------------------------------------------------

The coefficient estimates above are somewhat similar to the penalized estimates of the variables selected by lasso.

We can also compare the two estimates of the penalty parameter lambda:

. bayesstats summary (lambda:sqrt({lam2}))

Posterior summary statistics                      MCMC sample size =    10,000

      lambda : sqrt({lam2})

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
      lambda |   .285446   .0861738    .003292   .2774119   .1430174   .4793277
------------------------------------------------------------------------------

Bayesian lasso estimated lambda to be 0.29, which is smaller than lasso's 0.57, but the two estimates are not directly comparable because lasso standardizes the covariates to have unit scale during the computation.

Although Bayesian lasso does not automatically select variables, we can use some of the Bayesian postestimation tools to help us select them. For instance, we can use bayesstats summary to compute the posterior probability that each coefficient is less than 0. If this probability is close to 0.5, the posterior distribution of the coefficient is centered around 0, and the corresponding variable should not be selected in the model.

. bayesstats summary (age:{y:age}<0) (sex:{y:sex}<0) (bmi:{y:bmi}<0)
     (map:{y:map}<0) (tc:{y:tc}<0) (ldl:{y:ldl}<0) (hdl:{y:hdl}<0)
     (tch:{y:tch}<0) (ltg:{y:ltg}<0) (glu:{y:glu}<0)


Posterior summary statistics                      MCMC sample size =    10,000

         age : {y:age}<0
         sex : {y:sex}<0
         bmi : {y:bmi}<0
         map : {y:map}<0
          tc : {y:tc}<0
         ldl : {y:ldl}<0
         hdl : {y:hdl}<0
         tch : {y:tch}<0
         ltg : {y:ltg}<0
         glu : {y:glu}<0

------------------------------------------------------------------------------
             |                                                Equal-tailed
             |      Mean   Std. dev.     MCSE     Median  [95% cred. interval]
-------------+----------------------------------------------------------------
         age |     .5277   .4992571    .011353          1          0          1
         sex |     .9997   .0173188    .000224          1          1          1
         bmi |         0          0          0          0          0          0
         map |         0          0          0          0          0          0
          tc |     .8815   .3232154    .018971          1          0          1
         ldl |     .5301   .4991181    .029122          1          0          1
         hdl |     .9226   .2672384     .01198          1          0          1
         tch |     .1992   .3994187    .016507          0          0          1
         ltg |     .1992   .3994187    .016507          0          0          1
         glu |     .1417   .3487596    .008577          0          0          1
------------------------------------------------------------------------------

There are two variables, age and ldl, for which the probability that the coefficient is less than 0 is close to 0.5, so we would drop these two variables from the model. We thus arrive at the same set of variables that lasso selected earlier.
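
If we wanted to continue the analysis with this reduced model, we could refit it with the same priors, for example (a sketch; results are not shown on this page):

. bayes, prior({y:sex bmi map tc hdl tch ltg glu}, laplace(0, (sqrt({sigma2}/{lam2}))))
     prior({sigma2},  jeffreys)
     prior({y:_cons}, normal(0, 1e6))
     prior({lam2=1},  gamma(1, 1/1.78))
     block({y:} {sigma2} {lam2}, split)
     rseed(16) dots
     : regress y sex bmi map tc hdl tch ltg glu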

References

Efron, B., T. Hastie, I. Johnstone, and R. Tibshirani. 2004. Least angle regression. The Annals of Statistics 32: 407–499.

Park, T., and G. Casella. 2008. The Bayesian lasso. Journal of the American Statistical Association 103: 681–686.