In the spotlight: Lasso

You have lots of data. Lots of variables. Maybe even more variables than observations. Perhaps you have genetic data and want to predict a certain type of cancer. Perhaps you have demographic data and want to predict employment status. Or perhaps you have data recording words used in restaurant reviews and want to predict health inspection scores. When you know some variables will be helpful in predicting the outcome but you don't know which ones, lasso can help.

With Stata 16's new lasso features, you can sift through many potential variables and select those that help predict the outcome. With a command such as

. lasso linear y x1-x1000

you can select from among 1000 potential variables. Or if your outcome is binary, you could type

. lasso logit y x1-x1000
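
After fitting, a few postestimation commands show what the lasso selected. The following is a minimal sketch, written as do-file lines without the dot prompt. It assumes a dataset containing y and x1-x1000 is already in memory; the seed value is arbitrary.

```stata
* Fit the lasso. By default, lambda is chosen by cross-validation,
* so rseed() makes the random folds reproducible.
lasso linear y x1-x1000, rseed(12345)

* Table of knots: the lambda values at which variables enter or leave the model
lassoknots

* List the variables selected at the chosen lambda
lassocoef

* Plot the cross-validation function against lambda
cvplot
```

The same postestimation commands can be used after lasso logit.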

If you want to select variables in a training sample and evaluate performance in a validation sample, you add just a few more commands.

. splitsample, generate(sample)
. lasso linear y x1-x1000 if sample==1
. lassogof if sample==2
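
A variant of the three commands above, sketched as do-file lines, makes the split reproducible and reports fit in both samples side by side. It assumes y and x1-x1000 are in memory and that no variable named sample exists yet. The 75/25 proportions and the seed values are illustrative choices, not recommendations.

```stata
* Put 75% of observations in sample 1 (training) and 25% in sample 2 (validation)
splitsample, generate(sample) split(.75 .25) rseed(12345)

* Select variables using the training sample only
lasso linear y x1-x1000 if sample==1, rseed(12345)

* Report goodness of fit separately for each sample
lassogof, over(sample)
```

Comparing the two rows of the lassogof output shows how much predictive performance drops when moving from the training sample to the validation sample.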

In our blog post An introduction to the lasso in Stata, we demonstrate how to use various techniques such as cross-validation and adaptive lasso to select variables and how to evaluate their predictive abilities.
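
As a rough illustration of comparing selection methods, the sketch below fits a cross-validated lasso and an adaptive lasso on the training sample and compares their out-of-sample fit. It assumes y, x1-x1000, and the sample variable created by splitsample are in memory. The stored-estimate names cv and adaptive and the seed are arbitrary labels chosen for this example.

```stata
* Lasso with lambda selected by cross-validation (the default)
lasso linear y x1-x1000 if sample==1, rseed(12345)
estimates store cv

* Adaptive lasso on the same training sample
lasso linear y x1-x1000 if sample==1, selection(adaptive) rseed(12345)
estimates store adaptive

* Compare both models in each sample, using postselection coefficients
lassogof cv adaptive, over(sample) postselection
```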

Sometimes, you will want to go beyond variable selection and prediction. You might want standard errors, tests, and confidence intervals for the coefficients on a few variables of interest. When inference is the goal, Stata 16 also offers a suite of lasso commands that produce valid inference for that subset of variables while using lasso methods to select controls from the many other variables. For instance, you can use a method called double selection and perform inference for x1 and x2 by typing

. dsregress y x1 x2, controls(x3-x1000)
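
Besides double selection, Stata 16's inference commands include partialing-out and cross-fit partialing-out estimators that share the same syntax. The sketch below assumes y and x1-x1000 are in memory; the seed value is arbitrary.

```stata
* Partialing-out estimator for the coefficients on x1 and x2
poregress y x1 x2, controls(x3-x1000)

* Cross-fit partialing-out estimator; rseed() makes the random folds reproducible
xporegress y x1 x2, controls(x3-x1000) rseed(12345)

* Summarize the lassos that were fit to select controls
lassoinfo
```

All three commands report coefficients, standard errors, and confidence intervals only for x1 and x2; the selected controls are treated as nuisance terms.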

In our blog post Using lasso for inference in high-dimensional models, we give an overview of inference with lasso and walk you through examples of three estimators available in Stata 16.

— David Drukker
Executive Director of Econometrics

— Di Liu
Senior Econometrician
