Treatment-effects estimation using lasso

Highlights

Estimate treatment effects with high-dimensional controls

High-dimensional controls in the outcome model
High-dimensional controls in the treatment model

Flexible model specification

Outcome model can be linear, logit, probit, or poisson
Treatment assignment model can be logit or probit

Different measures of treatment effects

ATE: average treatment effect
ATET: average treatment effect on the treated
POM: potential-outcome mean

Robust estimation

Double robustness: only one of the models needs to be correctly specified
Neyman orthogonality: guard against model-selection mistakes made by lasso

Double machine learning

Cross-fitting and resampling

You use treatment-effects estimators to draw causal inferences from observational data. Perhaps you want to estimate the effect of a drug regimen on blood pressure, the effect of a surgical procedure on mobility, the effect of a training program on employment, or the effect of an ad campaign on sales.

You use lasso inferential estimators when you are interested in inference on a few covariates while controlling for many other potential covariates. (And when we say many, we mean hundreds, thousands, or more!)

You can now use these estimators simultaneously. With the telasso command, you can estimate treatment effects while controlling for many potential covariates.

For example, you can type

. telasso (y1 x1-x100) (treat w1-w100)

to estimate the effect of the binary treatment treat on the continuous outcome y1 while controlling for predictors x1 through x100 in the outcome model and for w1 through w100 in the treatment model. The obtained estimates benefit from robustness properties of both the treatment-effects estimators and lasso.

With telasso, you get everything you expect from treatment effects and from lasso. You can estimate the average treatment effect, the average treatment effect on the treated, and the potential-outcome means. You can model continuous, binary, and count outcomes and choose between a logit or probit treatment model. And for selection of controls, you can choose between lasso or square-root lasso estimation and choose from several selection methods, such as BIC and cross-validation.
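As a sketch of how these choices might be combined, the commands below use hypothetical variables: a binary outcome y2 and a count outcome y3 do not appear elsewhere on this page. The placement of the model keyword after the comma inside each set of parentheses, and the option names pomeans and selection(), are written as we understand the telasso syntax; check [CAUSAL] telasso before relying on them.

```stata
* Hypothetical variables: y2 (binary outcome), y3 (count outcome),
* treat (binary treatment), x1-x100 and w1-w100 (potential controls)

* Binary outcome with a logit outcome model and a probit treatment model;
* report the ATET and select controls by minimizing the BIC
telasso (y2 x1-x100, logit) (treat w1-w100, probit), atet selection(bic)

* Count outcome with a poisson outcome model and the default logit
* treatment model; report potential-outcome means and select controls
* by cross-validation (seed set because CV splits the data at random)
set seed 12345
telasso (y3 x1-x100, poisson) (treat w1-w100), pomeans selection(cv)
```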

Let's see it work

We would like to compare two types of lung transplants: bilateral lung transplant (BLT) and single lung transplant (SLT). BLT is usually associated with a higher death rate in the short term after the operation but with a more significant improvement in the quality of life than SLT. As a result, for patients who need to decide between these two treatment options, knowing the effect of BLT (versus SLT) on life quality is essential. Therefore, we want to estimate the effect of the treatment transtype on the outcome fev1p. This outcome represents the percentage of forced expiratory volume in one second (FEV1) that the patient has relative to a healthy person.

Our data include 29 variables recording characteristics of the patients and donors. We use these variables and the interactions between them as controls in our model. It would be tedious to type these variable names one by one to distinguish between continuous and categorical variables. vl is a suite of commands that simplifies this process.

The following code creates the control variable list and stores it in the global macro $allvars.

. quietly vl set

. vl create cvars = vlcontinuous - (fev1p)
note: $cvars initialized with 12 variables.

. vl create fvars = vlcategorical - (transtype)
note: $fvars initialized with 17 variables.

. vl sub allvars = c.cvars i.fvars c.cvars#i.fvars

Now we are ready to use telasso to estimate the average treatment effect. We assume a linear outcome model and a logit treatment model, the defaults. We type

. telasso (fev1p $allvars) (transtype $allvars)

Estimating lasso for outcome fev1p if tran~e = 0 using plugin method ...
Estimating lasso for outcome fev1p if tran~e = 1 using plugin method ...
Estimating lasso for treatment tran~e using plugin method ...
Estimating ATE ...

Treatment-effects lasso estimation    Number of observations      =        937
Outcome model:   linear               Number of controls          =        454
Treatment model: logit                Number of selected controls =          8



                             Robust
       fev1p   Coefficient  std. err.      z    P>|z|     [95% conf. interval]

ATE           
   transtype  
       (BLT   
         vs   
       SLT)      37.51841   .1606703   233.51   0.000     37.20351    37.83332

POmean        
   transtype  
        SLT       46.4938   .2021582   229.99   0.000     46.09757    46.89002

If all the patients were to choose a BLT, the average FEV1% is expected to be 38 percentage points higher than the average of 46% expected if all patients were to choose an SLT. Of the 454 control variables, telasso selects only 8.
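To make this interpretation concrete, write y(BLT) and y(SLT) for a patient's potential FEV1% under each transplant type (notation introduced here for illustration only). The ATE is the difference between the two potential-outcome means, so the estimates reported above imply a BLT potential-outcome mean of about 84%:

```latex
\begin{align*}
\text{ATE} &= E[y(\text{BLT})] - E[y(\text{SLT})] \\
\widehat{E}[y(\text{BLT})] &= \widehat{E}[y(\text{SLT})] + \widehat{\text{ATE}}
  = 46.4938 + 37.51841 = 84.01221
\end{align*}
```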

It is common to estimate the average treatment effect on the treated (ATET) to determine the effect on those who actually received the treatment. To estimate this value, we add the atet option.

. telasso (fev1p $allvars) (transtype $allvars), atet

Estimating lasso for outcome fev1p if tran~e = 0 using plugin method ...
Estimating lasso for outcome fev1p if tran~e = 1 using plugin method ...
Estimating lasso for treatment tran~e using plugin method ...
Estimating ATET ...

Treatment-effects lasso estimation    Number of observations      =        937
Outcome model:   linear               Number of controls          =        454
Treatment model: logit                Number of selected controls =          8



                             Robust
       fev1p   Coefficient  std. err.      z    P>|z|     [95% conf. interval]

ATET          
   transtype  
       (BLT   
         vs   
       SLT)      35.78157   .1831478   195.37   0.000     35.42261    36.14053

POmean        
   transtype  
        SLT      43.35214   1.268976    34.16   0.000     40.86499    45.83929

For the patients who had a BLT, we expect the average FEV1% to be 36 percentage points higher than if all of them had chosen an SLT.

The estimates that we obtained above relied on a key assumption of lasso, the sparsity assumption, which requires that only a small number of the potential covariates are in the "true" model. We can use a double machine learning technique to allow for more covariates in the true model. To do this, we add the xfolds(5) option to split the sample into five groups and perform cross-fitting, and we add the resample(3) option to repeat the cross-fitting procedure with three resamples.

To guarantee that we can later reproduce the estimation results, we also set the random-number seed. We type

. set seed 12345671

. telasso (fev1p $allvars) (transtype $allvars), xfolds(5) resample(3) nolog

Treatment-effects lasso estimation    Number of observations       =       937
                                      Number of controls           =       454
                                      Number of selected controls  =        16
Outcome model:   linear               Number of folds in cross-fit =         5
Treatment model: logit                Number of resamples          =         3



                             Robust
       fev1p   Coefficient  std. err.      z    P>|z|     [95% conf. interval]

ATE           
   transtype  
       (BLT   
         vs   
       SLT)      37.52837   .1683194   222.96   0.000     37.19847    37.85827

POmean        
   transtype  
        SLT       46.4941   .2040454   227.86   0.000     46.09418    46.89402

The estimated treatment effect is very similar to the one reported by the first telasso command, but the selected model included 16 controls instead of 8. The similarity of the estimates across the different specifications suggests that our first model did not suffer from a violation of the sparsity assumption.
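One way to see how the selected controls are distributed across the lassos is to run lassoinfo after estimation. This is a minimal sketch, assuming lassoinfo is available as a postestimation command after telasso (we believe it is; see [CAUSAL] telasso postestimation). No output is reproduced here because none appears in the example above.

```stata
* Refit the first model quietly, then summarize each lasso:
* one outcome lasso per treatment level and one treatment lasso.
* Assumes the global macro $allvars was created by the vl commands above.
quietly telasso (fev1p $allvars) (transtype $allvars)
lassoinfo
```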

Tell me more

See more examples and information on telasso in [CAUSAL] telasso.

Learn more about treatment effects in the Stata Causal Inference and Treatment-Effects Estimation Reference Manual.

Learn more about lasso in the Stata Lasso Reference Manual.
