
## User's corner: Machine learning

Interested in machine learning? Lasso? Support vector machines? Boosted regression? Other algorithms? Stata's user community has developed packages for a variety of machine learning techniques.

The list below groups the machine learning packages by the type of algorithm they provide. To learn more about a package, open Stata and type the ssc describe or net command shown beneath its description; the output describes the package and explains how to install it.
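As a minimal sketch of that workflow, the commands below describe and then install one SSC package and one Stata Journal package. The package names are taken from the list that follows; the install steps assume an internet connection and write access to your ado directory.

```stata
* Read the package description on SSC, then install it
ssc describe lassopack
ssc install lassopack

* Stata Journal packages: point Stata at the issue, then install by package id
net sj 16-4 st0461
net install st0461

* Confirm what was installed and where
ado describe lassopack
which lasso2
```

To pick up later updates for an SSC package, rerun the install with the replace option, for example ssc install lassopack, replace.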

### Lasso, elastic net regression, and ridge regression:

LASSOPACK is a suite of programs developed by Achim Ahrens, Christian Hansen, and Mark Schaffer that includes the lasso2, cvlasso, and rlasso commands. These commands provide features including lasso, square-root lasso, elastic net, ridge regression, adaptive lasso estimation, and cross-validation.

. ssc describe lassopack
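The following is a hedged sketch of how the three LASSOPACK commands might be used together. It assumes lassopack is already installed and uses the auto dataset shipped with Stata. The choice of outcome, predictors, and seed is purely illustrative; check help lasso2, help cvlasso, and help rlasso for the full syntax before relying on any option.

```stata
* Assumes lassopack is installed (ssc install lassopack)
sysuse auto, clear

* Illustrative predictor list (an assumption, not a recommendation)
local xvars mpg headroom trunk weight length turn displacement gear_ratio foreign

* Lasso coefficient path over a grid of penalty levels
lasso2 price `xvars'

* Replay the path and pick the penalty by the extended BIC
lasso2, lic(ebic)

* Elastic net (alpha between 0 and 1) and ridge (alpha = 0) use the same command
lasso2 price `xvars', alpha(0.5)
lasso2 price `xvars', alpha(0)

* Cross-validation; lopt reestimates at the penalty that minimizes prediction error
cvlasso price `xvars', lopt seed(123)

* Lasso with a theory-driven ("rigorous") penalty level
rlasso price `xvars'
```

With only 74 observations, the auto data serve to show the syntax, not to produce meaningful model selection.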


PDSLASSO was also developed by Achim Ahrens, Christian Hansen, and Mark Schaffer. It includes the pdslasso and ivlasso commands for estimation and causal inference in models with endogeneity. Two estimation methods are provided: post-double-selection and post-regularization.

. ssc describe pdslasso


elasticregress was developed by Wilbur Townsend. This command performs elastic net-regularized regression, including lasso and ridge regression.

. ssc describe elasticregress


sivreg, developed by Helmut Farbmacher, performs adaptive lasso for a linear instrumental-variables regression with some invalid instruments.

. ssc describe sivreg


krls was developed by Jeremy Ferwerda, Jens Hainmueller, and Chad Hazlett. This command performs kernel-based regularized least squares. It is described in detail here.

. ssc describe krls


plogit, developed by Tony Brady and Gareth Ambler, performs penalized logistic regression, including lasso.

. net describe plogit, from(http://www.homepages.ucl.ac.uk/~ucakgam/stata)


lars was developed by Adrian Mander and provides the least-angle regression (LARS) model-building algorithm.

. ssc describe lars


### Support vector machines:

svmachines was developed by Nick Guenther and Matthias Schonlau. This command provides the support vector machine (SVM) algorithm and can be applied to continuous, binary, and categorical outcomes.

. net sj 16-4 st0461


### Boosted regression:

boost, a plugin developed by Matthias Schonlau, performs boosted regression.

. net sj 12-2 st0087_1


### Regression trees:

chaid was developed by Joseph Luchman. This command provides the chi-square automatic interaction detection (CHAID) and exhaustive CHAID algorithms.

. ssc describe chaid


cart, developed by Wim van Putten, performs classification and regression tree analysis for failure-time data.

. ssc describe cart


### Random decision forests:

chaidforest, developed by Joseph Luchman, performs random decision forest ensemble classification using CHAID as the base learner.

. ssc describe chaidforest


### Latent Dirichlet allocation:

ldagibbs was developed by Carlo Schwarz. This command provides a Gibbs sampling algorithm for latent Dirichlet allocation (LDA) for clustering of text strings.

. net sj 18-1 st0515


And don't forget Stata's official commands. For instance, the commands for linear regression, logistic regression, discriminant analysis, cluster analysis, and principal component analysis can also be used as tools for machine learning.
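As a rough illustration of the official commands mentioned above, the sketch below runs each one on the auto dataset shipped with Stata. The variable choices, the number of clusters, and the random seed are assumptions made for the example, not recommendations.

```stata
sysuse auto, clear

* Linear regression, then fitted values
regress price mpg weight length
predict double price_hat, xb

* Logistic regression, predicted probabilities, and a classification table
logit foreign mpg weight
predict double p_foreign, pr
estat classification

* Linear discriminant analysis with a resubstitution classification table
discrim lda mpg weight length, group(foreign)
estat classtable

* k-means cluster analysis; cluster membership is stored in the variable km3
* (variables are left unstandardized here, so weight dominates the distances)
cluster kmeans mpg weight length, k(3) name(km3) start(krandom(12345))
tabulate km3

* Principal component analysis, then scores for the first two components
pca mpg weight length displacement
predict pc1 pc2, score
```

Each estimation command has its own postestimation tools; type help followed by the command name and postestimation (for example, help logit postestimation) to see what is available.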