User's corner: Machine learning
Interested in machine learning? Lasso? Support vector machines? Boosted regression? Other algorithms? Stata's user community has developed packages for a variety of machine learning techniques.
The list below groups the machine learning packages by the type of algorithm they provide. To learn more, click on the name of the package or command. You can also open Stata and type the corresponding ssc describe or net command to read more about the command and learn how to install it.
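As a rough sketch of that workflow, the lines below describe a package, install it, and open its help file. They assume an internet connection and use lassopack and the Stata Journal package st0461 (both listed on this page) purely as examples; any other package name can be substituted.

```stata
* Read the package description on SSC, then install it
ssc describe lassopack
ssc install lassopack

* After installation, open the help file for one of its commands
help lasso2

* Stata Journal packages: point Stata at the issue and describe
* the package, then install it by name
net sj 16-4 st0461
net install st0461
```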
Lasso, elastic net regression, and ridge regression:
LASSOPACK is a suite of programs developed by Achim Ahrens, Christian Hansen, and Mark Schaffer that includes the lasso2, cvlasso, and rlasso commands. These commands provide features including lasso, square-root lasso, elastic net, ridge regression, adaptive lasso estimation, and cross-validation.
. ssc describe lassopack
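A minimal sketch of the three LASSOPACK commands, assuming the package is already installed. The auto dataset and the choice of regressors are illustrative assumptions only, not a recommended model.

```stata
* Assumes lassopack is installed (ssc install lassopack)
sysuse auto, clear

* Lasso over a grid of lambda values; select the model by EBIC
lasso2 price mpg weight length turn displacement, lic(ebic)

* Elastic net: alpha(0.5) mixes the lasso and ridge penalties
lasso2 price mpg weight length turn displacement, alpha(0.5) lic(ebic)

* Choose lambda by 10-fold cross-validation and estimate at the optimum
cvlasso price mpg weight length turn displacement, nfolds(10) lopt seed(123)

* Theory-driven ("rigorous") choice of the penalty level
rlasso price mpg weight length turn displacement
```

The seed() option only fixes the random fold assignment so that the cross-validation result is reproducible.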
PDSLASSO was also developed by Achim Ahrens, Christian Hansen, and Mark Schaffer. It includes the pdslasso and ivlasso commands for estimation and causal inference in models with endogeneity. Two estimation methods are provided: post-double-selection and post-regularization.
. ssc describe pdslasso
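A hedged sketch of a pdslasso call, assuming both lassopack and pdslasso are installed from SSC. The variable of interest comes first and the candidate controls go in parentheses; the auto dataset and the variables chosen here are assumptions for illustration and do not form a meaningful causal model.

```stata
* Assumes lassopack and pdslasso are installed from SSC
sysuse auto, clear

* Coefficient of interest: foreign. The lasso selects among the
* candidate controls listed in parentheses.
pdslasso price foreign (mpg weight length turn displacement headroom trunk)
```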
elasticregress was developed by Wilbur Townsend. This command performs elastic net-regularized regression, including lasso and ridge regression.
. ssc describe elasticregress
sivreg, developed by Helmut Farbmacher, performs adaptive lasso estimation for linear instrumental-variables regression when some of the instruments are invalid.
. ssc describe sivreg
krls performs kernel-based regularized least squares (KRLS).
. ssc describe krls
plogit, developed by Tony Brady and Gareth Ambler, performs penalized logistic regression, including lasso.
. net describe plogit, from(http://www.homepages.ucl.ac.uk/~ucakgam/stata)
lars was developed by Adrian Mander and provides the least-angle regression (LARS) model-building algorithm.
. ssc describe lars
Support vector machines:
svmachines was developed by Nick Guenther and Matthias Schonlau. This command provides the support vector machine (SVM) algorithm and can be applied to continuous, binary, and categorical outcomes.
. net sj 16-4 st0461
Boosted regression:
boost, a plugin developed by Matthias Schonlau, performs boosted regression.
. net sj 12-2 st0087_1
Decision trees:
chaid was developed by Joseph Luchman. This command provides the chi-square automatic interaction detection (CHAID) and exhaustive CHAID algorithms.
. ssc describe chaid
cart, developed by Wim van Putten, performs classification and regression tree analysis for failure-time data.
. ssc describe cart
Random decision forests:
chaidforest, developed by Joseph Luchman, performs random decision forest ensemble classification using CHAID as the base learner.
. ssc describe chaidforest
Latent Dirichlet allocation:
ldagibbs was developed by Carlo Schwarz. This command provides a Gibbs sampling algorithm for latent Dirichlet allocation (LDA) for clustering of text strings.
. net sj 18-1 st0515
And don't forget Stata's official commands. For instance, the commands for linear regression, logistic regression, discriminant analysis, cluster analysis, and principal component analysis can also be used as tools for machine learning.
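For orientation, here is a short sketch of the official commands behind the methods just mentioned. It uses the auto dataset that ships with Stata; the variables and the number of clusters are arbitrary choices made for illustration.

```stata
sysuse auto, clear

* Linear regression
regress price mpg weight

* Logistic regression
logit foreign mpg weight

* Linear discriminant analysis with foreign as the grouping variable
discrim lda mpg weight, group(foreign)

* k-means cluster analysis with three clusters, stored in variable grp
cluster kmeans mpg weight, k(3) name(grp)

* Principal component analysis
pca mpg weight length
```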