The Swiss Stata Users Group Meeting was held on 25 October 2018 at ETH Zürich. There was also an optional workshop the same day. You can view the program and presentation slides below.
Prediction, model selection, and causal inference with regularized regression: Introducing two Stata packages—lassopack and pdslasso
Abstract: The field of machine learning is attracting increasing attention among social scientists and economists. At the same time, Stata offers only a limited set of machine learning tools to date. This presentation introduces two Stata packages, lassopack and pdslasso, that implement regularized regression methods, including the lasso, for Stata. The packages include features intended for prediction, model selection, and causal inference and are thus applicable in many settings. The commands allow for high-dimensional models, where the number of regressors may be large or even exceed the number of observations under the assumption of sparsity. lassopack implements lasso, square-root lasso, elastic net, ridge regression, adaptive lasso, and postestimation OLS. These methods rely on tuning parameters, which determine the degree and type of penalization. lassopack supports three approaches for selecting these tuning parameters: information criteria (implemented in lasso2), K-fold and h-step-ahead rolling cross-validation (cvlasso), and theory-driven penalization (rlasso) due to Belloni et al. (2012). pdslasso offers methods to facilitate causal inference in structural models. Specifically, pdslasso implements methods for selecting control variables (pdslasso) and instruments (ivlasso) from a large set of variables in a setting where the researcher is interested in estimating the causal impact of one or more (possibly endogenous) causal variables of interest.
Economic and Social Research Institute, Dublin
Customizing Stata graphs made easy
Abstract: The overall look of Stata's graphs is determined by so-called scheme files. Scheme files are system components; that is, they are part of the local Stata installation. In this presentation, I will argue that style settings deviating from default schemes should be part of the script producing the graphs rather than being kept in separate scheme files, and I will present software that supports such practice. In particular, I will present a command called grstyle that allows users to quickly change the overall look of graphs without having to fiddle around with external scheme files. I will also present a command called colorpalette that provides a wide variety of color schemes for use in Stata graphics.
Estimating the average causal effect on an ordinal outcome of an endogenously assigned treatment from an endogenously selected sample
Abstract: This presentation discusses the average causal effect (ACE) of an endogenous binary treatment on an ordinal outcome when the sample is subject to endogenous selection. I show how to estimate the ACE using an extended regression model (ERM) command in Stata. I illustrate how to do regression adjustment in Stata and discuss standard errors for sample-averaged treatment effects and population-averaged treatment effects.
Inference with arbitrary clustering
Abstract: In recent years, we have witnessed a tremendous surge of empirical analyses that use geospatial data or data with a network structure. Inference in these settings is challenging because unobserved errors can be correlated in space, along a network, or over time and because the standard approaches to conducting inference are not compelling. We developed an estimator for the variance–covariance matrix (VCV) of OLS and IV estimates that allows for arbitrary dependence between observational units. Arbitrary here refers to the fact that there are no restrictions on the way units could be correlated with each other in space and time: this estimator can account for indirect links in the cross-sectional dependence, time dependence, and alteration of the correlation structure over time. Our estimator builds on the seminal insight by White (1980), who shows that a sandwich-type VCV can be estimated by constructing a consistent estimator of the VCV of the parameters. Specifically, the estimator uses estimated regression errors and knowledge of the clustering structure to reconstruct estimates of the unknown elements of the sandwich formula. We also provide the community with a companion statistical package: our acreg command enables users to adjust the standard errors of OLS and 2SLS coefficients, accounting for arbitrary dependence. We conduct a Monte Carlo study to illustrate how correlation across units within an arbitrary cluster, for example, spatially close units or friends in a network, affects the rejection rate of a null hypothesis if such correlation is not accounted for while estimating the standard errors. We implement simulations using real-life data to construct arbitrary clusters, for example, geocoded data on U.S. towns and counties for the spatial setting and authorship connections data for the network setting.
We construct a setting where IV with cluster–robust standard errors rejects the null of no effect in about 20% of all cases when the significance level of the test is set at 5%. Conventional inference does not improve as the sample size increases, suggesting that the conventional approach produces inconsistent estimates of the variance–covariance matrix. Adopting the arbitrary clustering estimator, we find that the null rejection rate is about 10% for small samples and converges quickly toward the true significance level of 5% as the sample size increases. This pattern suggests that the arbitrary clustering correction produces consistent estimates of the VCV, enabling applied econometricians to conduct robust inference in the presence of arbitrary clustering.
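The idea of rebuilding the sandwich formula from estimated residuals and a known dependence structure can be sketched in a few lines. The Python snippet below is an illustrative simplification, not the acreg implementation: it assumes a single regressor with no intercept, and the `linked` matrix (1 where two units are allowed to have correlated errors, 0 otherwise) is a hypothetical input standing in for the spatial or network links described in the abstract.

```python
def ols_slope(x, y):
    """OLS slope for the model y = b*x + e (single regressor, no intercept)."""
    sxx = sum(xi * xi for xi in x)
    return sum(xi * yi for xi, yi in zip(x, y)) / sxx


def sandwich_variance(x, y, linked):
    """Sandwich variance of the OLS slope under a user-supplied link structure.

    linked[i][j] is 1 if the errors of units i and j may be correlated
    (hypothetical input for this sketch), and 0 otherwise.
    """
    b = ols_slope(x, y)
    resid = [yi - b * xi for xi, yi in zip(x, y)]
    sxx = sum(xi * xi for xi in x)          # the "bread" is 1 / sxx
    n = len(x)
    # The "meat": cross products of scores, kept only for linked pairs.
    meat = sum(
        linked[i][j] * x[i] * resid[i] * x[j] * resid[j]
        for i in range(n)
        for j in range(n)
    )
    return meat / (sxx * sxx)


x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]

# Identity links: each unit is linked only to itself (White-type variance).
independent = [[1 if i == j else 0 for j in range(4)] for i in range(4)]
# Two clusters of two units each: {0, 1} and {2, 3}.
paired = [[1 if i // 2 == j // 2 else 0 for j in range(4)] for i in range(4)]

v_independent = sandwich_variance(x, y, independent)
v_paired = sandwich_variance(x, y, paired)
```

With identity links the formula reduces to the heteroskedasticity-robust variance; richer link matrices add the cross terms for pairs of units that are allowed to be dependent.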
University of Lausanne
Estimating long-run effects in models with cross-sectional dependence using xtdcce2
Abstract: Using the Stata community-contributed command xtdcce2, I show how to estimate long-run coefficients in a dynamic panel with heterogeneous coefficients and common factors and a large number of observations over cross-sectional units and time periods. The common factors cause cross-sectional dependence, which is approximated by cross-sectional averages. Heterogeneity of the coefficients is accounted for by taking the unweighted averages of the unit-specific estimates. Following Chudik et al. (2016), I consider three different models to estimate long-run coefficients: a simple dynamic model (CS-DL), an error-correction model, and an ARDL model (CS-ARDL). I explain how to estimate all three models in Stata using xtdcce2. Further emphasis is put on estimating the standard errors of the long-run coefficients. Estimated standard errors obtained by the delta method and bootstrapped standard errors are compared.
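Two ingredients mentioned in the abstract, averaging unit-specific estimates and applying the delta method to a long-run coefficient, can be illustrated with a minimal Python sketch. It is not the xtdcce2 code. It assumes the simplest ARDL(1,0) case, y_t = phi*y_{t-1} + beta*x_t + e_t, where the long-run coefficient is theta = beta / (1 - phi), and it uses the standard nonparametric variance of a mean-group estimate. The function names and input values are hypothetical.

```python
import math


def mean_group(estimates):
    """Unweighted average of unit-specific estimates and its standard error.

    The variance is the nonparametric mean-group formula:
    sum((b_i - b_bar)^2) / (N * (N - 1)).
    """
    n = len(estimates)
    mean = sum(estimates) / n
    var = sum((b - mean) ** 2 for b in estimates) / (n * (n - 1))
    return mean, math.sqrt(var)


def long_run_delta(beta, phi, var_beta, var_phi, cov_beta_phi):
    """Long-run coefficient theta = beta / (1 - phi) with a delta-method SE.

    Assumes an ARDL(1,0) model; the gradient of theta is
    (1 / (1 - phi), beta / (1 - phi)^2) with respect to (beta, phi).
    """
    theta = beta / (1.0 - phi)
    d_beta = 1.0 / (1.0 - phi)
    d_phi = beta / (1.0 - phi) ** 2
    var_theta = (
        d_beta ** 2 * var_beta
        + d_phi ** 2 * var_phi
        + 2.0 * d_beta * d_phi * cov_beta_phi
    )
    return theta, math.sqrt(var_theta)


# Hypothetical unit-specific short-run estimates for three panel units.
mg_mean, mg_se = mean_group([0.4, 0.5, 0.6])

# Hypothetical point estimates and (co)variances for one long-run coefficient.
theta, theta_se = long_run_delta(
    beta=0.3, phi=0.5, var_beta=0.01, var_phi=0.0, cov_beta_phi=0.0
)
```

A bootstrap alternative would re-estimate theta on resampled data and take the standard deviation of the resulting draws, which is the comparison the presentation describes.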
Chudik, A., K. Mohaddes, M. H. Pesaran, and M. Raissi. 2016. Long-run effects in large heterogeneous panel data models with cross-sectionally correlated errors. Essays in Honor of Aman Ullah. Advances in Econometrics 36: 85–135.
Wishes and grumbles
Abstract: Stata developers present will carefully and cautiously consider wishes and grumbles from Stata users in the audience. Questions, and possibly answers, may concern reports of present bugs and limitations or requests for new features in future releases of the software.
Many estimators in statistics, econometrics, and biostatistics are cast as multi-step estimators. Multi-step estimators produce consistent point estimates, but the standard errors must be corrected. This problem is so common that it even emerges when estimating population-averaged effects from a regression with powers or interactions. This workshop introduces the solution of stacked moment equations, which is a special case of the generalized method of moments (GMM), and shows how to implement this solution using the gmm command in Stata.
This workshop also includes an introduction to Monte Carlo simulations. In addition to describing the mechanics of running a Monte Carlo in Stata, it discusses how to use Monte Carlo simulations to illustrate a theoretical point.
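As a rough illustration of what a Monte Carlo simulation checks, the Python sketch below estimates how often a two-sided t test of a zero mean rejects at the 5% level. It is not workshop material: the function name, sample size, number of replications, and seed are all assumptions chosen for the example, and 2.0096 is the approximate 97.5% quantile of the t distribution with 49 degrees of freedom.

```python
import math
import random


def rejection_rate(reps=2000, n=50, true_mean=0.0, crit=2.0096, seed=12345):
    """Share of simulated samples in which |t| exceeds the critical value.

    Each replication draws n normal observations with the given mean and
    unit variance, then tests H0: mean = 0 with a t statistic.
    """
    rng = random.Random(seed)  # fixed seed so the experiment is reproducible
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        mean = sum(sample) / n
        var = sum((v - mean) ** 2 for v in sample) / (n - 1)
        t_stat = mean / math.sqrt(var / n)
        if abs(t_stat) > crit:
            rejections += 1
    return rejections / reps


size = rejection_rate()                 # H0 is true: should be near 0.05
power = rejection_rate(true_mean=0.5)   # H0 is false: should be well above 0.05
```

Comparing the simulated rejection rate under a true null with the nominal level is the basic way such experiments illustrate whether a standard-error formula is adequate.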
The workshop is included in your meeting registration.