
# 2022 Stata Conference

## The Stata Conference heads to DC

The 2022 Stata Conference will be held on 4–5 August 2022 in Washington, DC, at the Marriott Marquis.

Organized by StataCorp, the annual Stata Conference is an exceptional opportunity to network with researchers from across all disciplines, engage with StataCorp's developers, and learn new and exciting applications of Stata.

Don't miss your chance to snag a photo or two ... or twenty, with myriad landmarks, and enjoy two days of networking and Stata exploration with the worldwide Stata community.

#Stata2022

## Schedule & Agenda

### Day 2

#### gtsheckman: Generalized two-step Heckman estimator

##### Alyssa Carlson, University of Missouri

In this presentation, I introduce the gtsheckman command, which estimates a generalized two-step Heckman sample-selection estimator adjusted for heteroskedasticity. This estimator was proposed in Carlson and Joshi (2022), where the presence of heteroskedasticity was motivated by a panel-data setting with random coefficients. The gtsheckman command offers several advantages over the heckman, twostep command, including robust inference, a more general control function specification, and the incorporation of heteroskedasticity.

#### Quantile regression in Stata: Performance, precision, and power

##### Morten Wang Fagerland, Oslo University Hospital

Quantile regression (command qreg) estimates quantiles of the outcome variable, conditional on the values of the independent variables, with median regression as the default form. Quantile regression can be used for several purposes: to estimate medians instead of means as a measure of central tendency (for instance, when data are markedly skewed); to estimate a particular quantile that may be of interest, such as the 10th percentile of birthweight to find predictors of low birthweight; or to study how the effects of independent variables vary over different quantiles of the dependent variable. Specifying the variance–covariance estimator for quantile regression is not straightforward. qreg offers both independent and identically distributed (i.i.d.) and robust estimators. The density estimation technique (DET) can be fitted, residual (i.i.d. only), or kernel. Three different bandwidth methods are available with the fitted and residual DETs, and eight kernel functions are available for the kernel DET. There is also a bootstrap option, which puts the total number of methods at 26. A natural question arises: which one to use? The aim of this presentation is to explore the performance of the methods and to arrive at some overall recommendations for which methods to use.
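The total of 26 methods can be checked by enumerating the combinations the abstract describes. The following is a minimal Python sketch of that arithmetic only; it does not call Stata, the function name is hypothetical, and it assumes that each kernel function counts as one method (that is, kernels are not crossed with bandwidth methods).

```python
# Counts taken from the abstract: 3 bandwidth methods, 8 kernel functions.
N_BANDWIDTHS = 3
N_KERNELS = 8


def count_vce_methods():
    """List every (estimator, density estimation technique, choice) combination."""
    methods = []
    for vce in ("iid", "robust"):
        # Fitted DET: one method per bandwidth choice.
        methods += [(vce, "fitted", f"bandwidth {b}") for b in range(1, N_BANDWIDTHS + 1)]
        # Residual DET is available with the i.i.d. estimator only.
        if vce == "iid":
            methods += [(vce, "residual", f"bandwidth {b}") for b in range(1, N_BANDWIDTHS + 1)]
        # Kernel DET: one method per kernel function (assumption: not crossed with bandwidths).
        methods += [(vce, "kernel", f"kernel {k}") for k in range(1, N_KERNELS + 1)]
    # The bootstrap counts as one additional method.
    methods.append(("bootstrap", None, None))
    return methods


methods = count_vce_methods()
print(len(methods))  # 14 i.i.d. + 11 robust + 1 bootstrap = 26
```

Under these assumptions, the i.i.d. estimator contributes 3 + 3 + 8 = 14 methods, the robust estimator 3 + 8 = 11, and the bootstrap 1.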

#### bivpoisson: A Stata command estimating seemingly unrelated count data

##### Abbie Zhang, Henan University

We present a Stata command, bivpoisson, that allows efficient estimation of seemingly unrelated count data. This command is an extension of and improvement upon sureg, which is a linear, seemingly unrelated regression command based on Zellner (1963). This is the first command in Stata that allows for a user-specified cross-equation correlation structure in the context of a nonlinear system of equations. This package can be applied to many kinds of count data, such as those arising in accidents, RNA sequences, and healthcare. The theoretical advantage of this model is the efficiency gain. When we encounter count-valued correlated dependent variables, linear system-of-equations estimation is no longer efficient. See details of the simulation study for efficiency comparison in Terza and Zhang (2022, Working paper). Maximum likelihood estimation is used for deep-parameter estimation and causal inference, and these numerical tasks are implemented in Stata/Mata with the two-dimensional Gauss–Legendre quadrature integration algorithm. See Terza and Zhang (2020 Stata Conference) and Kazeminezhad, Terza, and Zhang (2021 Stata Conference) for the details of the algorithm and validation. The deep parameters estimated by this package include the point estimates and standard errors of (1) a vector of coefficients beta for the exponentiated linear index and (2) the correlation coefficient parameter rho for the cross-equation heterogeneity term, which is multivariate normally distributed. A postestimation command for average treatment-effect (ATE) estimation will be developed in a later version of this command, as will model-specification tests. Other types of count marginal distributions, such as Conway–Maxwell–Poisson, will also be added in a future version as options for dispersion flexibility.

#### rbicopula: Recursive bivariate copula estimation and decomposition of marginal effects

##### Mustafa Coban, Institute for Employment Research (IAB)

This presentation describes a new Stata command, rbicopula, for copula-based maximum-likelihood estimation of recursive bivariate models. These models enable a flexible residual distribution and differ from bivariate copula or probit models in allowing the first dependent variable to appear on the right-hand side of the equation for the second dependent variable. The new command provides various copulas, allowing the user to choose a copula that best captures the dependence features of the data caused by the presence of common unobserved heterogeneity. Although the estimation of model parameters does not differ from the bivariate case, the existing user-written command bicop does not consider the structural model's recursive nature for predictions and does not enable margins as a postestimation command. rbicopula estimates the model parameters, computes treatment effects of the first dependent variable, and gives the marginal effects of independent variables. In addition, marginal effects can be decomposed into direct and indirect effects if covariates appear in both equations. Moreover, the postestimation commands incorporate two goodness-of-fit tests. Dependent variables of the recursive bivariate model may be binary, ordinal, or a mixture of both. I present and explain the rbicopula command and the available postestimation commands using data from the Stata website.

#### New tools to create PowerPoint presentations within Stata

##### Tim Schmidt, Discover Financial Services

Recent versions of Stata provide helpful tools to generate reproducible reports in Microsoft Word, HTML, and PDF formats. However, for better or worse, Microsoft PowerPoint presentations are the most common form of communication in many business and academic settings. Therefore, many Stata users may benefit from tools to integrate Stata and PowerPoint. I introduce a suite of new Stata programs that facilitate creating PowerPoint presentations with Stata-generated content, particularly graphs. These programs take advantage of Stata version 17’s tighter integration with the Python programming language. Using this suite of programs, collectively called “Slide Deck,” Stata users can easily create PowerPoint presentations within Stata. Slide Deck encompasses two easy-to-use, original Stata classes: “deck” and “slide.” With a few simple commands, these classes enable users to create and save a deck of PowerPoint slides that incorporate Stata graphs and other output, as well as user-supplied text (e.g., titles and bullet points), without ever leaving Stata.

#### Working efficiently with Stata in shared computing environments

##### Billy Buchanan, SAG Corporation

Shared computing environments are regularly used by academia, government, and industry. While they are an effective way for an organization to manage the costs and upkeep of computing infrastructure, working in a shared computing environment presents both unique benefits and unique challenges compared with using hardware owned or operated by the researcher. This talk will provide some advice for successfully navigating the challenges of working in shared computing environments. Topics will include dealing with memory and disk/storage constraints, leveraging metadata for documentation and infrastructure, standardizing project setup and workflow, and some newer community-contributed tools that can maximize the efficiency of your use of computing resources.

#### Treatment-effects estimation using lasso

##### Di Lu, StataCorp

You can use treatment-effects estimators to draw causal inferences from observational data. You can use lasso when you want to control for many potential covariates. With standard treatment-effects models, there is an intrinsic conflict between two required assumptions. The conditional independence assumption is likely to be satisfied with many variables in the model, while the overlap assumption is likely to be satisfied with fewer variables in the model. This presentation shows how to overcome this conflict by using Stata 17's telasso command. telasso estimates the average treatment effects with high-dimensional controls while using lasso for model selection. This estimator is robust to model-selection mistakes. Moreover, it is doubly robust, so only one of the outcome and treatment models needs to be correctly specified.

#### Drivers of COVID-19 deaths in the United States: A two-stage modeling approach

##### Christopher Baum, Boston College

We offer a two-stage (time-series and cross-section) econometric modeling approach to examine the drivers behind the spread of COVID-19 deaths across counties in the United States. Our empirical strategy exploits the availability of two years (January 2020 through January 2022) of daily data on the number of confirmed deaths and cases of COVID-19 in the 3,000 U.S. counties of the 48 contiguous states and the District of Columbia. In the first stage of the analysis, we use daily time-series data on COVID-19 cases and deaths to fit mixed models of deaths against lagged confirmed cases for each county. Because the resulting coefficients are county specific, they relax the homogeneity assumption that is implicit when the analysis is performed using geographically aggregated cross-section units. In the second stage of the analysis, we assume that these county estimates are a function of economic and sociodemographic factors that are taken as fixed over the course of the pandemic. Here we employ the novel one-covariate-at-a-time variable selection algorithm proposed by Chudik et al. (Econometrica, 2018) to guide the choice of regressors.

#### The effect along a year of COVID-19: The role of prior violence, social isolation, and substance use in psychological, physical, and sexual partner violence

##### Angelo Cozzubo, NORC

The consensus is that intimate partner violence (IPV) increased during the COVID-19 lockdown. However, neither the long-term effect nor the mechanisms that explain this variation have been adequately identified (Peterman and Donnell 2020), a gap that applies to the literature in Peru and worldwide. The objective of this study is to assess the long-term impact on IPV over the 11 months from the start of lockdown, differentiating the effects by type of violence (psychological, physical, and sexual) and examining three mechanisms through which these effects may appear: prior violence, substance use, and social isolation. We do so by applying an event study and exploiting the time and location of hourly calls (N = 235,555) received from January 2018 to February 2021 by the only national helpline for domestic violence in Peru (Línea 100). By focusing on Peru, we were able to examine what happened to IPV during COVID-19 in a country with a complex situation for women: high pre-COVID-19 prevalence of IPV (Bott et al. 2019a), restrictive long-lasting lockdown measures during the pandemic, and the worst performance against COVID-19 in terms of deaths per capita and loss of national gross income. The results show that IPV varied, but in a nonlinear manner, over the 11 months from the start of lockdown. Furthermore, psychological IPV showed the greatest increase, followed by physical IPV. Sexual IPV showed no changes. In terms of the impact mechanisms, previous history and alcohol consumption were the most important ones, with nonlinear variations over time. While nonlinearity may indicate a regression to the mean for some cases as a sign of “new normal levels” of IPV, relationships with risk factors show an opposite situation in which IPV is still rising a year after the initial lockdown.

#### Macro-financial determinants of default probability using copula: A case study of Indonesian banks

##### Maulana Harris Muhajir, Neoma Business School

In the aftermath of the global financial crisis of 2008, macrofinancial linkages have gained more attention from policymakers as primary issues of financial system stability. A clearer understanding of probability of default (PD) drivers may help predict whether a bank will default on its portfolio liabilities. This presentation develops a method to assess a bank's PD based on a multivariate copula distribution to capture nonlinear relationships between variables with complex data structures. Then, we use the generalized method of moments (GMM) to examine the relationship between PD and both bank performance (bank-specific indicators) and macroeconomic indicators. Our findings illustrate some critical links between PD and macroeconomic environments. For example, empirical evidence suggests that bank-specific indicators such as CET 1 ratio, inefficiency ratio, and deposit ratio appear to have a negative and statistically significant relationship with a bank's PD. When we examined the structural and macroeconomic variables, we found that policy rate, real exchange rate, economic growth, and the unemployment rate may reduce the PD. We also found that central state-owned banks tend to have a higher risk than other bank groups and that regional state-owned banks in the central region have the greatest likelihood of default.

#### A workflow for data documentation using Stata

##### Luiza Cardoso de Andrade, The World Bank (Development Impact Evaluation)

This presentation introduces three commands providing new functionality for high-quality and transparent data handling. First, iecorrect uses human-readable sheets to document and implement all changes (corrections) to data points in one line of Stata code. Second, iecodebook export creates data dictionaries and includes new features for validating the structure or contents of datasets and creating replication datasets. Third, iesave replaces save with the additional features of tracking changes to datasets over time in a Git-friendly way. Altogether, these commands allow users to access data descriptions and changelogs without reviewing Stata code, and they allow team members to contribute to data quality control without using Stata. In addition to the commands, the presentation will discuss general challenges of documenting datasets that the authorship team solved during their creation.

#### Distributional analysis using microsimulations in Stata

##### Ercio Munoz Saavedra, The World Bank

Ex-ante evaluation of the distributional effects of a macroeconomic shock is a difficult task. One approach relies on microsimulation models often combined with a macroeconomic model (e.g., a CGE model). This approach typically follows a top-down sequence where the microsimulation model takes the outputs from the macroeconomic model as given and then uses a household survey to generate changes in the data that mimic the resulting macroeconomic aggregates. For example, this approach could be used to model how changes in the level of employment and wages by industry derived from a given macroeconomic scenario (e.g., a set of climate change policies) impact poverty and inequality. This presentation compares two methods (reweighting versus modeling occupational choices) for analyzing changes in the labor market in the context of a top-down macro-micro model. I use two surveys that are more than 10 years apart to explore how these two different ways of modeling changes in the labor market using the older survey can predict what we observe in the newer survey.

#### Development research in practice: A new handbook with Stata conventions and style guidance

##### Benjamin Daniels, The World Bank (Development Impact Evaluation)

Development Research In Practice: The DIME Analytics Data Handbook is a new Stata-centric handbook for empirical researchers. It guides readers through best practices for code and data handling in research projects from inception to publication. It includes code snippets, links to a complete Stata project repository on GitHub, links to continuously updated workflows on the DIME Wiki, and the DIME Analytics Stata Style Guide, as well as a series of recorded lectures accompanying each chapter. The handbook is intended as a complete introduction to modern reproducible code and data work. It can be used as a training manual for new staff; a textbook companion to an undergraduate or graduate-level empirical methods course; or a desk reference for practitioners at any level. In addition to the paperback, a free ebook and PDF versions are available online. In this presentation, the authors will discuss the reasons for publishing the handbook, focusing on the need for Stata practitioners to improve standardization across projects. With the continued rise of research centers and labs in the space, nonstandardized and idiosyncratic approaches slow down learning and impair collaboration. This handbook and discussion will provide a starting point for Stata users worldwide.

#### Effects of school infrastructure on student enrollments across India

##### Smit Ghelani, IIM Kozhikode

Education plays a pivotal role in the socioeconomic advancement of a country. India has been focusing its efforts on providing a minimum level of education to all strata of its population through several schemes. This presentation aims to analyze the effect of infrastructure development of schools in India on enrollment of students across classes. The districtwise secondary data cover all schools across India and contain profiles of schools, facilities provided by schools, teachers, and social and age data on students’ enrollment. The data are provided by the Department of School Education & Literacy, Ministry of Education, India, through the Unified District Information System for Education (UDISE).

#### magictable: Produce complex tables geared for export to preformatted tables in Excel

##### Methode Tuyisenge, National Institute of Statistics of Rwanda

The reporting of statistical data from various sources, whether censuses or surveys, requires the production of high-quality, well-formatted tables that are attractive to users. Stata has been challenged by users about its capability to produce customizable tables like other software such as SPSS. The magictable command was built by an expert statistician from the National Institute of Statistics of Rwanda (NISR) to export results directly in tables preformatted in Excel. This command was built specifically to report results from the Integrated Household Living Conditions Survey (EICV) conducted by NISR, but it can be modified to apply to other surveys. The magictable command is suitable for exporting statistics for up to five variables; it is also suitable for disaggregating results at different levels of reporting.

#### Managing odds: Addressing educational disadvantage through school accountability and better HRM practices

##### Lilac Florentino, University of Queensland

One of the ongoing discussions in the economics of education relates to the features of education systems that are linked to better learning outcomes in students. While there is a consensus on the importance of accountability and human resource management (HRM) practices in this context, literature on education systems has yet to analyze these as complementary features that could positively influence student learning outcomes. This research establishes the cross-national differences in the patterns of school accountability and HRM practices and examines how they influence student learning outcomes, particularly in disadvantaged schools. This study employs latent class analysis (LCA) on the 79 PISA-participating countries to establish a typology of school accountability and HRM practices across countries. This research contributes to the literature on education systems by creating a classification of accountability and HRM practices across countries. The analysis also provides guidance on the design and implementation of educational policies by offering a comprehensive understanding of how schools in different countries, and particularly in disadvantaged areas, can best adopt accountability mechanisms and HRM practices.

#### An enhanced tool for random sampling

##### Juvert Huaranga, Universidad Nacional Mayor de San Marcos

Random sampling is of great importance in many fields. Stata offers several commands to perform random sampling, although the dataset in memory is modified during the procedure. The command resam performs random sampling, with or without replacement, without modifying the current dataset. It also allows the user to select the variables to be sampled and offers the possibility of extending the sample size beyond the number of observations.
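The behavior described above (sampling with or without replacement, leaving the source data untouched, restricting to chosen variables, and allowing a sample larger than the data) can be illustrated outside Stata. The following is a minimal Python sketch, not the resam implementation; the function name, its parameters, and the list-of-dictionaries data layout are all assumptions made for illustration, and the sketch assumes that oversized samples are only possible with replacement.

```python
import random


def resample(rows, n, replace=False, columns=None, seed=None):
    """Return a random sample of `rows` as new dicts; `rows` itself is never modified."""
    rng = random.Random(seed)  # private generator, so global random state is untouched
    if replace:
        # With replacement, n may exceed the number of observations.
        picked = rng.choices(rows, k=n)
    else:
        if n > len(rows):
            raise ValueError("n exceeds the number of observations; use replace=True")
        picked = rng.sample(rows, k=n)
    if columns is None:
        return [dict(r) for r in picked]  # copy every variable
    return [{c: r[c] for c in columns} for r in picked]  # copy selected variables only


data = [{"id": i, "x": i * i} for i in range(5)]
bigger = resample(data, 8, replace=True, columns=["id"], seed=1)
print(len(bigger), len(data))  # 8 5
```

Returning copies rather than references is the design choice that keeps the original data intact even if the caller later edits the sample.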

#### Mastering Stata’s datetime concepts and functions

##### Hua Peng, StataCorp

When do leaplings (persons born on February 29) celebrate their birthdays in nonleap years? What is the difference, say, in milliseconds, between two timestamps if leap seconds are counted, based on Coordinated Universal Time (UTC) standards? What if you want to make sure that the dates are properly stored and ready for fitting a time-series model or performing survival analysis? Dates and times are all too familiar concepts we often take for granted. They lurk under data management and statistical analysis with various degrees of importance depending on the task at hand. In this talk, we will demonstrate how to handle these tasks using Stata's vast collection of date and time functions with highlights of the new functions in Stata 17.
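The leapling question has no single answer, because the observed date in a nonleap year is a convention (28 February or 1 March). The following is a minimal Python sketch of that logic, not one of Stata's datetime functions; the function name and the `nonleap_rule` parameter are hypothetical.

```python
import calendar
from datetime import date


def birthday_in_year(birth, year, nonleap_rule="mar1"):
    """Date on which a person born on `birth` celebrates a birthday in `year`."""
    is_leapling = birth.month == 2 and birth.day == 29
    if is_leapling and not calendar.isleap(year):
        # 29 February does not exist this year, so a convention must be chosen.
        if nonleap_rule == "feb28":
            return date(year, 2, 28)
        if nonleap_rule == "mar1":
            return date(year, 3, 1)
        raise ValueError("nonleap_rule must be 'feb28' or 'mar1'")
    return date(year, birth.month, birth.day)


born = date(2000, 2, 29)
print(birthday_in_year(born, 2023, "feb28"))  # 2023-02-28
print(birthday_in_year(born, 2023, "mar1"))   # 2023-03-01
print(birthday_in_year(born, 2024))           # 2024-02-29
```

Python's `datetime` type does not represent leap seconds, so the leap-second question raised in the abstract is outside what this sketch can show.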

#### Visualizing survey data-analysis results: Marrying the best from Stata and R

##### Nel Jason Haw, Johns Hopkins Bloomberg School of Public Health

Stata has a strong suite of survey data-analysis references and tools and remains the primary choice for researchers working with survey data. On the other hand, R is the primary choice for data visualization in many academic papers, given its flexibility, especially when using the ggplot2 package based on the design philosophy of The Grammar of Graphics. An unfulfilled need for many researchers is presenting survey data-analysis results innovatively without feeling limited to a single statistical software package. This presentation discusses a workflow of using Stata for analysis and exporting the results through the postfile commands, then handing the data off to R to create a rich array of figures. As a proof of concept, the presentation will show results from an ongoing health economics research project from the Philippines, which uses around 200,000 observations from national income and expenditure survey data to create publication-quality dumbbell plots, concentration curves, and Pen’s parades. Finally, the presentation will briefly describe how to share code and results in a public repository like GitHub.

#### Comparative benefits of analyzing spatial aggregate data using Stata’s Sp versus gsem and sem

##### Emil Coman, Health Disparities Institute, UConn School of Medicine

We demonstrate the power of the underutilized Stata spatial analytical module Sp, with an eye on the broader and older path analytic modeling framework (gsem and sem, where sem stands for structural equation modeling [SEM]). Spatial aggregate data have become widely available, yet analysts often ignore their spatial structure (regions have neighbors, and neighboring regions are more similar than by chance). Research often reports artificial naïve/a-spatial associations that ignore this spatial nonindependence. We analyze public data from the CDC on social vulnerability and life expectancy at the census tract level, using the state of Connecticut in the USA as an illustration. We compare (1) the spregress modeling options against SEM models that include the outcome’s spatial lag as copredictor; (2) a two-step mediation model with spregress against SEM with indirect effects; (3) the total effects of a spatial predictor on a spatial outcome estimated with spregress by adding up effects from neighbors to each region (and back), against nonrecursive SEM models that use spatial lag versions of each spatial variable as instrumental variables. We point to several extensions of spatial modeling into the SEM approach, like spatial factor analysis and spatial "causal" mediation models, and contrast Stata’s utilities against comparable models in GeoDa and Mplus.

#### Estimating the accuracy and consistency of classifications based on item response theory measures

##### Matthew Rabbitt, U.S. Department of Agriculture

Latent variables are used in economics to represent measures that influence the behavior or capture the traits of economic agents. Inference with latent variables often requires classifying individuals based on estimates of these variables to make analyses more tractable and findings easier to convey to a wider audience. While classifying individuals is often straightforward, requiring only estimates of their latent variables, the corresponding standard errors, and cutpoints, relatively few instruments are without measurement error. In many cases, this measurement error is transferred onto the estimates of individuals’ latent variables, which may result in individuals being misclassified (Rudner 2001; Lee 2020; Lathrop 2015). Methodology has been developed to assess the extent of misclassification under item response theory (IRT). These methods rely on two indices, classification accuracy and classification consistency, to describe the quality of classification decisions. The former is a measure of the validity, while the latter is a measure of the reliability of classifications. In this presentation, I motivate the study of misclassification under IRT, introduce Stata users to a novel user-written estimation command based on the Rudner method (Rudner 2001, 2005), irtacc, and provide an empirical example of an application of this command.

#### Fitting structural equation models with latent variable interactions using bayesmh

##### Rose Medieros, RA Medieros Consulting

Hypotheses involving interactions are common in many disciplines, as are structural equation models (SEM). In the frequentist framework, a number of methods for estimating SEM with latent variable interactions have been proposed, but no single method is widely implemented in software. Bayesian estimation offers an alternative for specifying interactions involving latent variables. The flexibility of bayesmh makes it possible to fit a Bayesian SEM in Stata. In this talk, I will provide a brief introduction to important concepts for fitting SEM with latent variable interactions in a Bayesian framework, as well as the mechanics of using bayesmh to fit those models. I will also highlight some of the advantages and challenges of fitting SEM with latent variable interactions in a Bayesian framework.

#### mlmeval: Complementary tools for an integrated approach to multilevel model selection

##### Anthony J. Gambino, University of Connecticut

Model evaluation is an unavoidable facet of multilevel modeling (MLM). Current guidance encourages researchers to focus on two overarching model-selection factors: model fit and model adequacy (McCoach et al. 2022). Researchers routinely use information criteria to select from a set of competing models and assess the relative fit of each candidate model to their data. However, researchers must also consider the ability of their models and their various constituent parts to explain variance in the outcomes of interest (i.e., model adequacy). Prior methods for assessing model adequacy in MLM are limited. Therefore, Rights and Sterba (2019) proposed a new framework for decomposing variance in MLM to estimate R-squared measures. Yet there is no Stata package that implements this framework. Thus, we propose a new Stata package that computes both (1) a variety of model fit criteria; and (2) the model adequacy measures described by Rights and Sterba to facilitate multilevel model selection for Stata users. The goal of this package is to provide researchers with an easy way to utilize a variety of complementary methods to evaluate their multilevel models.

#### Recovering income distribution in the presence of interval-censored data

##### Gustavo Javier Canavire-Bacarreza, The World Bank

We propose a method to analyze interval-censored data, using a multiple imputation based on a heteroskedastic interval regression approach. The proposed model aims to obtain a synthetic dataset that can be used for standard analysis, including standard linear regression, quantile regression, or poverty and inequality estimation. We present two applications to show the performance of our method. First, we run a Monte Carlo simulation to show the method's performance under the assumption of multiplicative heteroskedasticity, with and without conditional normality. Second, we use the proposed methodology to analyze labor income data in Grenada for 2013–2020, where the salary data are interval-censored according to the salary intervals prespecified in the survey questionnaire. The results obtained are consistent across both exercises.

## Registration

| | Price | Student price |
|---|---|---|
| Both days: 4–5 August 2022 | $195 | $75 |
| Day 1: Thursday, 4 August 2022 | $125 | $50 |
| Day 2: Friday, 5 August 2022 | $125 | $50 |
| Dinner (optional): 4 August 2022 | | |

## Venue

Marriott Marquis Washington, DC
901 Massachusetts Ave NW
Washington, DC 20001

## Scientific Committee

The scientific committee is responsible for the Stata Conference program. With submissions encouraged from both new and long-time Stata users from all backgrounds, the committee will review all abstracts in developing an exciting, diverse, and informative program. We look forward to seeing you in DC!

## Why should you attend

#### Network

Open to users of all disciplines and experience levels, Stata Conferences bring together a unique mix of experts and professionals. Develop a well-established network within the Stata Community.

#### Stay up to date

Hear from Stata experts at the top of their fields, as well as Stata's own researchers and developers. Gain valuable insights, discover new commands, learn best practices, and improve your knowledge of Stata.

#### Discover new features

Presentation topics have included new community-contributed commands, methods and resources for teaching with Stata, new approaches for using Stata together with other software, and much more.