Proceedings

10:15–10:45 resultssets in resultsframes in Stata 16-plus Abstract: A resultsset is a Stata dataset created as output by a Stata command.
It may be listed, saved in a disk file, written over an existing dataset in memory, or (in Stata version 16 or higher) written to a data frame (or resultsframe) in memory without damaging any existing data frames. Commands creating resultssets include parmest, parmby, xcontract, xcollapse, descsave, xsvmat, and xdir. Commands useful for processing resultsframes include xframeappend, fraddinby, and invdesc. I survey the ways in which resultsset processing has been changed by resultsframes.

Additional information:
UK22_Newson.pdf

Roger Newson
King's College London
10:45–11:05 A suite of Stata programs for analyzing simulation studies Abstract: Simulation studies are used in a variety of disciplines to evaluate the properties of statistical methods.
Simulation studies involve creating data by random sampling, typically from known probability distributions, with the aim of assessing the robustness and accuracy of new statistical techniques by comparing them with some known truth. I introduce the siman suite for the analysis of simulation results. siman is a set of Stata programs that offers data manipulation, analysis, and graphics to process, explore, and visualize the results of simulation studies.

siman expects a sensibly structured dataset of simulation study estimates, with input variables being in ‘long’ or ‘wide’ format, string, or numeric. The estimates data can be reshaped by siman reshape to enable data exploration.

The key commands include siman analyse to estimate and tabulate performance; graphs to explore the estimates data (siman scatter, siman swarm, siman zipplot, siman blandaltman, siman comparemethodsscatter); and a variety of graphs to visualize the performance measures (siman nestloop, siman lollyplot, siman trellis) in the form of scatterplots, swarm plots, zip plots, Bland–Altman plots, nested-loop plots, lollyplots, and trellis graphs (Morris, White, and Crowther 2019).

References:

Morris, T. P., I. R. White, and M. J. Crowther. 2019. Using simulation studies to evaluate statistical methods. Statistics in Medicine 38: 2074–2102.
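
As a rough illustration of the kind of performance measures discussed by Morris, White, and Crowther (2019), the following Python sketch computes bias, empirical standard error, and confidence-interval coverage from a set of simulated estimates. It is not siman's implementation; the function name performance, the simulated study, and the normal-based interval are all assumptions made for this example.

```python
import numpy as np

def performance(estimates, std_errors, truth, z=1.96):
    """Bias, empirical SE, and CI coverage across simulation repetitions."""
    estimates = np.asarray(estimates, dtype=float)
    std_errors = np.asarray(std_errors, dtype=float)
    bias = estimates.mean() - truth
    emp_se = estimates.std(ddof=1)  # spread of the estimates themselves
    lower = estimates - z * std_errors
    upper = estimates + z * std_errors
    coverage = np.mean((lower <= truth) & (truth <= upper))
    return {"bias": bias, "empse": emp_se, "coverage": coverage}

# Hypothetical study: estimate a mean of 0.5 from n = 50 normal draws,
# repeated 2,000 times.
rng = np.random.default_rng(2022)
truth, n, reps = 0.5, 50, 2000
samples = rng.normal(loc=truth, scale=1.0, size=(reps, n))
est = samples.mean(axis=1)
se = samples.std(axis=1, ddof=1) / np.sqrt(n)
result = performance(est, se, truth)
print(result)
```

Each repetition contributes one estimate and one standard error, which mirrors a 'long' layout with one row per repetition.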

Additional information:
UK22_Marley-Zagar.pptx

Ella Marley-Zagar
University College London
11:05–11:35 Cook’s distance measures for panel-data models Abstract: Influential observations in regression analysis are data points whose deletion has a large impact on the estimated coefficients.
The usual diagnostics for assessing the influence of each data point are designed for least-squares regression and independent observations and are not appropriate when estimating panel-data models.

The purpose of this presentation is to describe a new command, cooksd2, that extends the traditional Cook’s (1977) distance measure to determine the influence of each data point when applying the fixed-, random-, and between-effects regression estimators. The approach is based on the framework developed by Christensen, Pearson, and Johnson (1992) and also reports the influence of an entire subject or group of data points following the methods described by Banerjee and Frees (1997).

References:

Cook, R. D. 1977. Detection of influential observation in linear regression. Technometrics 19: 15–18.

Banerjee, M., and E. W. Frees. 1997. Influence diagnostics for linear longitudinal models. Journal of the American Statistical Association 92: 999–1005.

Christensen, R., L. M. Pearson, and W. Johnson. 1992. Case-deletion diagnostics for mixed models. Technometrics 34: 38–45.
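
For orientation, the traditional Cook's distance for least-squares regression with independent observations can be sketched in Python as below. This covers only the classical measure that the talk takes as its starting point, not the panel-data extension in cooksd2; the function name cooks_distance and the simulated data are assumptions for this example.

```python
import numpy as np

def cooks_distance(X, y):
    """Classical Cook's distance for each row of an OLS fit (X includes the constant)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    # Diagonal of the hat matrix X (X'X)^{-1} X'
    leverage = np.einsum("ij,jk,ik->i", X, XtX_inv, X)
    s2 = resid @ resid / (n - p)
    return resid**2 * leverage / (p * s2 * (1.0 - leverage) ** 2)

# Hypothetical data with one planted high-leverage outlier in the last row.
rng = np.random.default_rng(7)
x = rng.normal(size=30)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=30)
x[-1], y[-1] = 4.0, -5.0
X = np.column_stack([np.ones_like(x), x])
d = cooks_distance(X, y)
print("most influential observation:", int(np.argmax(d)))
```

The closed form avoids refitting the model once per deleted observation, which is the shortcut that stops being valid when observations within a panel are dependent.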

Additional information:
UK22_Vincent.pdf

David Vincent
David Vincent Economics
11:35–12:35 Bayesian multilevel modeling Abstract: In multilevel or hierarchical data, which include longitudinal, cross-sectional, and repeated-measures data, observations belong to different groups.
Groups may represent different levels of hierarchy, such as hospitals, doctors nested within hospitals, and patients nested within doctors nested within hospitals. Multilevel models incorporate group-specific effects in the regression model and assume that they vary randomly across groups according to some a priori distribution, commonly a normal distribution. This assumption makes multilevel models natural candidates for Bayesian analysis. Bayesian multilevel models additionally assume that other model parameters such as regression coefficients and variance components—variances of group-specific effects—are also random.

In this presentation, I will discuss some of the advantages of Bayesian multilevel modeling over the classical frequentist estimation. I will cover some basic random-intercept and random-coefficients modeling using the bayes: mixed command. I will then demonstrate more advanced model fitting by using the new-in-Stata-17 multilevel syntax of the bayesmh command, including multivariate and nonlinear multilevel models.

Additional information:
UK22_Marchenko.pdf

Yulia Marchenko
StataCorp
1:40–2:00 Bias-corrected estimation of linear dynamic panel-data models Abstract: In the presence of unobserved group-specific heterogeneity, the conventional fixed-effects and random-effects estimators for linear panel-data models are biased when the model contains a lagged dependent variable and the number of time periods is small.
I present a computationally simple bias-corrected estimator with attractive finite-sample properties, which is implemented in the new xtdpdbc Stata package. The estimator relies neither on instrumental variables nor on specific assumptions about the initial observations. Because it is a method of moments estimator, standard errors are readily available from asymptotic theory. Higher-order lags of the dependent variable can be accommodated as well. A useful test for the correct model specification is the Arellano–Bond test for residual autocorrelation. The random-effects versus fixed-effects assumption can be tested using a Hansen overidentification test or a generalized Hausman test. The user can also specify a hybrid model, in which only a subset of the exogenous regressors satisfies a random-effects assumption.
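
The bias that motivates this estimator can be seen in a small simulation. The Python sketch below generates a first-order autoregressive panel with group-specific effects and applies the conventional fixed-effects (within) estimator with only a few time periods. It does not implement the bias correction in xtdpdbc; the function names, the parameter values, and the data-generating process are assumptions for this example.

```python
import numpy as np

def simulate_panel(n_groups, n_periods, rho, rng, burn_in=50):
    """Simulate y_it = rho * y_i,t-1 + alpha_i + e_it; returns (n_groups, n_periods + 1)."""
    alpha = rng.normal(size=n_groups)
    y = alpha / (1.0 - rho)  # start at each group's long-run mean
    path = []
    for t in range(burn_in + n_periods + 1):
        y = rho * y + alpha + rng.normal(size=n_groups)
        if t >= burn_in:
            path.append(y)
    return np.column_stack(path)

def within_estimator(y):
    """Fixed-effects estimate of rho from group-demeaned y and its first lag."""
    y_lag, y_cur = y[:, :-1], y[:, 1:]
    y_lag_dm = y_lag - y_lag.mean(axis=1, keepdims=True)
    y_cur_dm = y_cur - y_cur.mean(axis=1, keepdims=True)
    return (y_lag_dm * y_cur_dm).sum() / (y_lag_dm**2).sum()

rng = np.random.default_rng(1)
rho_true = 0.5
panel = simulate_panel(n_groups=2000, n_periods=5, rho=rho_true, rng=rng)
rho_fe = within_estimator(panel)
print(f"true rho = {rho_true}, fixed-effects estimate = {rho_fe:.3f}")
```

Even with 2,000 groups the estimate stays well below the true value, because the bias depends on the number of time periods rather than on the number of groups.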

Contributor:
Jörg Breitung
University of Cologne

Additional information:
UK22_Kripfganz.pdf

Sebastian Kripfganz
University of Cologne
2:00–2:30 Impact of proximity to gas production activity on birth outcomes across the US Abstract: Despite mounting evidence on the health effects of natural gas development (NGD), including hydraulic fracturing (“fracking”), existing research has been constrained to high-producing states, limiting generalizability.
We examined the impacts of prenatal exposure to NGD production activity in all gas-producing US states on birth outcomes overall and by race/ethnicity. Mata routines were developed to link 185,376 NGD production facilities in 28 US states and their distance-weighted monthly output with county population centroids via geocoding. These data were then merged with 2005–2018 county-level microdata natality files on 33,849,409 singleton births from 1,984 counties in 28 states, using nine-month county-level averages of NGD production by both conventional and unconventional production methods, based on month/year of birth.

Linear regression models were fit to examine the impact of prenatal exposure to NGD production activity on birthweight and gestational age, while logistic regression models were used for the dichotomous outcomes of low birthweight (LBW), preterm birth, and small for gestational age (SGA). Overall, prenatal exposure to NGD production activity increased adverse birth outcomes. We found that a 10% increase in NGD production in a county decreased mean birthweight by 1.48 grams. A significant interaction by race/ethnicity revealed that a 10% increase in NGD production decreased birthweight for infants born to Black women by 10.19 grams and Asian women by 2.76 grams, with no significant reductions in birthweight for infants born to women from other racial/ethnic groups. Although effect sizes were small, results were highly consistent. NGD production decreases infant birthweight, particularly for those born to minoritized mothers.

Contributors:
Hailee Schuele
Philip J. Landrigan
Summer Sherburne Hawkins
Boston College

Additional information:
UK22_Baum.pdf

Christopher F. Baum
Boston College
2:30–3:00 Estimating compulsory schooling impacts on labor market outcomes in Mexico Abstract: This study estimates the impacts on labor market outcomes of the 1993 compulsory schooling reform in Mexico.
A well-known problem in this analysis is the endogeneity between schooling and labor market outcomes due to unobservable characteristics that could jointly determine them. There is also heterogeneity in the empirical evidence of the effectiveness of such schooling policies among developing and developed countries, perhaps because of the different contexts and identification strategies used. Some studies use instrumental-variables (IV) and difference-in-differences (D-i-D) methods to tackle endogeneity issues. Most analyses use a regression discontinuity design (RDD) approach with different order polynomials of the year of birth (for example, cubic or quartic order), whereas few studies use birth month for more accurate and robust estimates because it allows more schooling variation within a year.

The impact of the Mexican policy is analyzed in this study through a fuzzy RDD approach with the use of Stata for the period 2009 to 2017. It addresses endogeneity by exploiting the age cohort discontinuities in birth month, for more robust estimation, as an exogenous source of education variation. Fuzzy RDD then compares schooling and labor market outcomes among the birth cohorts exposed with those not exposed to the reform. The fuzziness accounts for the imperfect compliance by using the random assignment of the exposure to the policy.

Stata allows plotting discontinuity graphs between cohorts as well as the McCrary test to validate the use of this methodology. It also facilitates parametric and nonparametric analyses. The empirical evidence suggests that the 1993 compulsory schooling law, although raising average school attendance, was an insufficient policy to impact labor market outcomes in Mexico. The analysis contributes to the limited literature on the returns to compulsory schooling that uses a rigorous RDD methodology in developed and developing countries.

Additional information:
UK22_Leon_Bravo1.pdf

Erendira Leon Bravo
University of Westminster
3:30–4:00 Bias-adjusted three-step latent class analysis using R and the gsem command in Stata Abstract: In this presentation, we will describe a means to perform bias-adjusted latent class analysis using three-step methodology.
This method is often performed using MPLUS, LATENT GOLD, or specific functions in Stata. Here we will describe a novel means to perform this analysis using the poLCA package in R to perform the first two steps and the gsem command in Stata to perform the third step. This methodology is applied to a case study involving performing causal analysis by integrating inverse probability of treatment weights into the methodology. We will also demonstrate how to obtain estimates of the average causal effect of exposure on a latent class using the margins command with robust standard errors. Our aim is to broaden awareness of three-step latent class methods and causal analysis and offer means to perform this methodology for users of R, for which there currently is little software available.

Contributor:
Bianca de Stavola
UCL

Additional information:
UK22_Tompsett.pdf

Daniel Tompsett
UCL
4:00–4:30 Distributed lag nonlinear models (DLNMs) in Stata Abstract: The distributed lag nonlinear models (DLNMs) represent a modeling framework to flexibly describe associations showing potentially nonlinear and delayed effects in time-series data.
This methodology rests on the definition of a crossbasis, a bidimensional functional space combining two sets of basis functions that specify the relationships in the dimensions of predictor and lags, respectively. DLNMs have been widely used in environmental epidemiology to investigate the short-term associations between environmental exposures, such as weather variables or air pollution, and health outcomes, such as mortality counts or disease-specific hospital admissions. We implemented the DLNMs framework in Stata through the crossbasis command to generate the basis variables that can be fit in a broad range of regression models. In addition, the postestimation commands crossbgraph and crossbslices allow interpreting the results, emphasizing graphical representation, after the regression model fit. We present an overview of the capabilities of these new community-contributed commands and describe the practical steps to fit and interpret DLNMs with an example of real data to represent the relationship between temperature and mortality in London during the period 2002–2006.
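
To make the idea of a crossbasis concrete, the Python sketch below builds one from two deliberately simple polynomial bases: one in the predictor dimension and one in the lag dimension. It is only a sketch of the general construction, not the crossbasis command; the choice of polynomial bases, the function names, and the simulated temperature series are assumptions for this example.

```python
import numpy as np

def lag_matrix(x, max_lag):
    """Column l holds x lagged by l periods; early rows without history are NaN."""
    x = np.asarray(x, dtype=float)
    n = x.size
    Q = np.full((n, max_lag + 1), np.nan)
    for lag in range(max_lag + 1):
        Q[lag:, lag] = x[: n - lag]
    return Q

def crossbasis(x, max_lag, var_degree=2, lag_degree=2):
    """Combine a polynomial basis in x with a polynomial basis in the lag."""
    Q = lag_matrix(x, max_lag)                               # (n, max_lag + 1)
    lags = np.arange(max_lag + 1) / max(max_lag, 1)          # rescaled to [0, 1]
    C = np.vander(lags, lag_degree + 1, increasing=True)     # lag basis: 1, l, l^2, ...
    columns = []
    for j in range(1, var_degree + 1):                       # predictor basis: x, x^2, ...
        # Sum over lags of (predictor basis at lagged x) * (lag basis at that lag)
        columns.append((Q**j) @ C)
    return np.hstack(columns)

# Hypothetical daily temperature series.
rng = np.random.default_rng(3)
temperature = 12 + 6 * np.sin(np.arange(60) / 10) + rng.normal(size=60)
W = crossbasis(temperature, max_lag=3)
print(W.shape)
```

The resulting columns can be entered as ordinary regressors in a regression model; rows with incomplete lag history are left missing.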

Contributors:
Ben Armstrong
Antonio Gasparrini
Spanish Research Council (CSIC) and LSHTM

Additional information:
UK22_Tobias.pptx

Aurelio Tobias
Spanish Research Council (CSIC) and LSHTM
4:30–5:15 Advanced data visualizations with Stata: Part III Abstract: The presentation will showcase recent developments in complex data visualizations with Stata.
These include various types of polar plots, for example, spider plots, sunburst charts, circular bar graphs, and various visualizations with spatial data, including bivariate maps, gridded waffle charts, and map clippings. Updates for several Stata packages, including joyplot, bimap, streamplot, and clipgeo, will be presented, and suggestions for improving Stata’s graph capabilities will be discussed.

Additional information:
UK22_Naqvi.pdf

Asjad Naqvi
Austrian Institute for Economic Research (WIFO), International Institute for Applied Systems Analysis (IIASA), and Vienna University of Economics and Business (WU)
9:10–9:40 Grinding axes: Axis scales, labels, and ticks Abstract: This is a roundup of not quite utterly obvious tips and tricks for graph axes, using both official and community-contributed commands.
Ever needed

  • a logarithmic scale but found default labels undesirable?
  • a slightly non-standard scale such as logit, reciprocal, or root?
  • a tick to be suppressed?
  • labels between ticks, not at them?
  • automagic choice of “nice” labels under your control?

Community-contributed commands mentioned will include mylabels, myticks, nicelabels, niceloglabels, qplot, and transplot.

Additional information:
UK22_Cox.pptx

Nick Cox
Durham University
9:40–10:00 Exchangeably weighted bootstrap schemes Abstract: The exchangeably weighted bootstrap is one of the many variants of bootstrap resampling schemes.
Rather than directly drawing observations with replacement from the data, weighted bootstrap schemes generate vectors of replication weights to form bootstrap replications. Various ways to generate the replication weights can be adopted, and some choices bring practical computational advantages. This presentation demonstrates how easily such schemes can be implemented and where they are particularly useful. It also introduces the exbsample command, which facilitates their implementation.
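
The idea of replacing resampling with replication weights can be sketched in Python as follows. Multinomial weights reproduce the classical bootstrap, while rescaled exponential weights give one alternative exchangeable scheme. This is not the exbsample command; the function name, the two schemes shown, and the simulated data are assumptions for this example.

```python
import numpy as np

def weighted_bootstrap_means(x, reps, rng, scheme="exponential"):
    """Bootstrap replications of the mean using vectors of replication weights."""
    x = np.asarray(x, dtype=float)
    n = x.size
    if scheme == "multinomial":
        # Counts of how often each observation is drawn: the classical bootstrap.
        w = rng.multinomial(n, np.full(n, 1.0 / n), size=reps).astype(float)
    elif scheme == "exponential":
        # Standard exponential weights rescaled so each vector sums to n.
        w = rng.exponential(size=(reps, n))
        w *= n / w.sum(axis=1, keepdims=True)
    else:
        raise ValueError(f"unknown scheme: {scheme}")
    return (w @ x) / n  # weighted mean for each replication

rng = np.random.default_rng(11)
x = rng.normal(loc=10.0, scale=2.0, size=100)
se_multinomial = weighted_bootstrap_means(x, 4000, rng, "multinomial").std(ddof=1)
se_exponential = weighted_bootstrap_means(x, 4000, rng, "exponential").std(ddof=1)
se_analytic = x.std(ddof=1) / np.sqrt(x.size)
print(se_multinomial, se_exponential, se_analytic)
```

Because the data are never physically resampled, every replication is a single weighted computation on the original dataset, and strictly positive weights such as the exponential ones keep every observation in every replication.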

Additional information:
UK22_Van_Kerm.pdf

Philippe Van Kerm
LISER and University of Luxembourg
10:00–10:30 Improving fitting and predictions for flexible parametric survival models Abstract: Flexible parametric survival models have been available in Stata since 2000 with Patrick Royston’s stpm command.
I developed stpm2 in 2008, which added various extensions. However, the command is old and does not take advantage of some of the features Stata has added over the years. I will introduce stpm3, which has been completely rewritten and adds a number of useful features, including
  • Full support for factor variables (including for time-dependent effects).
  • Use of extended functions within a varlist. Incorporate various functions (splines, fractional polynomial functions, etc.) directly within a varlist. These also work when including interactions and time-dependent effects.
  • Easier and more intuitive predictions. These fully synchronize with the extended functions making predictions for complex models with multiple interactions/nonlinear effects incredibly simple. Make predictions for specific covariate patterns and perform various types of contrasts.
  • Directly save predictions to one or more frames. This separates the data used for analysis from the data used for predictions.
  • Obtain various marginal estimates using standsurv. This synchronizes with stpm3 factor variables and extended functions, making marginal estimates much easier and less prone to user mistakes for complex models.
  • Model on the log(hazard) scale. Do all the above for standard survival models, competing-risks models, multistate models, and relative survival models all within the same framework.

Additional information:
UK22_Lambert.html

Paul Lambert
University of Leicester and Karolinska Institutet
11:00–11:30 sttex: A new dynamic document command for Stata and LaTeX Abstract: In this presentation, I will introduce a new command for processing a dynamic LaTeX document in Stata, for example, a document containing both LaTeX paragraphs and Stata code.
A key feature of the new command is that it tracks changes in the Stata code and executes the code only when needed, allowing for an efficient workflow. The command is useful for creating automated statistical reports, writing articles with data analysis, preparing slides for a methods course or a conference talk, or even writing a complete textbook with examples of applications.

Additional information:
UK22_Jann.pdf

Ben Jann
University of Bern
11:30–12:30 Custom estimation tables Abstract: This presentation illustrates how to construct custom tables from one or more estimation commands.
I demonstrate how to add custom labels for significant coefficients and make targeted style edits to cells in the table using the following commands:
  • collect get
  • collect dir
  • collect dims
  • collect levelsof
  • collect label list
  • collect label values
  • collect layout
  • collect query header
  • collect style header
  • collect style showbase
  • collect style row
  • collect style cell
  • collect query column
  • collect style column
  • collect style stars
  • collect preview
  • etable
I begin with a description of what constitutes a collection and how items (numeric and string results) in a collection are tagged (identified) and conclude with a simple workflow to enable users to build their own custom tables from estimation commands. This presentation motivates the construction of estimation tables and concludes with the convenience command etable.

Additional information:
UK22_Pitblado.html

Jeff Pitblado
StataCorp
1:30–2:00 The impact of a government pay reform in Mexico on the public sector wage gap Abstract: The 2018 federal pay reform on the remuneration of public servants in Mexico is exploited to estimate its impact on the public–private sector wage gap across the unconditional wage distribution in a developing-country context.
This policy uses both payment cuts and freezes for public sector workers.

Using cross-sectional data from 2017 to 2019, both the mean and unconditional quantile (UQ) regression models within a difference-in-differences (DID) framework are fit. Stata allows the use of UQ regressions based on the recentered influence function (RIF), which centers the IF around the statistic of interest (for example, the population mean µ = E[Y]) rather than around zero (for example, by reweighting the observations) to generate the RIF quantiles. The RIF average effects are interpreted at different quantiles of the unconditional wage distribution (for example, the 5th or 95th percentiles or other intermediate quantiles).
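
As an illustration of the recentering, the standard RIF for a quantile adds the quantile itself to the influence function, so the RIF averages to the quantile rather than to zero. The Python sketch below computes it with a simple Gaussian kernel density estimate. It is not the routine used in the study; the function name rif_quantile, the rule-of-thumb bandwidth, and the simulated wage data are assumptions for this example.

```python
import numpy as np

def rif_quantile(y, tau):
    """RIF of the tau-th quantile: q + (tau - 1[y <= q]) / f(q)."""
    y = np.asarray(y, dtype=float)
    q = np.quantile(y, tau)
    # Gaussian kernel density estimate at q, rule-of-thumb bandwidth (an assumption).
    h = 1.06 * y.std(ddof=1) * y.size ** (-0.2)
    density = np.mean(np.exp(-0.5 * ((y - q) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    return q + (tau - (y <= q)) / density

# Hypothetical log-normal wages.
rng = np.random.default_rng(5)
wages = rng.lognormal(mean=3.0, sigma=0.5, size=5000)
rif10 = rif_quantile(wages, 0.10)
print(rif10.mean(), np.quantile(wages, 0.10))
```

Regressing this RIF on covariates by least squares is what turns it into an unconditional quantile regression; the RIF itself takes only two distinct values, one on each side of the quantile.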

Then the DID approach implemented through Stata provides the effects of the reform before and after the policy intervention. It also deals with the endogeneity of employment selection by accounting for the differences in the unobservable effects of the public–private employment sector selection pretreatment. Posttreatment, such unobservables are differenced out to mitigate the concerns about potential selection bias.

Robustness checks are also executed with Stata, such as cohort fixed effects with pseudopanel dataset, a two-step model within a Heckman framework, the Hansen J-statistic to test orthogonality, an IV-based model, an individual-level fixed-effects (FE) model with a panel dataset, and a placebo in-time test.

Although there is some evidence that public sector employees anticipated the introduction of the policy, it reduced the public sector pay gap strongly among the lower-paid workers of the unconditional pay distribution. The UQ effects of this policy change on the public–private sectoral wage gap contribute to the limited literature for both developed and developing countries.

Contributor:
Barry Reilly
University of Sussex

Additional information:
UK22_Leon_Bravo2.pdf

Erendira Leon Bravo
University of Westminster
2:00–2:30 Illuminating the factor and dependence structure in large panel models Abstract: In panel models, a precise understanding about the number of common factors and dependence across the cross-sectional dimension is key for any applied work.
This presentation will give an overview of how to estimate the number of common factors and how to test for cross-sectional dependence. It does so by presenting two community-contributed commands: xtnumfac and xtcd2. xtnumfac implements 10 different methods to estimate the number of factors, among them the popular methods by Bai and Ng (2002) and Ahn and Horenstein (2013). The degree of cross-section dependence is investigated using xtcd2. xtcd2 implements three different tests for cross-section dependence based on Pesaran (2015), Juodis and Reese (2021), and Pesaran and Xie (2021). The presentation includes a review of the theory, a discussion of the commands, and empirical examples.
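
One of the simpler ideas in this literature is the eigenvalue-ratio approach of Ahn and Horenstein (2013), which picks the number of factors at the largest ratio of adjacent eigenvalues. The Python sketch below illustrates that idea on simulated data. It is not the xtnumfac implementation; the function name, the demeaning step, and the simulated two-factor panel are assumptions for this example.

```python
import numpy as np

def eigenvalue_ratio(X, k_max):
    """Estimate the number of factors in a (T x N) panel by maximizing
    the ratio of adjacent eigenvalues of the sample covariance matrix."""
    T, N = X.shape
    X = X - X.mean(axis=0)                                   # demean each unit
    eigvals = np.linalg.eigvalsh(X.T @ X / (N * T))[::-1]    # descending order
    ratios = eigvals[:k_max] / eigvals[1 : k_max + 1]
    return int(np.argmax(ratios)) + 1, ratios

# Hypothetical panel driven by two strong common factors plus noise.
rng = np.random.default_rng(42)
T, N, r = 200, 50, 2
factors = rng.normal(size=(T, r))
loadings = rng.normal(size=(N, r))
X = factors @ loadings.T + rng.normal(size=(T, N))
k_hat, ratios = eigenvalue_ratio(X, k_max=8)
print("estimated number of factors:", k_hat)
```

The ratio spikes where the eigenvalues drop from the factor-driven part of the spectrum to the noise part, which is why no penalty term needs to be tuned.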

Additional information:
UK22_Ditzen.pdf

Jan Ditzen
Free University of Bozen-Bolzano
2:30–3:00 mixrandregret: A command for fitting mixed random regret minimization models using Stata Abstract: This presentation describes the mixrandregret command, which extends the randregret command (Gutiérrez-Vargas, Meulders and Vandebroek. 2021. The Stata Journal 21: 626–658), incorporating random coefficients for random regret minimization (RRM) models.
The command can fit a mixed version of the classic RRM model introduced in Chorus (European Journal of Transport and Infrastructure Research. 2010. 10: 181–196). It allows the user to specify a combination of fixed and random coefficients. In addition, users can specify normal and log-normal distributions for the random coefficients using the command’s options. Finally, the models are fit by simulated maximum likelihood, using numerical integration to simulate the models’ choice probabilities.

Contributors:
Ziyue Zhu
Martina Vandebroek
KU Leuven

Additional information:
UK22_Gutierrez-Vargas.pdf

Álvaro A. Gutiérrez-Vargas
KU Leuven
3:30–4:30 Open panel discussion with Stata developers
Contribute to the Stata community by sharing your feedback with StataCorp's developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for our users.

Scientific committee

Tim Morris
MRC, Clinical Trials Unit, UCL
Rachael Hughes
University of Bristol

Logistics organizer

The logistics organizer for the 2022 UK Stata Conference is Timberlake Consultants, the Stata distributor to the United Kingdom and Ireland, France, Spain, Portugal, the Middle East and North Africa, Brazil, and Poland.
