2023 UK Stata Conference

Proceedings

10:00–10:20 Customized Markdown and .docx tables using listtab and docxtab Abstract: Statisticians make their living producing tables (and plots).
I present an update of a general family of methods for making customized tables called the DCRIL path (decode, characterize, reshape, insert, list), with customized table cells (using the sdecode package), customized column attributes (using the chardef package), customized column labels (using the xrewide package), and/or customized inserted gap-row labels (using the insingap package), and listing these tables to automatically generated documents. This demonstration uses the listtab package to list Markdown tables for browser-ready HTML documents, which Stata users like to generate, and the docxtab package to list .docx tables for printer-ready .docx documents, which our superiors like us to generate.
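As a rough illustration of the final "list" step of the DCRIL path, a listtab call of the following shape can emit a Markdown table. This is a sketch from memory; the option names should be checked against the listtab help file.

```stata
* List a Markdown table to the Results window (or to a log file);
* delimiter(), begin(), end(), and headlines() are recalled from
* memory -- verify against: help listtab
sysuse auto, clear
listtab make mpg weight in 1/5, delimiter(" | ") ///
    begin("| ") end(" |") ///
    headlines("| Make | MPG | Weight |" "| --- | --- | --- |")
```

The resulting lines can be pasted into (or written directly to) a Markdown document and rendered as an HTML table.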


Additional information:
UK23_Newson.zip

Roger B. Newson
King's College London
10:20–10:40 Multiply imputing informatively censored time-to-event data Abstract: Time-to-event data, such as overall survival in a cancer clinical trial, are commonly right-censored, and this censoring is commonly assumed to be noninformative.
While noninformative censoring is plausible when censoring is due to end of study, it is less plausible when censoring is due to loss to follow-up. Sensitivity analyses for departures from the noninformative censoring assumption can be performed using multiple imputation under the Cox model. These have been implemented in R but are not commonly used. We propose a new implementation in Stata.

Our existing stsurvimpute command (on SSC) imputes right-censored data under noninformative censoring, using a flexible parametric survival model fit by stpm2. We extend this to allow a sensitivity parameter gamma, representing the log of the hazard ratio in censored individuals versus comparable uncensored individuals (the informative censoring hazard ratio, ICHR). The sensitivity parameter can vary between individuals, and imputed data can be recensored at the end-of-study time. Because the mi suite does not allow imputed variables to be stset, we create an imputed data set in ice format and analyze it using mim.
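In symbols, the sensitivity parameter enters as a proportional shift of the hazard after censoring. Writing $h(t)$ for the hazard of a comparable uncensored individual, the assumption described above is

```latex
h_{\text{censored}}(t) = e^{\gamma}\, h(t), \qquad \text{ICHR} = e^{\gamma},
```

so that $\gamma = 0$ recovers the usual noninformative-censoring analysis.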

In practice, sensitivity analysis computes the treatment effect for a range of scientifically plausible values of gamma. We illustrate the approach using a cancer clinical trial.

References:
Jackson, D., I. R. White, S. Seaman, H. Evans, K. Baisley, and J. Carpenter. 2014. Relaxing the independent censoring assumption in the Cox proportional hazards model using multiple imputation. Statistics in Medicine 33: 4681–4694.

https://CRAN.R-project.org/package=InformativeCensoring

Contributor:
Patrick Royston
MRC Clinical Trials Unit at UCL

Additional information:
UK23_White.pptx

Ian R. White
MRC Clinical Trials Unit at UCL
10:40–11:00 Influence analysis with panel data using Stata Abstract: The presence of units that possess extreme values in the dependent variable and independent variables (for example, vertical outliers, good and bad leverage points) has the potential to severely bias least-squares (LS) estimates—for example, regression coefficients and standard errors.
Diagnostic plots (such as leverage-versus-squared-residual plots) and measures of overall influence (for example, Cook's [1979] distance) are usually used to detect such anomalies, but two problems arise from their use. First, available commands for diagnostic plots are built for cross-sectional data, and some data manipulation is necessary for panel data. Second, Cook-like distances may fail to flag multiple anomalous cases in the data because they do not account for the pairwise influence of observations (Atkinson 1993; Chatterjee and Hadi 1988; Lawrance 1995; Rousseeuw 1991; Rousseeuw and Van Zomeren 1990). I overcome these limits as follows. First, I formalize statistical measures to quantify the degree of leverage and outlyingness of units in a panel-data framework and produce diagnostic plots suitable for panel data. Second, I build on Lawrance's [1995] pairwise approach by proposing measures of joint and conditional influence suitable for panel-data models with fixed effects.

I develop a method to visually detect anomalous units in a panel dataset and identify their types, to investigate the effect of these units on LS estimates, and to assess their effect on other units' influence. I propose two community-contributed Stata commands to implement this method. xtlvr2plot produces a leverage-versus-residual plot suitable for panel data, together with a summary table listing the detected anomalous units and their types. xtinfluence calculates the joint and conditional influence of pairs of units and generates network-style plots (the command allows a choice between a scatterplot and a heat plot).
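A session of the following shape conveys the intended workflow. The syntax here is purely illustrative, not the commands' actual syntax; consult the help files shipped with the packages.

```stata
* Hypothetical usage sketch -- command options shown are illustrative only
xtset id year
xtlvr2plot y x1 x2        // leverage-vs-residual plot + table of anomalous units
xtinfluence y x1 x2       // joint and conditional influence of pairs of units
```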

JEL codes: C13, C15, C23.


Additional information:
UK23_Polselli.pdf

Annalivia Polselli
Institute for Analytics and Data Science, University of Essex
11:00–11:30 A suite of programs for the design, development, and validation of clinical prediction models Abstract: An ever-increasing number of research questions focuses on the development and validation of clinical prediction models to inform individual diagnosis and prognosis in healthcare.
These models predict outcome values (for example, pain intensity) or outcome risks (for example, five-year mortality risk) in individuals from a target population (for example, pregnant women; cancer patients). Development and validation of such models is a complex process, with a myriad of statistical methods, validation measures, and reporting options. It is therefore not surprising that there is considerable evidence of poor methodology in such studies.

In this presentation, I will introduce a suite of ancillary software packages with the prefix “pm”. The pm-suite aims to facilitate the implementation of methodology for building new models, validating existing models, and reporting transparently. All packages are in line with the recommendations of the TRIPOD guidelines, which provide a benchmark for the reporting of prediction models.

I will showcase a selection of packages to aid in each stage of the life cycle of a prediction model, from the initial design (for example, sample-size calculation using pmsampsize and pmvalsampsize), to development and internal validation (for example, calculating model performance using pmstats), external validation (for example, flexible calibration plots of performance in new patients using pmcalplot), and model updating (for example, comparing updating methods using pmupdate).
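For instance, the minimum sample size for developing a binary-outcome model might be computed along these lines. The option names are recalled from memory and the numbers are illustrative; verify both against the pmsampsize help file.

```stata
* Minimum sample size for developing a logistic prediction model with
* 24 candidate parameters, an anticipated c-statistic of 0.89, and an
* outcome prevalence of 0.174 (all values illustrative only)
pmsampsize, type(b) cstatistic(0.89) parameters(24) prevalence(0.174)
```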

Through an illustrative example, I will demonstrate how these packages allow researchers to perform common prediction modeling tasks quickly and easily while standardizing methodology.


Additional information:
UK23_Ensor.pptx

Dr. Joie Ensor
University of Birmingham
11:30–12:30 Bayesian model averaging Abstract: Model uncertainty accompanies many data analyses.
Stata's new bma suite, which performs Bayesian model averaging (BMA), helps address this uncertainty in the context of linear regression. Which predictors are important given the observed data? Which models are more plausible? How do predictors relate to each other across different models? BMA can answer these questions and more. BMA uses Bayes' theorem to aggregate results across multiple candidate models, accounting for model uncertainty during inference and prediction in a principled and universal way. In my presentation, I will describe the basics of BMA and demonstrate it with the bma suite. I will also show how BMA can become a useful tool for your regression analysis, Bayesian or not!
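A minimal bma-suite session might look like the following, based on our reading of the Stata 18 release; see the [BMA] manual for the authoritative syntax.

```stata
* Bayesian model averaging over linear regressions of y on x1-x10
bmaregress y x1-x10
bmastats models          // posterior model probabilities of the top models
bmagraph pmp             // cumulative posterior model probability plot
bmapredict yhat, mean    // BMA posterior mean predictions
```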


Additional information:
UK23_Marchenko.pdf

Yulia Marchenko
StataCorp LLC
1:30–1:50 Prioritizing clinically important outcomes using the win ratio Abstract: The win ratio is a statistical method used for analyzing composite outcomes in clinical trials.
Composite outcomes are composed of two or more distinct “component” events (for example, heart attack, death) and are often analyzed using time-to-first-event methods that ignore the relative importance of the component events. When using the win ratio, component events are instead placed into a hierarchy from most to least important; more important components can then be prioritized over less important ones (for example, death, followed by myocardial infarction). The method works by first placing patients into pairs. Within each pair, one evaluates the components in order of priority, starting with the most important, until one member of the pair is determined to have a better outcome than the other.

A major advantage of the approach is its flexibility: one can include in the hierarchy outcomes of different types (for example, time-to-event, continuous, binary, ordinal, and repeat events). This can have major benefits, for example by allowing assessment of quality of life or symptom scores to be included as part of the outcome. This is particularly helpful in disease areas where recruiting enough patients for a conventional outcomes trial is unfeasible.

The win-ratio approach is increasingly popular, but a barrier to more widespread adoption is a lack of good statistical software. The calculation of sample sizes is also complex and usually requires simulation. We present winratiotest, the first package to implement win-ratio analyses in Stata. The command is flexible and user-friendly. Included in the package is the first software (we know of) that can calculate the sample size for win-ratio-based trials without requiring simulation.
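A call of the following shape conveys the intended use. This is a hypothetical sketch: winratiotest's actual syntax and options may differ, and the variable names are invented; see the package's help file.

```stata
* Hypothetical sketch -- syntax and variable names are illustrative only.
* Outcome hierarchy: death first, then myocardial infarction, then a
* quality-of-life score, compared between treatment arms.
winratiotest death mi qol, by(treatment)
```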

Contributors:
Tim Collier
Joan Pedro Ferreira
London School of Hygiene and Tropical Medicine

Additional information:
UK23_Gregson.pptx

John Gregson
London School of Hygiene and Tropical Medicine
1:50–2:10 Object-oriented programming in Mata Abstract: Object-oriented programming (OOP) is a programming paradigm that is ubiquitous in today's landscape of programming languages.
OOP code proceeds by first defining separate entities—classes—and their relationships, and then letting them communicate with one another. Mata, Stata's matrix language, has such OOP capabilities. Compared with other object-oriented programming languages, such as Java or C++, Mata offers a lighter implementation, striking a nice balance between feature availability and language complexity.

This presentation explores OOP features in Mata by describing the code behind dtms, a community-contributed package for discrete-time multistate model estimation. Estimation in dtms proceeds in several steps, where each step can nest multiple results of the next level, thus building up a treelike structure of results. The presentation explains how this treelike structure is implemented in Mata using OOP, and what the benefits of using OOP for this task are. These include easier code maintenance via a more transparent code structure, shorter coding time, and an easier implementation of efficient calculations.

The presentation will first provide simple examples of useful classes: for example, a class that represents a Stata matrix in Mata, or a class that can grab, hold, and restore Stata e() results. More complex relationships among classes will then be explored in the context of the treelike results structure of dtms. While topics covered will include technical-sounding concepts such as class composition, self-threading code, inheritance, and polymorphism, an effort will be made to link these concepts to tasks that are relevant to Stata users who have already gained, or are interested in gaining, an initial proficiency in Mata.
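A class of the first kind mentioned—one that grabs, holds, and restores a Stata matrix—takes only a few lines of Mata. This is a minimal sketch of the idiom; the classes in dtms are richer than this.

```stata
mata:
// Minimal class holding a copy of a named Stata matrix
class StataMatrix {
    real matrix   M
    string scalar name
    void          get(), put()   // member functions, defined below
}
// Copy the Stata matrix `nm' into the object
void StataMatrix::get(string scalar nm)
{
    name = nm
    M    = st_matrix(name)
}
// Write the (possibly modified) copy back to Stata
void StataMatrix::put() st_matrix(name, M)
end
```

Inside other Mata code, an instance is created by a declaration such as `class StataMatrix scalar s`, after which `s.get("mymat")` and `s.put()` operate on it.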


Additional information:
UK23_Schneider.pdf
UK23_Schneider.pptx

Daniel C. Schneider
Max Planck Institute for Demographic Research
2:10–2:40 A review of machine learning commands in Stata: Performance and usability evaluation Abstract: This presentation provides a comprehensive survey reviewing machine learning (ML) commands in Stata.
I systematically categorize and summarize the available ML commands in Stata and evaluate their performance and usability for different tasks such as classification, regression, clustering, and dimension reduction. I also provide examples of how to use these commands with real-world datasets and compare their performance. This review aims to help researchers and practitioners choose appropriate ML methods and related Stata tools for their specific research questions and datasets, and to improve the efficiency and reproducibility of ML analyses using Stata. I conclude by discussing some limitations and future directions for ML research in Stata.


Additional information:
UK23_Cerulli.pdf

Giovanni Cerulli
CNR-IRCRES
2:40–3:10 On the shoulders of giants: Writing wrapper commands in Stata Abstract: For repeated tasks, it is convenient to use commands with simple syntax that carry out more complicated tasks under the hood.
These can be data management and visualization tasks or statistical analyses. Many of these tasks are variations or special cases of more versatile approaches. Instead of reinventing the wheel, wrapper commands build on existing capabilities by “wrapping” around other commands. For example, certain types of graphs might require substantial effort when built from scratch with Stata's graph twoway commands, but this process can be automated with a dedicated command. Similarly, many estimators for specific models are special cases of more general estimation techniques, such as maximum likelihood or generalized method of moments estimators. A wrapper command can translate relatively simple syntax into the more complex syntax of Stata's ml or gmm commands, or even directly into the underlying optimize() or moptimize() Mata functions.

Many official Stata commands can be regarded as wrapper commands, and often there is a hierarchical wrapper structure with multiple layers. For example, most commands for mixed-effects estimation of particular models are wrappers for the general meglm command, which itself wraps around the undocumented _me_estimate command, which then calls gsem, which in turn initiates the estimation with the ml package. The main purpose of the higher-layer wrappers is typically syntax parsing. With every layer, the initially simple syntax is translated into the more general syntax of the lower-layer command, but the user only needs to be concerned with the basic syntax of the top-layer command. Similarly, community-contributed commands often wrap around official or other community-contributed commands. They may even wrap around packages written for other programming environments, such as Python.
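The layering idea can be seen in miniature in an ado-file wrapper: parse a simple user-facing syntax, then hand the work to an existing estimation command. The sketch below is our own illustration (the command name myreg is made up), not code from the talk.

```stata
program define myreg, eclass
    version 16
    // user-facing syntax: myreg depvar indepvars [if] [in] [, robust]
    syntax varlist(numeric min=2) [if] [in] [, Robust]
    gettoken depvar indepvars : varlist
    // translate the simple option into the lower-layer command's syntax
    local vce = cond("`robust'" != "", "vce(robust)", "")
    regress `depvar' `indepvars' `if' `in', `vce'
end
```

Because regress posts its results in e(), the wrapper inherits replay, postestimation, and factor-variable support essentially for free.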

In this presentation, I discuss different types of wrapper commands and focus on practical aspects of their implementation. I illustrate these ideas with two of my own commands. The new spxtivdfreg wrapper adds a spatial dimension to the xtivdfreg command (Kripfganz and Sarafidis 2021) for defactored instrumental-variables estimation of large panel-data models with common factors. The xtdpdgmmfe wrapper provides a simplified syntax for the GMM estimation of linear dynamic fixed-effects panel-data models with the xtdpdgmm command.


Additional information:
UK23_Kripfganz.pdf

Sebastian Kripfganz
University of Exeter
3:40–4:10 gigs package: new egen extensions for international newborn and child growth standards Abstract: Children’s growth status is an important measure commonly used as a proxy indicator of advancements in a country’s health, human capital, and economic development.
Understanding how and why child growth patterns have changed is necessary for characterizing global health inequalities. Sustainable Development Goal 3.2 aims to reduce neonatal deaths to at least as low as 12 per 1,000 live births and under-5 deaths to at least as low as 25 per 1,000 live births (WHO/UNICEF, 2019). However, large gaps remain in achieving these goals: currently 54 and 64 (of 194) countries will miss the targets for child (<5 years) and neonatal (<28 days) mortality, respectively (UN IGME, 2022). Because infant mortality is strongly associated with nonoptimal growth, accurate growth assessment using prescriptive growth standards is essential to reduce these mortality gaps.

A range of standards can be used to analyze infant growth: In newborns, size-for-gestational-age analysis of different anthropometric measurements is possible using the Newborn Size standards from the International Fetal and Newborn Growth Consortium for the 21st Century (INTERGROWTH-21st) project (Villar et al., 2014). In infants, growth analysis depends on whether the child is born preterm or term: for term infants, the WHO Child Growth Standards are appropriate (WHO MGRS Group, 2006), whereas there are INTERGROWTH-21st standards for postnatal growth in preterm infants (Villar et al., 2015). Unfortunately, many researchers apply these standards incorrectly, which can lead to inappropriate interpretations of growth trajectories (Perumal et al., 2022).

As part of the Guidance for International Growth Standards (GIGS) project, we are making a range of these tools available in Stata to provide explicit, evidence-based functions through which these standards can be implemented in research and clinical care. We therefore introduce several egen extensions for converting between anthropometric measurements and centiles/z-scores in WHO and INTERGROWTH-21st standards. We also describe several egen functions that classify newborn size and infant growth according to international growth standards.
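Calls of the following shape convey the intended egen-style interface. The function and option names here are invented for illustration, not the package's actual ones; see the gigs documentation for the real names.

```stata
* Hypothetical sketch -- function and option names are illustrative only.
* Convert birthweight to an INTERGROWTH-21st newborn-size z-score, then
* classify newborn size against the standard.
egen double z_bw = ig_nbs(birthweight), gest_days(ga) sex(sex) outcome(zscore)
egen size_cat    = classify_sga(birthweight), gest_days(ga) sex(sex)
```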

References:
Perumal, N., E. O. Ohuma, A. M. Prentice, P. S. Shah, A. Al Mahmud, S. E. Moore, D. E. Roth. 2022. Implications for quantifying early life growth trajectories of term-born infants using INTERGROWTH-21st newborn size standards at birth in conjunction with World Health Organization child growth standards in the postnatal period. Paediatric and Perinatal Epidemiology 6: 839–850.

United Nations Inter-agency Group for Child Mortality Estimation (UN IGME). 2023. Levels & Trends in Child Mortality: Report 2022, Estimates developed by the United Nations Inter-agency Group for Child Mortality Estimation, United Nations Children’s Fund, New York.

Villar, J., L. C. Ismail, C. G. Victora, E. O. Ohuma, E. Bertino, D. G. Altman, A. Lambert, A. T. Papageorghiou et al. 2014. International standards for newborn weight, length, and head circumference by gestational age and sex: The Newborn Cross-Sectional Study of the INTERGROWTH-21st Project. The Lancet 384(9946): 857–868.

Villar, J., F. Giuliani, Z. A. Bhutta, E. Bertino, E. O. Ohuma, L. C. Ismail, F. C. Barros, D. G. Altman, et al. 2015. Postnatal growth standards for preterm infants: The Preterm Postnatal Follow-up Study of the INTERGROWTH-21st Project. The Lancet Global Health 3(11): e681–e691.

WHO Multicentre Growth Reference Study Group. 2006. WHO Child Growth Standards based on length/height, weight and age. Acta Paediatrica Suppl. 450: 76–85.

WHO/UNICEF. 2019. WHO/UNICEF discussion paper: The extension of the 2025 maternal, infant and young child nutrition targets to 2030. https://data.unicef.org/resources/who-unicef-discussion-paper-nutrition-targets/ (accessed May 15th, 2023).

Contributors:
Linda Vesel
Harvard T. H. Chan School of Public Health and Brigham and Women's Hospital
Eric Ohuma
London School of Hygiene and Tropical Medicine

Additional information:
UK23_Parker.zip

Simon Parker
London School of Hygiene and Tropical Medicine
4:10–4:30 Plot suite: Fast graphing commands for very large datasets Abstract: This presentation showcases the functionality of the new “plot suite” of graphing commands.
The suite excels at visualizing very large datasets, enabling users to produce a variety of highly customizable plots in a fraction of the time required by Stata's native graphing commands.


Additional information:
UK23_Kabatek.pdf

Jan Kabatek
Melbourne Institute of Applied Economic and Social Research
4:30–5:30 pystacked and ddml: Machine learning for prediction and causal inference in Stata Abstract: pystacked implements stacked generalization (Wolpert 1992) for regression and binary classification via Python’s scikit-learn.
Stacking is an ensemble method that combines multiple supervised machine learners—the “base” or “level-0” learners—into a single learner. The currently supported base learners include regularized regression (lasso, ridge, elastic net), random forest, gradient boosted trees, support vector machines, and feed-forward neural nets (multilayer perceptrons). pystacked can also be used to fit a single base learner and thus provides an easy-to-use API for scikit-learn’s machine learning algorithms.

ddml implements algorithms for causal inference aided by supervised machine learning, as proposed by Chernozhukov et al. in “Double/debiased machine learning for treatment and structural parameters” (Econometrics Journal, 2018). Five different models are supported, allowing for binary or continuous treatment variables and endogeneity in the presence of high-dimensional controls and/or instrumental variables. ddml is compatible with many existing supervised machine learning programs in Stata and, in particular, has integrated support for pystacked, making it straightforward to use ensembles of machine learners in causal inference applications.
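An illustrative session combining the two packages might look as follows. The option and subcommand names are recalled from the packages' documentation and may differ in detail; check the pystacked and ddml help files.

```stata
* Stacking regression with three base learners via pystacked
pystacked y x1-x20, type(reg) methods(ols lassocv gradboost)

* Partially linear model via ddml, with pystacked for both nuisance functions
ddml init partial
ddml E[Y|X]: pystacked y x1-x20, type(reg)
ddml E[D|X]: pystacked d x1-x20, type(reg)
ddml crossfit
ddml estimate, robust
```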

Contributors:
Achim Ahrens
ETH Zürich
Christian B. Hansen
Thomas Wiemann
University of Chicago

Additional information:
UK23_Schaffer_ddml.pdf
UK23_Schaffer_pystacked.pdf

Mark E. Schaffer
Heriot-Watt University
9:00–9:20 Fitting the Skellam distribution in Stata Abstract: The Skellam distribution is a discrete probability distribution related to the difference between two independent Poisson-distributed random variables.
It has been used in a variety of contexts, including sports scores and supply-and-demand imbalances in shared transportation. To the best of our knowledge, Stata supports neither the Skellam distribution nor Skellam regression. In this presentation, I show how to fit the parameters of a Skellam distribution and a Skellam regression using Mata's optimize() function. The optimization problem is then packaged into a basic Stata command, which I also describe.
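The optimize() machinery involved follows a standard pattern. The sketch below fits a Poisson mean as a stand-in objective; the Skellam log-likelihood would replace the evaluator body and additionally requires the modified Bessel function I_|y|, which is omitted here.

```stata
mata:
// d0 evaluator: Poisson log-likelihood as a stand-in for the Skellam one;
// b[1] is the log mean, so the mean stays positive during optimization
void mylnf(real scalar todo, real rowvector b, real colvector y,
           real scalar lnf, real rowvector g, real matrix H)
{
    real scalar mu
    mu  = exp(b[1])
    lnf = sum(y :* b[1] :- mu :- lnfactorial(y))
}
y = rpoisson(1000, 1, 5)              // simulated data, true mean 5
S = optimize_init()
optimize_init_evaluator(S, &mylnf())
optimize_init_evaluatortype(S, "d0")
optimize_init_argument(S, 1, y)
optimize_init_params(S, (0))
bhat = optimize(S)
exp(bhat[1])                          // estimated mean
end
```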


Additional information:
UK23_Verardi.pdf

Vincenzo Verardi
Université libre de Bruxelles
9:20–9:40 A short report on making Stata secure and adding metadata in a new data platform Abstract: The presentation has two parts. A version of the first part was presented at the 2022 Northern European Stata Conference.
Part 1. Securing Stata in a secure environment. Data access and logging.

At the Cancer Registry of Norway (CRN), we are developing a secure environment for using Stata. A short description of this work is given, covering data access and the logging of data extraction (JDBC + Java plugins) and of Stata commands.

Part 2. Metadata using characteristics.

In the new solution, metadata is automatically attached to Stata .dta characteristics when users fetch data from the data warehouse. The implementation is described, along with some small utility programs to use metadata, and examples of use are presented.


Additional information:
UK23_Aagnes.pptx

Bjarte Aagnes
Cancer Registry of Norway
9:40–10:00 Facilities for optimizing and designing multiarm multistage (MAMS) randomized controlled trials with binary outcomes Abstract: In this presentation, we introduce two Stata commands, nstagebin and nstagebinopt, which can be used to facilitate the design of multiarm multistage (MAMS) trials with binary outcomes.
MAMS designs are a class of efficient and adaptive randomized clinical trials that have been used successfully in many disease areas, including cancer, TB, maternal health, COVID-19, and surgery. The nstagebinopt command finds a class of efficient “admissible” designs based on an optimality criterion using a systematic search procedure. The nstagebin command calculates the stagewise sample sizes, trial timelines, and overall operating characteristics of a MAMS design with binary outcomes. Both programs allow the use of Dunnett's correction to account for multiple testing. We also use the ROSSINI 2 MAMS design, an ongoing MAMS trial in surgical wound infection, to illustrate the capabilities of both programs. The new Stata commands facilitate the design of MAMS trials with binary outcomes in which more than one research question can be addressed under one protocol.

Reference:
Choodari-Oskooei B., D. J. Bratton, and M. Parmar. 2023. Facilities for optimizing and designing multiarm multistage (MAMS) randomised controlled trials with binary outcomes. Stata Journal. Under review.

Contributors:
Daniel J. Bratton
GlaxoSmithKline
Mahesh KB Parmar
University College London

Additional information:
UK23_Choodari-Oskooei.pptx

Babak Choodari-Oskooei
University College London
10:00–10:20 How to check a simulation study Abstract: Simulation studies are a powerful tool in biostatistics, but they can be hard to conduct successfully.
Sometimes, unexpected results are obtained. We offer advice on how to check a simulation study when this occurs and how to design and conduct the study to give results that are easier to check. Simulation studies should be designed to include some settings where answers are already known. Code should be written in stages, and data-generating mechanisms should be checked before simulated data are analyzed. Results should be explored carefully, with scatterplots of standard error estimates against point estimates a surprisingly powerful tool. When estimation fails or there are outlying estimates, these should be identified, understood, and dealt with by changing data-generating mechanisms or coding realistic hybrid analysis procedures. Finally, we give a series of ideas that have been useful to us in the past for checking unexpected results. Following our advice may help to prevent errors and to improve the quality of published simulation studies. We illustrate the ideas with a simple but realistic simulation study in Stata.
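The scatterplot check described above takes only a few lines. A toy example under an assumed simple data-generating mechanism (our own illustration, not the talk's code):

```stata
* Simulate 500 repetitions of fitting a correctly specified regression,
* then plot standard-error estimates against point estimates: outlying
* or failed repetitions stand out immediately
program define simreg, rclass
    clear
    set obs 100
    generate x = rnormal()
    generate y = 1 + 2*x + rnormal()
    regress y x
    return scalar b  = _b[x]
    return scalar se = _se[x]
end

simulate b=r(b) se=r(se), reps(500) seed(2023): simreg
scatter se b
```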

Contributors:
Ian R. White
Matteo Quartagno
Tim P. Morris
MRC Clinical Trials Unit at UCL

Additional information:
UK23_Pham.pptx

Tra My Pham
MRC Clinical Trials Unit at UCL
10:20–10:40 Drivers of COVID-19 deaths in the United States: A two-stage modeling approach Abstract: We offer a two-stage (time-series and cross-section) econometric modeling approach to examine the drivers behind the spread of COVID-19 deaths across counties in the United States.
Our empirical strategy exploits the availability of two years (January 2020 through January 2022) of daily data on the number of confirmed deaths and cases of COVID-19 in the 3,000 U.S. counties of the 48 contiguous states and the District of Columbia. In the first stage of the analysis, we use daily time-series data on COVID-19 cases and deaths to fit mixed models of deaths against lagged confirmed cases for each county. As the resulting coefficients are county specific, they relax the homogeneity assumption that is implicit when the analysis is performed using geographically aggregated cross-section units. In the second stage of the analysis, we assume that these county estimates are a function of economic and sociodemographic factors that are taken as fixed over the course of the pandemic. Here we employ the novel one-covariate-at-a-time variable-selection algorithm proposed by Chudik et al. (Econometrica, 2018) to guide the choice of regressors. The second stage utilizes the SUR technique in an unusual setting, where the regression equations correspond to time periods in which cross-sectional estimates at the county level are available.

Contributors:
Andrés Garcia-Suaza
Universidad del Rosario
Miguel Henry
Jesús Otero
Universidad del Rosario

Additional information:
UK23_Baum.pdf

Kit Baum
Boston College
11:10–11:30 Use of Stata in modeling the determinants of work engagement Abstract: The research goal was to identify the determinants of the phenomenon of work engagement.
Two primary datasets provided by Eurofound in the European Working Conditions Survey were used. Data were gathered before and during the COVID-19 pandemic, which allowed me to include the pandemic context in the analysis. Additionally, some macroeconomic and other social variables were included, such as GDP per capita, labor-force participation rate, unemployment rate, the level of social trust, the Doing Business Index, and the European Quality of Government Index. Stata, with its potential for data cleaning and checking, allowed me to merge all variables from complex datasets into one set with 115,608 observations and over 100 variables from 34 European countries; preparing the data involved scripting some repetitive command patterns.

Stata's programmability helped in preparing the model using logistic regression. A dichotomous outcome (dependent) variable was modeled: engaged or not engaged in work. The predictor variables of interest were those related to work, such as working conditions, occupational characteristics, and the level of human capital. The logistic command in Stata produced results in terms of odds ratios, which were interpreted to quantify the effect of the chosen predictors on the response variable and consequently to accept or reject the constructed research hypotheses. The innovation of the presented analysis lies in including macroeconomic and macrosocial variables and in its international and intersectoral scope. The presented logit model fills a research gap in the area of the work engagement phenomenon.
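The core model is a standard logistic regression reported as odds ratios; schematically, it is of the following shape (the variable names are invented for illustration and do not come from the EWCS data):

```stata
* Illustrative only -- variable names are hypothetical
logistic engaged i.occupation c.human_capital c.gdp_pc ///
    c.social_trust i.pandemic_period, vce(cluster country)
```

The logistic command reports odds ratios directly; logit with the or option would be equivalent.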


Additional information:
UK23_Hojda.pptx

Paulina Hojda
University of Łódź
11:30–12:30 Heterogeneous difference-in-differences estimation Abstract: Treatment effects might differ over time and across groups that are treated at different points in time (treatment cohorts).
In Stata 18, we introduced two commands that estimate treatment effects that vary over time and across cohorts. For repeated cross-sectional data, we have hdidregress; for panel data, we have xthdidregress. Both commands let you graph the evolution of treatment effects over time. They also allow you to aggregate treatment effects within cohort and time and to visualize these aggregates. I will show how both commands work and briefly discuss the theory underlying them.
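For panel data, a typical session looks like the following, based on our reading of the Stata 18 documentation; the estat subcommand names should be checked in [CAUSAL] xthdidregress.

```stata
* Heterogeneous DID with regression adjustment on panel data
xtset county year
xthdidregress ra (outcome x1 x2) (treated), group(county)
estat atetplot                  // ATETs by cohort over time
estat aggregation, cohort graph // effects aggregated within cohort
```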


Additional information:
UK23_Pinzón.pdf

Enrique Pinzón
StataCorp LLC
1:30–1:50 A robust test for linear and log-linear models against Box-Cox alternatives Abstract: The purpose of this presentation is to describe a new command, xtloglin, that tests the suitability of the linear and log-linear regression models against Box-Cox alternatives.
The command uses a GMM-based Lagrange multiplier test, which is robust to nonnormality and heteroskedasticity of the errors and extends the analysis by Savin and Würtz (2005) to panel data regressions after xtreg.

The Box-Cox transformation, first introduced by Box and Cox (1964), is a popular approach for testing the linear and log-linear functional forms, because both are special cases of the transformation. The usual approach is to estimate the Box-Cox model by maximum likelihood, assuming normally distributed homoskedastic errors, and test the restrictions on the transformation parameter, which lead to linear and log-linear specifications using a Wald or likelihood ratio test.
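For reference, the Box-Cox transformation that nests both specifications is

```latex
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda}-1}{\lambda}, & \lambda \neq 0,\\[4pt]
\ln y, & \lambda = 0,
\end{cases}
```

so the linear model corresponds to $\lambda = 1$ and the log-linear model to $\lambda = 0$.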

Despite the popularity of this approach, the estimator of the transformation parameter does not search only for nonlinearity; it also favors transformations that make the errors more normal, with constant variance. This can result in an estimate that favors log-linearity over linearity even though the true model is linear with nonnormal or heteroskedastic errors. xtloglin resolves these issues because the GMM estimator is consistent under less restrictive distributional assumptions.

References:

Box, G. E. P., and Cox, D. R. (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological), 26(2), 211–243.

Savin, N. E., and Würtz, A. H. (2005). Testing the semiparametric Box–Cox model with the bootstrap. Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, 322–354.

Additional information:
UK23_Vincent.pdf

David Vincent
David Vincent Econometrics
1:50–2:10 Network regressions in Stata Abstract: Network analysis has become critical to the study of the social sciences.
While several Stata programs are available for analyzing network structures, programs that execute regression analysis with a network structure are currently lacking. We fill this gap by introducing the nwxtregress command. Building on spatial econometric methods (LeSage and Pace 2009), nwxtregress uses MCMC estimation to produce estimates of endogenous peer effects, as well as own-node (direct) and cross-node (indirect) partial effects, where nodes correspond to cross-sectional units of observation, such as firms, and edges correspond to the relations between nodes. Unlike existing spatial regression commands (for example, spxtregress), nwxtregress is designed to handle unbalanced panels of economic and social networks. Networks can be directed or undirected with weighted or unweighted edges, and they can be imported in a list format that does not require a shapefile or a Stata spatial weight matrix set by spmatrix. The presentation will focus in particular on the construction of the spatial weight matrix and on integration with Python to improve speed.
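The edge-list input mentioned above maps naturally onto the standard spatial-econometric construction: build an adjacency matrix from the list, then row-normalize it. A minimal sketch of that construction (illustrative only, not nwxtregress's internals):

```python
import numpy as np

# Edge list: (source node, target node, weight) for a directed network
edges = [(0, 1, 1.0), (0, 2, 2.0), (1, 2, 1.0), (2, 0, 1.0)]
n = 3

# Build the adjacency matrix W from the edge list
W = np.zeros((n, n))
for i, j, w in edges:
    W[i, j] = w

# Row-normalize so each node's outgoing weights sum to 1,
# leaving rows with no edges as all zeros
row_sums = W.sum(axis=1, keepdims=True)
W_norm = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

assert np.allclose(W_norm.sum(axis=1), 1.0)  # every row sums to one
```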

Contributors:
William Grieser
Morad Zekhnini
Free University of Bozen-Bolzano

Additional information:
UK23_Ditzen.pdf

Jan Ditzen
Free University of Bozen-Bolzano
2:10–2:30 The joy of sets: Graphical alternatives to Euler and Venn diagrams Abstract: Given several binary (indicator) variables and intersecting sets, an Euler or Venn diagram may spring to mind, but even with only a few sets the collective pattern becomes hard to draw and harder to think about.
In genomics and elsewhere, so-called upsetplots (specialized bar charts for the purpose) have recently become popular as alternatives. This presentation introduces an implementation, upsetplot, a complementary implementation, vennbar, and associated minor extras and utilities. Applications include examination of the structure of missing data and of the co-occurrence of medical symptoms or any other individual binary states. These new commands are compared with previous graphical commands, both official and community-contributed and both frequently used and seemingly little known.

Secondary themes include data structures needed to produce and store results; what works better with graph bar and what works better with twoway bar; and the serendipity of encounters at Stata users' meetings.
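The data reduction underlying an upset plot is a count of each observed combination of the binary indicators, with one bar per combination. A minimal sketch of that reduction (illustrative only, not the implementation of upsetplot or vennbar):

```python
from collections import Counter

# Rows of binary indicators, e.g. presence of three symptoms per patient
data = [
    (1, 0, 0),
    (1, 1, 0),
    (1, 1, 0),
    (0, 0, 1),
    (1, 1, 1),
]

# Count each distinct combination; these counts are the bar heights
counts = Counter(data)

# Order combinations by frequency, as upset plots usually display them
bars = counts.most_common()
assert bars[0] == ((1, 1, 0), 2)  # the most common combination comes first
```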

Contributor:
Tim P. Morris
MRC Clinical Trials Unit, UCL

Additional information:
UK23_Cox.pptx

Nicholas J. Cox
Durham University
2:30–3:00 geoplot: A new command to draw maps Abstract: geoplot is a new command for drawing maps from shapefiles and other datasets.
Multiple layers of elements such as regions, borders, lakes, roads, labels, and symbols can be freely combined and the look of elements (for example, color) can be varied depending on the values of variables. Compared with previous solutions in Stata, geoplot provides more user convenience, more functionality, and more flexibility. In this presentation, I will introduce the basic components of the command and illustrate its use with examples.

Additional information:
UK23_Jann.pdf

Ben Jann
University of Bern
3:30–4:30 Open panel discussion with Stata developers
Contribute to the Stata community by sharing your feedback with StataCorp's developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for our users.

Scientific committee

Stephen Jenkins
London School of Economics
Roger Newson
King's College London
Tim Collier
London School of Hygiene and Tropical Medicine

Logistics organizer

The logistics organizer for the 2023 UK Stata Conference is Timberlake Consultants, the Stata distributor to the United Kingdom and Ireland, France, Spain, Portugal, the Middle East and North Africa, Brazil, and Poland.

View the proceedings of previous Stata Conferences and Users Group meetings.