
2024 Stata Biostatistics and Epidemiology Virtual Symposium

22 February 2024

What is the Virtual Symposium?

A meeting of researchers in biostatistics and epidemiology from around the world discussing current theory and applied methods using Stata. The program consists of invited talks by top Stata users, and the virtual platform allows you to experience this one-day event from wherever you are.


Jennifer Thompson

London School of Hygiene and Tropical Medicine

Elisavet Syriopoulou

Karolinska Institutet

Mark Rutherford

University of Leicester

Babak Choodari-Oskooei

University College London

Paul Rathouz

University of Texas at Austin

Laura Gibbons

University of Washington

Joanna Dipnall

Monash University





Cluster randomized trial analysis made easy: The clan Stata command

Jennifer Thompson, London School of Hygiene and Tropical Medicine


Abstract: It is well established that the analysis of cluster randomized trials must account for the correlation between observations in the same cluster. These trials randomize whole groups of individuals like hospitals or villages to receive either a control condition or intervention condition, and individuals in the same cluster are likely to be more similar to one another than individuals in different clusters. While accounting for correlation between observations can be done using regression-based methods such as mixed-effects models, these are known to perform less well with fewer than around 30 clusters, which is common for cluster randomized trials. The cluster-level analysis, where the data are summarized for each cluster and then the cluster summaries analyzed as independent data points, is well established but much less commonly used in practice. In this talk, I will present some of the benefits and drawbacks of using the cluster-level analysis method and introduce the clan Stata command, which simplifies implementation of this approach.
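As a rough illustration of the two-step cluster-level analysis described in this abstract, the following is a minimal Python sketch, not the clan command itself. It assumes a continuous outcome, two arms, cluster means as the summary measure, and an unpaired pooled-variance t-test on those means. The function name `cluster_level_ttest` and the toy records are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cluster_level_ttest(records):
    """Cluster-level analysis sketch.

    records: iterable of (cluster_id, arm, outcome), with arm 0 = control
    and arm 1 = intervention. Returns (difference, standard error,
    t statistic, degrees of freedom).
    """
    # Step 1: summarize each cluster by the mean of its individual outcomes.
    by_cluster = {}
    for cluster_id, arm, outcome in records:
        by_cluster.setdefault((cluster_id, arm), []).append(outcome)

    summaries = {0: [], 1: []}
    for (_cluster_id, arm), outcomes in by_cluster.items():
        summaries[arm].append(mean(outcomes))

    # Step 2: treat the cluster means as independent data points and
    # compare arms with a pooled-variance two-sample t statistic.
    control, treated = summaries[0], summaries[1]
    diff = mean(treated) - mean(control)
    df = len(control) + len(treated) - 2
    pooled_var = (
        (len(control) - 1) * variance(control)
        + (len(treated) - 1) * variance(treated)
    ) / df
    se = sqrt(pooled_var * (1 / len(control) + 1 / len(treated)))
    return diff, se, diff / se, df


# Hypothetical toy data: six clusters, three per arm.
toy_records = [
    (1, 0, 1), (1, 0, 2), (1, 0, 3),
    (2, 0, 2), (2, 0, 4),
    (3, 0, 4), (3, 0, 4), (3, 0, 4),
    (4, 1, 4), (4, 1, 6),
    (5, 1, 5), (5, 1, 5), (5, 1, 8),
    (6, 1, 7), (6, 1, 7),
]

if __name__ == "__main__":
    diff, se, t_stat, df = cluster_level_ttest(toy_records)
    print(f"difference={diff:.3f} se={se:.3f} t={t_stat:.3f} df={df}")
```

The degrees of freedom depend on the number of clusters, not the number of individuals, which is why this approach remains usable when a trial has few clusters.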

An extension of mediation analysis to the relative survival framework

Elisavet Syriopoulou, Karolinska Institutet


Abstract: Mediation analysis can be applied to investigate the role of a third variable (a so-called mediator) on the pathway between an exposure and the outcome of interest through an effect decomposition. For example, it would allow exploring if and how much of the socioeconomic disparities observed in cancer survival can be explained by differences in stage at diagnosis. Here we describe an extension of mediation analysis within the relative survival framework. Relative survival is a commonly used measure in cancer epidemiology that estimates net survival, and its incorporation into mediation analysis allows focusing on disease-related survival differences as opposed to all-cause survival differences. The latter are the result of both cancer-related and other factors that are more challenging to identify. Using mediation analysis in relative survival and counterfactual survival functions, we can partition the difference in marginal relative survival between exposure groups into the difference due to a mediator (such as stage at diagnosis, which has an indirect effect) and the remaining difference (due to the exposure, a direct effect). The proportion mediated can also be obtained together with contrasts of all-cause survival differences as well as the number of “avoidable deaths” under interventions aimed at modifiable risk factors. We illustrate how all of these measures of interest can be estimated and introduce certain assumptions required to do so. To that end, we fit a flexible parametric survival model for the survival outcome using the stpm3 command and a separate multinomial logistic model for the mediator. We then combine these models using a regression standardization approach, implemented within the standsurv command, with uncertainty estimated using a parametric bootstrap procedure. We will illustrate the approach in practice using an example of differences in rectal cancer survival between income groups in Sweden.
Finally, the methods that we developed could be applied in other disease areas as well, where exploring the underlying mechanisms for disease-specific survival differences is of interest. Mediation analysis in the relative survival framework thus provides a valuable tool that has the potential to improve our understanding of factors driving health disparities and to inform policies aimed at modifiable mediators (risk factors).
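The decomposition described in this abstract can be sketched in counterfactual notation. The symbols below are assumptions introduced for illustration and are not taken from the talk: X is a binary exposure, M a categorical mediator, Z the covariates standardized over, and R(t | ·) the model-based relative survival. This is one common form of the natural direct and indirect effects, not necessarily the exact parameterization the authors use.

```latex
% Marginal relative survival with exposure set to x and the mediator
% distribution set to that of exposure level x'
R_{x, M^{x'}}(t) = E_Z\!\left[ \sum_{m} R(t \mid X = x, M = m, Z)\,
                   P(M = m \mid X = x', Z) \right]

% Total difference split into direct and indirect components
\underbrace{R_{1, M^{1}}(t) - R_{0, M^{0}}(t)}_{\text{total difference}}
  = \underbrace{R_{1, M^{0}}(t) - R_{0, M^{0}}(t)}_{\text{direct (exposure)}}
  + \underbrace{R_{1, M^{1}}(t) - R_{1, M^{0}}(t)}_{\text{indirect (mediator)}}

% Proportion mediated at time t
\mathrm{PM}(t) = \frac{R_{1, M^{1}}(t) - R_{1, M^{0}}(t)}
                      {R_{1, M^{1}}(t) - R_{0, M^{0}}(t)}
```

In this notation, the survival model supplies R(t | X, M, Z), the mediator model supplies P(M = m | X, Z), and regression standardization performs the averaging over Z.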

Reference-adjusted cancer survival measures. What are they, when are they useful, and how are they implemented in Stata?

Mark Rutherford, University of Leicester


Abstract: Background: Ensuring fair comparisons of cancer survival statistics across population groups requires careful consideration of differential competing mortality due to other causes and adjusting for imbalances in terms of other prognostic covariates (for example, age). This has typically been achieved using comparisons of age-standardized net survival, with the age standardization addressing covariate imbalance and the net estimates removing differences in competing mortality from other causes. However, these estimates lack ease of interpretability. In this talk, I'll motivate an alternative approach that uses a common (reference) rate of other-cause mortality across groups to give reference-adjusted cancer survival measures.

Methods: We'll discuss both the methodology and Stata implementation to enable both model-based and nonparametric estimation of reference-adjusted cancer survival metrics. These measures allow fair comparison of all-cause survival across groups with differential other-cause mortality (for example, across countries, socioeconomic groups, or calendar periods).

Results: These measures retain comparability but stay closer to the real-world risks of dying, allowing direct comparison across population groups with different covariate profiles and competing mortality patterns. In our illustrative example, we show that regional variations in survival following a diagnosis of rectal cancer persist even after accounting for regional variation in the demographic profile of cancer patients and in other-cause mortality.

Conclusions: The methodological approach of using standardized and reference-adjusted metrics offers an appealing approach for future cancer survival comparison studies. The calculation of these metrics is readily available in Stata, building on the strong suite of official and community-contributed survival analysis commands.
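The idea of a reference-adjusted measure can be written compactly under the usual relative survival decomposition of all-cause survival into expected and relative survival. The notation is an assumption made for illustration: k indexes the population group, z a covariate pattern such as age, S^* the expected (other-cause) survival, R the relative survival, and w_z the weights of a chosen reference covariate distribution.

```latex
% Observed all-cause survival in group k: group-specific expected
% survival multiplied by group-specific relative survival
S_k(t \mid z) = S^{*}_{k}(t \mid z)\, R_k(t \mid z)

% Reference-adjusted all-cause survival: replace the group's own
% other-cause survival with that of a common reference population
S^{\mathrm{ref}}_{k}(t \mid z) = S^{*}_{\mathrm{ref}}(t \mid z)\, R_k(t \mid z)

% Standardized and reference-adjusted: average over a common
% covariate distribution with weights w_z
\bar{S}^{\mathrm{ref}}_{k}(t) = \sum_{z} w_z\, S^{*}_{\mathrm{ref}}(t \mid z)\, R_k(t \mid z)
```

Because both the other-cause mortality and the covariate distribution are held common across groups, any remaining differences between groups reflect differences in relative survival.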

Optimal multiarm multistage platform randomized trials using Stata: Rationale, design, and implementation

Babak Choodari-Oskooei, University College London


Abstract: Multiarm multistage (MAMS) platform randomized clinical trials are an efficient, adaptive approach for testing many treatments within one master protocol. They include flexible features such as early stopping of accrual to treatments for lack of benefit, as well as the opportunity to add new treatment arms over the course of the trial. The MAMS framework has been used in several confirmatory randomized clinical trials in COVID-19, cancer, infectious diseases, neurodegenerative diseases, and surgery.

This talk introduces the MAMS design, elucidates its underlying rationale, and highlights its advantages. Utilizing the nstage suite of commands in Stata and two ongoing real MAMS trials in prostate cancer (STAMPEDE) and surgery (ROSSINI-2), it illustrates how to calculate the required sample size for such a complex design. Further, it addresses the challenge of identifying efficient MAMS trial designs with specific overall pairwise or familywise operating characteristics and provides guidelines on how to implement these designs.

Semiparametric generalized linear models with discrete (or continuous?) data: Bayesian implementation in Stata

Paul Rathouz, University of Texas at Austin
Coauthors: Entejar Alam and Peter Mueller, University of Texas


Abstract: Rathouz and Gao (2009) introduced a semiparametric extension of the classic generalized linear model (GLM) family, in which the mean model is user specified via a linear predictor and link function, as in any quasilikelihood formulation. The response distribution (the "random component" in a GLM) is specified as a nonparametric reference distribution, yielding a semiparametric model. The dimension of the nonparametric component in the semiparametric generalized linear model (SPGLM) is of the same order as the cardinality of the support space of the response. These models have been applied and are often well suited for settings ranging from ordinal data with a finite support to continuous data. After introducing the SPGLM, in this talk I will discuss and illustrate Bayesian estimation and inference using the bayesmh suite of tools in Stata.
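For orientation, the model structure described in this abstract can be sketched as an exponential tilt of a reference distribution. The notation (f_0, \theta, b, g, \beta) is assumed here for illustration and may differ from that used in the talk.

```latex
% Response density: nonparametric reference density f_0 tilted by theta
f(y \mid x) = f_0(y)\, \exp\{\theta y - b(\theta)\},
\qquad
b(\theta) = \log \int \exp(\theta y)\, f_0(y)\, dy

% Mean model: user-specified link g and linear predictor
\mu(x) = E(Y \mid x) = g^{-1}(x^{\top}\beta)

% theta is not a free parameter: for each x it is determined
% implicitly by the mean constraint
b'(\theta) = \mu(x)
```

For a response with finite support, the integral becomes a sum over the support points and f_0 reduces to one probability mass per point, which is why the dimension of the nonparametric component matches the cardinality of the support.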

Generalizing research based in a Seattle integrated health delivery system to all older adults in the region

Laura Gibbons, University of Washington


Abstract: The Adult Changes in Thought (ACT) study is a cohort of Kaiser Permanente Washington members aged 65 and older. We want to know how well decades of ACT data represent all older adults currently alive in the Seattle Metropolitan Region and how well findings transport from ACT to the regional population. This talk will focus on the transport process and the code I have posted to do this. I will demonstrate the use of participation weights that incorporate the probability that someone in the Seattle Metropolitan Region would be included in ACT. As examples, we will look at analyses of the prevalence of common eye diseases and their effect on dementia incidence. Steps include 1) aligning variables available in both ACT and the Behavioral Risk Factor Surveillance System, 2) multiple imputation to create participation weights for everyone, 3) the iterative process of constructing the participation weights and assessing covariate balance, 4) analyses using these weights, and 5) bootstrapping confidence intervals to account for the error in estimating the weights.

Managing complex pooled international cohort data in Stata: Health-related quality of life (HRQoL) outcomes following injury in childhood and adolescence using EuroQol (EQ-5D) responses with pooled longitudinal data

Joanna Dipnall, Monash University
Coauthors: Frederick P. Rivara, Ronan A. Lyons, Shanthi Ameratunga, Mariana Brussoni, Fiona E. Lecky, Clare Bradley, Ben Beck, Jane Lyons, Amy Schneeberg, James E. Harrison, Belinda J. Gabbe


Abstract: Injury is a leading contributor to the global disease burden in children, affecting their health-related quality of life (HRQoL), yet valid estimates of burden are absent. This study pooled longitudinal data from 5 international cohort studies of pediatric injury survivors (5–17 years) (n = 2334). HRQoL postinjury was measured using the 3-level EQ-5D utility score (EQ-5D) and 5 binary health states (mobility, self-care, activity, pain, anxiety and depression [anxiety]). The complexities of pooling international longitudinal data included varying inclusion and exclusion criteria, time points (baseline, 1, 4, 6, 12, and 24 months), and measurements of diagnosis and EQ-5D. The use of various Stata tools enabled the smooth implementation and replication of this analysis. HRQoL outcomes over time for children and adolescents postinjury were found to differ across key demographic and injury-related attributes. Results from this research were published and highlighted the importance of tailored interventions to respond to the varying postinjury recovery trajectories in this population.

Registration closed