
The 16th Northern European Stata Conference will take place on 29 August 2025 at Karolinska Institutet.
This conference will provide Stata users with the opportunity to exchange ideas, experiences, and information on new applications of Stata. Representatives from StataCorp will attend and host an open panel discussion, so you can share your questions and feedback directly with Stata developers. Anyone interested in using Stata is welcome. No level of expertise is assumed for presenters or attendees.
All times are CEST (UTC+2).
8:30–9:00 | Registration
9:00–9:30 | wqsreg: A Stata command for weighted quantile sum regression
Abstract:
Weighted quantile sum (WQS) regression is a flexible statistical method for quantifying the association
between a set of possibly correlated predictors and a health outcome. This approach is gaining
substantial popularity in several fields, such as environmental epidemiology, where it allows
researchers to estimate the overall effect of a complex environmental mixture as well as the specific
contribution of each mixture component. However, no Stata command for fitting this increasingly
popular procedure has been available. To address this gap, we have developed a
new command, wqsreg, that fits WQS regression models for continuous
outcomes and supports the flexible components of this framework, including adjusting
for potential confounders; estimating both positive and negative overall mixture effects;
obtaining robust weight estimates through bootstrapping; specifying the method used to rank the variables
included in the mixture (for example, quartiles); setting iteration limits for the optimization;
and fixing the random-number seed and customizing save options. wqsreg returns the estimates from WQS
regression, plots the estimated weights, and creates a dataset containing the WQS index for
each subject. In this talk, we will introduce the key features of WQS regression, describe wqsreg,
and demonstrate its use through examples. Given the increasing importance of appropriately
exploring complex multidimensional exposures such as environmental mixtures, this command
gives Stata users one of the first tools for applying a modern computational approach
developed specifically for these settings.
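For readers new to the method, the model as it is usually written in the WQS literature is sketched below. This is a general formulation, and wqsreg's exact parameterization and options may differ. Each of the $c$ mixture components is first scored into quantiles $q_i$ (for example, quartiles coded 0 to 3). The index is then a weighted sum whose weights are nonnegative and sum to 1:

$$
\mathrm{WQS} = \sum_{i=1}^{c} w_i q_i, \qquad 0 \le w_i \le 1, \qquad \sum_{i=1}^{c} w_i = 1
$$

$$
g(\mu) = \beta_0 + \beta_1\,\mathrm{WQS} + \mathbf{z}'\boldsymbol{\varphi}
$$

The symbols are as follows:
- $g$ is the link function (the identity link for a continuous outcome).
- $\beta_1$ is the overall mixture effect, constrained to be either positive or negative in a given fit.
- $\mathbf{z}$ holds the confounders, with coefficients $\boldsymbol{\varphi}$.
- $w_i$ is the relative contribution of component $i$.

In the usual workflow, the weights are estimated on bootstrap samples of a training subset and then averaged. $\beta_1$ is then estimated on the remaining validation data.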
Contributors:
Stefano Renzetti
Università degli Studi di Parma
Andrea Bellavia
Harvard T.H. Chan School of Public Health
Marta Ponzano
Università di Genova
|
9:30–10:00 | Fitting joinpoint models for descriptive analysis of cancer trends in Stata
Abstract:
Temporal trends in cancer incidence and mortality rates are often investigated visually, with
interest in changes in the slope of increasing or decreasing rates. Joinpoint
models are used to help quantify the trends, using linear splines where both the number and
location of the knots (joinpoints) are selected as part of the modeling process. I will describe a
Stata implementation of joinpoint models and introduce the joinpoint command and associated
postestimation commands. The approach can be computationally intensive because models are fit for all
possible combinations of the number and location of the knots during model selection. I will
describe how using Mata to fit the models leads to dramatic speed improvements. The joinpoint
command has various options, for example, for choosing the model-selection criterion, the
maximum number of knots, and the minimum number of data points between knots. Output
options include estimation of the annual percent change (APC), with two different
methods to calculate confidence intervals. There is a postestimation predict command and a
command to provide visual summaries of the fitted model.
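To illustrate the APC output, assume the common setup in which the log of the rate $r_t$ is modeled as a linear spline in calendar year $t$. This is a sketch of the standard definition, not necessarily the joinpoint command's exact parameterization. With joinpoints $\tau_1 < \dots < \tau_k$:

$$
\log(r_t) = \beta_0 + \beta_1 t + \sum_{j=1}^{k} \delta_j\,(t - \tau_j)_+, \qquad (u)_+ = \max(u, 0)
$$

The slope in segment $s$ is therefore $b_s = \beta_1 + \sum_{j=1}^{s-1} \delta_j$, and the APC for that segment is

$$
\mathrm{APC}_s = 100 \times \left(e^{b_s} - 1\right)
$$

For example, a segment slope of $b_s = 0.02$ on the log scale corresponds to an APC of about 2.02% per year.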
Paul C Lambert
Cancer Registry of Norway and Karolinska Institutet
|
10:00–10:30 | Stata 20 will have correct inference on random effects
Abstract:
Mixed models, and random effects in particular, are used routinely to model data with
dependent observations and effect heterogeneity. However, while random effects are
convenient for specifying a model, they often complicate inference. As a result, popular
software for statistical analysis often does not provide confidence intervals for random-effects
parameters by default, or worse, provides provably unreliable ones. This talk discusses the
challenges and possible solutions.
Matteo Bottai
Karolinska Institutet
|
10:30–11:00 | Break
11:00–12:00 | Modeling interval-censored event-time data with Stata
Abstract:
Do you have event-time data that you would like to model but are unsure exactly when the
events occurred? In survival analysis, interval-censored event-time data arise when the event of
interest is not observed precisely but is known to have occurred within a specific time interval.
Stata 17 introduced the stintcox command to fit genuine semiparametric Cox models for such
data, and Stata 18 expanded its capabilities by adding support for time-varying covariates
(TVCs). Building on this, Stata 19 introduces the new stmgintcox command, enabling the
modeling of interval-censored multiple-event data while accounting for potential correlations
between event times across different event types. In this presentation, I will describe the
fundamental types of interval-censored data and demonstrate how to fit the semiparametric
Cox proportional hazards model using the stintcox command. I will provide examples using
single-record and multiple-record-per-subject datasets and show how to incorporate TVCs.
Additionally, I will discuss how to interpret and plot results, and how to assess the
proportional hazards assumption. Finally, I will show you how to fit a marginal Cox
proportional hazards model to interval-censored multiple-event data and perform a more
powerful test for common covariate effects across all events.
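As a general sketch of the setup (not a description of stintcox's internal algorithm), the event for subject $i$ is known only to lie in an interval $(L_i, R_i]$. $L_i = 0$ for left-censored observations and $R_i = \infty$ for right-censored ones. Under the proportional hazards model, each subject's likelihood contribution is the probability of failing within that interval:

$$
\mathcal{L}_i = S(L_i \mid \mathbf{x}_i) - S(R_i \mid \mathbf{x}_i), \qquad S(t \mid \mathbf{x}) = \exp\!\left\{-\Lambda_0(t)\, e^{\mathbf{x}'\boldsymbol{\beta}}\right\}
$$

Here $\Lambda_0(t)$ is the cumulative baseline hazard, which is left unspecified in a semiparametric model. Exact event times, and hence the risk sets, are unknown. The partial likelihood that stcox maximizes therefore cannot be formed, and $\Lambda_0$ has to be estimated jointly with $\boldsymbol{\beta}$.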
Xiao Yang
StataCorp
|
12:00–1:00 | Lunch
1:00–1:30 | Prediction intervals in meta-analysis: A clearer view of heterogeneity and expected future findings using Stata
Abstract:
Meta-analyses in epidemiology often rely on 95% confidence intervals (CIs) to summarize the
precision of pooled estimates. However, CIs are frequently misinterpreted and offer limited
insight into how study results vary (heterogeneity) or what future studies might show. Prediction
intervals (PIs), by contrast, directly reflect such between-study variability and estimate the range
within which the true effect of a future study is expected to fall, providing a more interpretable
and policy-relevant view of uncertainty. This talk presents the rationale for using PIs in meta-analyses
of odds ratios (ORs), drawing on the methods described in Borenstein’s widely used
text on the subject. PIs will be contrasted with traditional heterogeneity measures such as I², which
is often misused or overinterpreted as a precise index of inconsistency. In addition, PIs allow
heterogeneity to be framed in terms of expected future effects, which provides a more intuitive and
decision-relevant perspective. Using Stata, I will demonstrate how to compute and visualize PIs,
including enhanced graphical methods based on probability density functions (PDFs). Such plots
go beyond Stata’s whiskerlike default PI displays in forest plots by better illustrating both the
expected range and the relative likelihood of future effect sizes, conveying direction, dispersion,
and uncertainty in a single visual. Attendees will gain a practical and conceptual understanding
of how PIs can complement or even surpass CIs and I² as tools for interpreting and applying
meta-analytic evidence in epidemiology.
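For reference, below is a sketch of the prediction interval most often quoted for a random-effects meta-analysis of $k$ studies. It is computed on the log odds-ratio scale and then exponentiated. This is the general textbook form; the estimator of $\tau^2$ and the degrees of freedom used in the talk, or in Stata's own PI output, may differ.

$$
\hat{\mu} \pm t^{0.975}_{k-2} \sqrt{\hat{\tau}^2 + \widehat{\mathrm{SE}}(\hat{\mu})^2}
$$

Here $\hat{\mu}$ is the pooled log OR, $\hat{\tau}^2$ is the estimated between-study variance, and $\widehat{\mathrm{SE}}(\hat{\mu})$ is the standard error of the pooled estimate. The following values are purely illustrative: $\hat{\mu} = 0.30$, $\hat{\tau}^2 = 0.04$, $\widehat{\mathrm{SE}}(\hat{\mu}) = 0.05$, and $k = 12$, so $t^{0.975}_{10} \approx 2.228$:

$$
\begin{aligned}
\text{95\% CI: } & 0.30 \pm 1.96 \times 0.05 = (0.202,\ 0.398) \;\Rightarrow\; \mathrm{OR} \in (1.22,\ 1.49) \\
\text{95\% PI: } & 0.30 \pm 2.228 \times \sqrt{0.0425} \approx (-0.159,\ 0.759) \;\Rightarrow\; \mathrm{OR} \in (0.85,\ 2.14)
\end{aligned}
$$

The CI for the pooled OR of about 1.35 excludes 1, yet the PI includes 1. A future study could therefore plausibly observe an OR at or below 1, which is exactly the heterogeneity information the CI alone does not convey.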
David J. Miller
U.S. Environmental Protection Agency (retired)
|
1:30–2:00 | Supplementing risk ratios in sibling analysis: Estimating clinically useful measures from family-based analysis
Abstract:
Family-based designs, like sibling comparisons, are powerful tools for addressing confounding,
but they often rely solely on relative measures such as odds ratios or hazard ratios, which limits their
interpretability for clinical and policy decision-making. In this talk, I introduce the marginalized
between-within framework, a method that enhances family-based analyses by enabling the
estimation of absolute risks and other clinically meaningful metrics. I will begin with an overview of
sibling comparison methods and the rationale behind decomposing effects into within- and
between-family components. Then, using Swedish registry data, I’ll demonstrate how this
framework can be applied to assess the impact of maternal smoking on infant mortality. The
model allows us to estimate absolute risk differences, average treatment effects, attributable
fractions, and numbers needed to harm, which are often more useful than relative
estimates. Compared with traditional conditional logistic or stratified Cox regression models, the
marginalized between-within approach offers similar relative estimates but adds the crucial
ability to anchor results to a global baseline, making absolute measures possible. These
measures provide clearer insights for public health and policy interventions.
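Below is a minimal sketch of the between-within decomposition, assuming a binary outcome and a logistic link; the talk's exact specification, including any family-level random effect, may differ. The exposure $x_{ij}$ of sibling $j$ in family $i$ is split into the family mean $\bar{x}_i$ and the deviation from it, and both enter the model:

$$
\operatorname{logit} P(Y_{ij} = 1) = \alpha + \beta_W\,(x_{ij} - \bar{x}_i) + \beta_B\,\bar{x}_i
$$

Here $\beta_W$ is the within-family (sibling-comparison) effect and $\beta_B$ is the between-family association. Suppose the model has been marginalized to give $p_1$ and $p_0$, the average risks if everyone were exposed or unexposed, and $p$ is the observed risk. The absolute measures listed above then follow from their standard definitions: risk difference, number needed to harm (for $p_1 > p_0$), and population attributable fraction.

$$
\mathrm{RD} = p_1 - p_0, \qquad \mathrm{NNH} = \frac{1}{p_1 - p_0}, \qquad \mathrm{AF} = \frac{p - p_0}{p}
$$

The marginalization step that produces $p_1$ and $p_0$ is the subject of the talk and is not reproduced here.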
Hugo Sjöqvist
Karolinska Institutet
|
2:00–2:30 | Imputing right-skewed bounded biomarkers in partially measured cohorts
Abstract:
In large medical and epidemiological studies, important biomarkers are often available only
for a limited fraction of participants because of high laboratory costs or feasibility constraints,
which results in a high proportion of missing values. Imputation strategies can be employed to prevent
the loss of information. However, imputing biomarker values is challenging because biomarker
distributions are typically right-skewed and naturally bounded. In this talk, I compare two
imputation strategies that can handle such challenges: a likelihood-based approach and logistic
quantile imputation implemented in Stata. I evaluate the performance of both methods
through simulation, assessing bias and inferential errors. The approaches are illustrated with a
practical example of recently discovered blood biomarkers in Alzheimer’s research. The results
provide some insight into recovering biomarker distributions when outcome data are fully
observed but biomarkers are only partially measured.
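The logistic quantile approach builds on a transformation that maps a variable bounded in $(a, b)$ onto the whole real line. Below is a sketch of the standard form, with $a$ and $b$ treated as known lower and upper bounds of the biomarker:

$$
h(y) = \log\!\left(\frac{y - a}{b - y}\right), \qquad h^{-1}(z) = \frac{a + b\,e^{z}}{1 + e^{z}}
$$

Quantiles are preserved under monotone increasing transformations, so $Q_{Y}(p \mid \mathbf{x}) = h^{-1}\{Q_{h(Y)}(p \mid \mathbf{x})\}$. A linear quantile model fitted to $h(Y)$ therefore always returns predictions inside $(a, b)$ and can follow a right-skewed shape. In one common imputation scheme, a level $p$ is drawn from Uniform(0, 1) for each missing value, and the value is imputed as $h^{-1}\{\mathbf{x}'\hat{\boldsymbol{\beta}}(p)\}$. The bounds and fitting details used in the talk may differ.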
Contributors:
Robert Thiesmeier
Karolinska Institutet
Nicola Orsini
Karolinska Institutet
|
2:30–3:00 | Break
3:00–4:00 | Linking frames in Stata
Abstract:
This presentation gives an overview of data frames in Stata. I demonstrate the basics of working
with multiple datasets in Stata. I cover most of the frames suite of commands, touching on
frame creation and management, linking frames, copying variables from linked frames, alias
variables, and working with a set of frames.
Jeff Pitblado
StataCorp
|
4:00–5:00 | Open panel discussion with Stata developers
Contribute to the Stata community by sharing your feedback with StataCorp's developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for our users.
|
The conference is free, but registration is required. All participants are responsible for their own travel and accommodation expenses.
To register for the conference, please email your name, affiliation, and contact details.
The registration deadline is 28 August 2025.
Visit the official conference page for more information.
The 2025 Northern European Stata Conference is jointly organized by Metrika Consulting AB, the official distributor of Stata for Russia and the Nordic and Baltic countries, and the Division of Biostatistics at Karolinska Institutet.
View the proceedings of previous Stata Conferences and international meetings.