The 19th Italian Stata Conference will take place on 25 September 2025 in Milan. There will also be an optional workshop on 26 September.

Meet researchers from different disciplinary areas, discover new applications highlighting Stata’s potential capabilities for applied research, exchange new community-contributed commands developed for Stata, and interact directly with statisticians from StataCorp.


Program

All times are CEST (UTC +2)

Thursday, 25 September

8:30–9:00 Registration
9:00–10:25
Session I: Exploiting the potential of Stata 19, I

Linking frames in Stata Abstract:
This presentation gives an overview of data frames in Stata. I demonstrate the basics of working with multiple datasets in Stata. I cover most of the frames suite of commands, touching on frame creation and management, linking frames, copying variables from linked frames, alias variables, and working with a set of frames.

Jeff Pitblado
StataCorp

The new cate command: An overview Abstract:
This presentation offers a concise overview of the cate command, a new tool introduced in Stata 19 for estimating conditional average treatment effects (CATEs). CATEs quantify how the impact of a treatment varies across individuals or subgroups defined by observed characteristics, thus enabling a more nuanced understanding of treatment-effect heterogeneity and supporting the design of targeted policy interventions.
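
To make the idea of effects that vary across subgroups concrete, here is a minimal Python sketch, not taken from the talk. It computes a treated-minus-control difference in mean outcomes within each subgroup, which can be read as a CATE estimate only if treatment is as good as randomly assigned within subgroups. The data and the `subgroup_effects` helper are hypothetical, and the sketch does not reproduce the estimators behind the cate command.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical rows: (subgroup, treated indicator, outcome).
rows = [
    ("young", 1, 12.0), ("young", 1, 14.0), ("young", 0, 10.0), ("young", 0, 10.0),
    ("old", 1, 11.0), ("old", 0, 10.0), ("old", 0, 12.0),
]

def subgroup_effects(rows):
    """Treated-minus-control mean outcome within each subgroup."""
    arms = defaultdict(lambda: {0: [], 1: []})
    for group, treated, outcome in rows:
        arms[group][treated].append(outcome)
    return {group: fmean(a[1]) - fmean(a[0]) for group, a in arms.items()}

effects = subgroup_effects(rows)
print(effects)
```

In this toy example the estimated effect differs between the two subgroups, which is the kind of heterogeneity a CATE analysis is meant to reveal.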

Giovanni Cerulli
IRCrES-CNR
10:25–10:45 Break
10:45–12:15
Session II: Community-contributed commands, I

xtbreak: Testing and estimating structural breaks in time-series and panel data in Stata Abstract:
Identifying structural change is a crucial step in the analysis of time series and panel data. The longer the time span, the higher the likelihood that the model parameters have changed as a result of major disruptive events, such as the 2007–2008 financial crisis and the 2020 COVID-19 outbreak. Detecting the existence of breaks and dating them is therefore necessary not only for estimation purposes but also for understanding drivers of change and their effect on relationships. This talk introduces a new community-contributed command called xtbreak, which provides researchers with a complete toolbox for analyzing multiple structural breaks in time-series and panel data. xtbreak can detect the existence of breaks, determine their number and location, and provide break date confidence intervals. The talk places special emphasis on Python integration to gain speed advantages.
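
As a rough illustration of what dating a break involves, the following Python sketch, which is not part of xtbreak, estimates a single break in the mean of one series by choosing the split that minimizes the combined sum of squared residuals. The `estimate_break` helper, the `trim` parameter, and the data are hypothetical. Testing for the existence of breaks, multiple breaks, panel data, and confidence intervals are all outside its scope.

```python
def estimate_break(series, trim=2):
    """Index of the first observation of the second regime, chosen by least squares."""
    def ssr(segment):
        mean = sum(segment) / len(segment)
        return sum((x - mean) ** 2 for x in segment)

    # Keep at least `trim` observations on each side of the candidate break.
    candidates = range(trim, len(series) - trim + 1)
    return min(candidates, key=lambda b: ssr(series[:b]) + ssr(series[b:]))

# Hypothetical series whose mean shifts after the fourth observation.
series = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
break_index = estimate_break(series)
print(break_index)
```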

Jan Ditzen
Libera Università di Bolzano

Variance components in panel data Abstract:
A preliminary and crucial step in any empirical research on panel data, whether longitudinal, time-series cross-section, or multilevel, is to study the nature and relevance of the components that influence the variability of the variables, particularly the dependent variable. Each panel dataset can be considered as a set of grouped data, whether these are temporal observations nested within individuals or individuals nested within groups and supergroups. The fundamental steps for guiding the modeling strategies to be adopted are as follows: breaking down the total variability into variances between and within clusters, also in terms of percentage shares; assessing whether there are relevant common factors within clusters and, in the case of temporal observations, whether these are stationary or not; and comparing the relevance and significance of group and individual effects depending on whether they are considered fixed or random.
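
The first step described above, splitting total variability into between-cluster and within-cluster parts, can be sketched in a few lines of Python. This is an illustrative example with a hypothetical toy panel and a hypothetical `variance_shares` helper, not the presenter's code.

```python
from statistics import fmean

# Hypothetical toy panel: unit id -> repeated observations of one variable.
panel = {
    "A": [2.0, 4.0, 6.0],
    "B": [8.0, 10.0, 12.0],
}

def variance_shares(groups):
    """Split the total sum of squares into between- and within-cluster parts."""
    values = [v for obs in groups.values() for v in obs]
    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    # Between: squared distance of each cluster mean from the grand mean, weighted by size.
    between_ss = sum(len(obs) * (fmean(obs) - grand_mean) ** 2 for obs in groups.values())
    # Within: squared distance of each observation from its own cluster mean.
    within_ss = sum((v - fmean(obs)) ** 2 for obs in groups.values() for v in obs)
    return {
        "total_ss": total_ss,
        "between_share": between_ss / total_ss,
        "within_share": within_ss / total_ss,
    }

shares = variance_shares(panel)
print(shares)
```

The two shares sum to one, so they can be read directly as the percentage of variability attributable to differences between units and to variation over time within units.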

Maria Elena Bontempi
Università di Bologna

fffuroot: Implementing in Stata unit-root and stationarity tests with smooth breaks approximated by flexible Fourier forms Abstract:
This work describes the Stata implementation of unit-root and stationarity tests with flexible Fourier forms, as in Enders and Lee (2012a, 2012b) and Becker, Enders, and Lee (2006).

Giovanni Bruno
Università Bocconi
12:15–1:00
Session III: Stata tips and tricks

xtplot2 Abstract:
The xtplot2 command uses heat plots to examine the structure of panel datasets with respect to unbalancedness and values. It gives researchers a quick and efficient way to gain insight into that structure.

Jan Ditzen
Libera Università di Bolzano

Automating episode splitting: Introducing the splitting command for Stata Abstract:
Event history analysis (also known as survival analysis) is a well-established analytical tool in the social sciences and research more broadly, and it is particularly useful when researchers aim to estimate the effect of time-varying variables. Survival analysis is well supported in Stata via numerous built-in commands. In particular, stsplit facilitates breaking the time axis into episodes to include time-varying covariates in the analysis. While stsplit is straightforward to use when the time axis must be split at the point a change occurs in a dichotomous variable, the procedure becomes less intuitive when dealing with polytomous variables.
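
To show what splitting an episode on a polytomous covariate produces, here is a small Python sketch. It does not reflect the syntax or internals of stsplit or of the splitting command. The `split_episode` helper and the example states are hypothetical, and the sketch assumes that change times are distinct and that changes outside the episode can be ignored.

```python
def split_episode(start, stop, event, initial_state, changes):
    """Split one episode at each time the covariate changes.

    `changes` is a list of (time, new_state) pairs. Only changes strictly
    inside (start, stop) create a new sub-episode. The event indicator is
    kept on the last sub-episode only.
    """
    rows, current_start, state = [], start, initial_state
    for time, new_state in sorted(changes):
        if start < time < stop:
            rows.append((current_start, time, 0, state))
            current_start, state = time, new_state
    rows.append((current_start, stop, event, state))
    return rows

# Hypothetical episode from time 0 to 10 ending in an event, with a three-state covariate.
episodes = split_episode(0, 10, 1, "single", [(3, "cohabiting"), (7, "married")])
for row in episodes:
    print(row)
```

Each output row is a sub-episode (start, stop, event, state) over which the covariate is constant, which is the data layout needed to include a time-varying covariate in a survival model.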

Davide Bussi
Università degli Studi di Milano-Bicocca

xtgetpca Abstract:
Extracting principal components in panel data is common. However, no Stata solution exists. xtgetpca fills this gap. It allows for different types of standardization, removal of fixed effects, and unbalanced panels.

Jan Ditzen
Libera Università di Bolzano
1:00–2:00 Lunch
2:00–3:20
Session IV: Exploiting the potential of Stata 19, II

Meta-analysis in Stata Abstract:
Many studies attempt to answer similar research questions. For instance, you may have results from studies asking, “What is the association between unemployment and mental health?” Or you may have results from studies asking, “How does motherhood affect women’s wages?” The results from different studies may be inconclusive or conflicting. Meta-analysis is a statistical technique for combining the results from several similar studies. It allows us to explore the variation across studies and, when appropriate, provide a single estimate for the effect size of interest. In this presentation, I show how to use the meta suite of commands to perform meta-analysis in Stata.
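
The simplest way to combine study results into a single estimate is inverse-variance weighting under a common-effect model, sketched below in Python. The study values and the `fixed_effect_pool` helper are hypothetical, and this is not necessarily the estimator the meta suite would apply to a given analysis, since random-effects models that allow for between-study variation are often preferred.

```python
import math

# Hypothetical (effect size, standard error) pairs from three studies.
studies = [(0.30, 0.10), (0.10, 0.20), (0.25, 0.15)]

def fixed_effect_pool(studies):
    """Inverse-variance weighted mean of effect sizes and its standard error."""
    weights = [1.0 / se ** 2 for _, se in studies]
    pooled = sum(w * es for w, (es, _) in zip(weights, studies)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se

pooled, pooled_se = fixed_effect_pool(studies)
print(round(pooled, 4), round(pooled_se, 4))
```

More precise studies (smaller standard errors) receive larger weights, so the pooled estimate is pulled toward them and its standard error is smaller than that of any single study.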

Gabriela Ortiz
StataCorp

Consensus clustering in Stata Abstract:
This work considers consensus clustering in Stata, combining bootstrapped k-means with hierarchical clustering based on a co-association matrix. The method addresses the possible inherent instability of partitioning-based clustering by aggregating results from multiple bootstrap samples, improving robustness and reproducibility. At each iteration, k-means clustering is applied, and the results are collected in a large-scale cluster assignment matrix. A consensus matrix is then created to measure the co-occurrence of observations within the same cluster across all iterations. This matrix is transformed into a dissimilarity structure and then subjected to hierarchical clustering to obtain a final, stable partition.

This framework shows how consensus clustering can be performed robustly and efficiently in Stata. It uses a combination of Stata routines, bootstrap sampling, and optimized Mata routines to compute the co-association matrix, ensuring computational efficiency. The approach is broadly applicable to clustering tasks in the social sciences, economics, epidemiology, and other fields where cluster stability is critical.
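
The pipeline described above can be sketched in plain Python on toy data. This is an illustration only, not the Stata and Mata implementation from the talk. It assumes one-dimensional data, a basic Lloyd-style k-means, and a final step that cuts a single-linkage dendrogram at dissimilarity 0.5, all of which are choices made here for brevity. All function names and data are hypothetical.

```python
import random

def nearest(p, centers):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda j: abs(p - centers[j]))

def kmeans_1d(points, k, rng, iters=25):
    """Plain Lloyd iterations on 1-D data; returns the fitted centers."""
    unique = sorted(set(points))
    centers = rng.sample(unique, min(k, len(unique)))
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # An empty cluster keeps its previous center.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers

def consensus_matrix(data, k, reps, seed=0):
    """Share of bootstrap runs in which two jointly drawn units share a cluster."""
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    for _ in range(reps):
        draws = rng.choices(range(n), k=n)  # bootstrap sample with replacement
        centers = kmeans_1d([data[i] for i in draws], k, rng)
        label = {i: nearest(data[i], centers) for i in set(draws)}
        for a in label:
            for b in label:
                sampled[a][b] += 1
                together[a][b] += label[a] == label[b]
    return [[together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
             for j in range(n)] for i in range(n)]

def cut_single_linkage(consensus, height=0.5):
    """Connected components of the graph linking pairs whose dissimilarity
    (1 - consensus) is at most `height`, i.e. a single-linkage cut at `height`."""
    n = len(consensus)
    labels, current = [-1] * n, 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start], stack = current, [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and 1 - consensus[i][j] <= height:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels

# Hypothetical data with two well-separated groups.
data = [0.8, 1.0, 1.2, 4.9, 5.0, 5.3]
labels = cut_single_linkage(consensus_matrix(data, k=2, reps=200))
print(labels)
```

Pairs of observations are counted only in the bootstrap runs where both were drawn, so the consensus value is a proportion rather than a raw count.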

Carlo Drago
Università degli Studi Niccolò Cusano
3:20–3:35 Break
3:35–4:35
Session V: Community-contributed commands, II

outdetect: Outlier detection for inequality and poverty analysis Abstract:
Extreme values are common in survey data and represent a recurring threat to the reliability of both poverty and inequality estimates. The adoption of a consistent criterion for outlier detection is useful in many practical applications, particularly when international and intertemporal comparisons are involved. In this talk, I discuss a simple univariate detection procedure to flag outliers. I present outdetect, a command that implements the procedure and provides useful diagnostic tools. The output of outdetect compares statistics obtained before and after the exclusion of outliers, with a focus on inequality and poverty measures. Finally, I carry out an extensive sensitivity exercise where the same outlier detection method is applied consistently to per capita expenditure across more than 30 household budget surveys. The results are clear and provide a sense of the influence of extreme values on poverty and inequality estimates.
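
For readers unfamiliar with univariate outlier rules, the Python sketch below flags values whose logarithm lies more than three robust z-scores from the median, using the scaled median absolute deviation as the measure of spread. This is one common rule chosen here for illustration. Whether outdetect uses this transformation, this scale measure, or this threshold is an assumption, and the `flag_outliers` helper and the data are hypothetical.

```python
import math
from statistics import fmean, median

def flag_outliers(values, threshold=3.0):
    """Flag values whose log lies more than `threshold` robust z-scores from the median."""
    logs = [math.log(v) for v in values]  # assumes strictly positive values
    center = median(logs)
    # Median absolute deviation, scaled to match the standard deviation under normality.
    scale = 1.4826 * median(abs(x - center) for x in logs)
    if scale == 0:
        return [False] * len(values)
    return [abs(x - center) / scale > threshold for x in logs]

# Hypothetical per capita expenditures with one extreme value.
expenditure = [100.0, 110.0, 95.0, 105.0, 120.0, 90.0, 5000.0]
flags = flag_outliers(expenditure)
kept = [v for v, is_outlier in zip(expenditure, flags) if not is_outlier]
print(fmean(expenditure), fmean(kept))
```

Comparing the mean before and after exclusion mirrors, in miniature, the before-and-after comparison of statistics that the abstract describes.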

Giulia Mancini
Università degli Studi di Sassari

rdlasso: A Stata command for high-dimensional regression discontinuity designs Abstract:
The rdlasso command implements regression discontinuity designs (RDD) with high-dimensional covariates in Stata. The procedure is based on the methodology developed by Kreiss and Rothe (2023), and extends it to both sharp and fuzzy designs. Covariate selection is performed through a lasso-based local estimation, ensuring valid inference under approximate sparsity.

The command is built using Stata’s Python integration via the SFI module and automates all steps of the estimation process—from covariate selection to bandwidth choice and bias-corrected treatment-effect estimation. The syntax allows for flexible user control while remaining fully embedded in the Stata environment.

rdlasso enables Stata users to apply machine learning techniques for causal inference without requiring programming in external platforms such as R or Python. The command generates output variables that can be used for further postestimation analysis within the same session. An option automatically distinguishes between sharp and fuzzy designs, making the tool both user-friendly and methodologically complete. The implementation is illustrated through a step-by-step example and an empirical application. The command contributes to the growing set of tools for modern causal analysis in Stata, particularly in high-dimensional settings.

Marianna Nitt
Sapienza – Università di Roma
4:35–5:40
Session VI: Exploiting the potential of Stata 19, III

Automated data extraction from unstructured text using LLMs: A scalable workflow for Stata users Abstract:
In several data-rich domains such as finance, medicine, law, and scientific publishing, most of the valuable information is embedded in unstructured textual formats, from clinical notes and legal briefs to financial statements and research papers. These sources are rarely available in structured formats suitable for immediate quantitative analysis. This presentation introduces a scalable and fully integrated workflow that employs large language models (LLMs), specifically ChatGPT 4.0 via API, in conjunction with Python and Stata to extract structured variables from unstructured documents and make them ready for further statistical processing in Stata.

As a representative use case, I demonstrate the extraction of information from a SOAP clinical note, treated as a typical example of unstructured medical documentation. The process begins with a single PDF and extends to an automated pipeline capable of batch-processing multiple documents, highlighting the scalability of this approach. The workflow involves PDF parsing and text preprocessing using Python, followed by prompt engineering designed to optimize the performance of the LLM. In particular, the temperature parameter is tuned to a low value (for example, 0.0–0.3) to promote deterministic and concise extraction, minimizing variation across similar documents and ensuring consistency in output structure.

Once the LLM returns structured data, typically in JSON or CSV format, it is seamlessly imported into Stata using custom .do scripts that handle parsing (insheet), transformation (split, reshape), and data cleaning. The final dataset is used for exploratory or inferential analysis, with visualization and summary statistics executed entirely within Stata. The presentation also addresses critical considerations including the computational cost of using commercial LLM APIs (token-based billing), privacy and compliance risks when processing sensitive data (such as patient records), and the potential for bias or hallucination inherent to generative models. To assess the reliability of the extraction process, I report evaluation metrics such as cosine similarity (for text alignment and summarization accuracy) and F1-score (for evaluating named entity and numerical field extraction).
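
The two evaluation metrics named above can be illustrated with a short Python sketch. It is not the presenter's evaluation code. It assumes exact matching of (field, value) pairs for the F1-score and bag-of-words count vectors for cosine similarity, whereas the talk may use other matching rules or text representations. The field names and example values are hypothetical.

```python
import math
from collections import Counter

def f1_score(extracted, gold):
    """F1 over (field, value) pairs; a pair counts as correct only on exact match."""
    extracted_pairs, gold_pairs = set(extracted.items()), set(gold.items())
    true_pos = len(extracted_pairs & gold_pairs)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(extracted_pairs)
    recall = true_pos / len(gold_pairs)
    return 2 * precision * recall / (precision + recall)

def cosine_similarity(text_a, text_b):
    """Cosine similarity between bag-of-words count vectors."""
    a, b = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    dot = sum(a[w] * b[w] for w in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical gold-standard and LLM-extracted fields from one clinical note.
gold = {"age": "54", "systolic_bp": "140", "diagnosis": "hypertension"}
extracted = {"age": "54", "systolic_bp": "145", "diagnosis": "hypertension"}

field_f1 = f1_score(extracted, gold)
text_sim = cosine_similarity("patient reports chest pain", "patient reports mild chest pain")
print(round(field_f1, 3), round(text_sim, 3))
```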

By bridging the capabilities of LLMs with Stata’s powerful analysis tools, this workflow equips researchers and analysts with an accessible method to unlock structured insights from complex unstructured sources, extending the reach of empirical research into previously inaccessible text-heavy datasets.

Loreta Isaraj
IRCrES-CNR

Text mining and hierarchical clustering in Stata: An applied approach for real-time policy monitoring, forecasting, and literature mapping Abstract:
This presentation shows an applied framework for text mining and clustering in the Stata environment and provides practical tools for policy-relevant research in economics and health economics. With the growing amount of unstructured textual data, from financial news and analyst reports to scientific publications, there is an increasing demand for scalable methods to classify and interpret such information for evidence-based policy and forecasting.

The first part uses Stata’s Python integration to implement hierarchical clustering from scratch using TF-IDF vectorization and cosine distance. This technique is applied specifically to economic text sources, such as headlines or institutional communications, to segment documents into a fixed or silhouette-optimized number of clusters. This approach allows researchers to identify patterns in the data, uncover latent themes, and organize information for macroeconomic forecasting, sentiment analysis, or real-time policy monitoring.
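
The two ingredients named above, TF-IDF vectorization and cosine distance, can be sketched in Python as follows. This is an illustrative example rather than the presenter's code. It assumes whitespace tokenization, raw term counts, and idf = log(N / df), and it shows only the first agglomeration step (finding the closest pair of documents) rather than a full dendrogram or silhouette-based choice of the number of clusters. The headlines are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """TF-IDF with raw term counts and idf = log(N / df); other weightings exist."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    return [
        {term: count * math.log(n_docs / doc_freq[term])
         for term, count in Counter(tokens).items()}
        for tokens in tokenized
    ]

def cosine_distance(u, v):
    """One minus the cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return 1.0 - dot / norm if norm else 1.0

# Hypothetical headlines standing in for an economic text corpus.
headlines = [
    "central bank raises interest rates",
    "central bank holds interest rates",
    "hospital adopts telemedicine platform",
]
vectors = tfidf_vectors(headlines)
distances = {
    (i, j): cosine_distance(vectors[i], vectors[j])
    for i in range(len(vectors)) for j in range(i + 1, len(vectors))
}
# Agglomerative clustering starts by merging the closest pair of documents.
first_merge = min(distances, key=distances.get)
print(first_merge, distances)
```

Documents that share no terms are at the maximum distance of 1, so the two central-bank headlines are merged first.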

In the second part, I focus on literature mapping in health economics. Using a curated corpus of article titles related to telemedicine and diabetes, I apply a native Stata pipeline based on text normalization and clustering to identify thematic areas within the literature. The approach promotes organized reviews in health technology assessment and policy evaluation and makes evidence synthesis more accessible.

By combining native Stata capabilities with Python-enhanced workflows, I provide applied researchers with an accessible and policy-relevant toolkit for unsupervised text classification in multiple domains.

Carlo Drago
Università degli Studi Niccolò Cusano
5:40–6:00 Open panel discussion with Stata developers
Contribute to the Stata community by sharing your feedback with StataCorp's developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for our users.
8:00 Conference social dinner (Optional)

Workshop: Survey data analysis in Stata

Workshop information forthcoming


Scientific committee

Una-Louise Bell
TStat – TStat Training
Rino Bellocco
University of Milano-Bicocca
Giovanni Capelli
Istituto Superiore di Sanità
Giovanni Cerulli
IRCrES-CNR
Jan Ditzen
Libera Università di Bolzano
Maurizio Pisati
University of Milano-Bicocca

Registration and venue

Conference fees include breaks, lunch, and course materials.

Conference fees (VAT not incl.)

                         Student    Other
Conference only          €70        €110
Conference + workshop    €262       €420

Registration deadline is 15 September 2025.

Register online

Visit the official conference page for more information.

TStat is delighted to sponsor, via our project “Investing in Young Researchers”, two (2) full-time PhD students from any of the countries for which TStat is the official Stata distributor. Sponsorship covers both the first day of the conference and the workshop. Travel expenses are to be paid by the participant. To apply for sponsorship, please send your curriculum vitae to [email protected].


Logistics organizer

The logistics organizer for the 2025 Italian Stata Conference is TStat S.r.l., the distributor of Stata for Italy, Albania, Bosnia and Herzegovina, Greece, Kosovo, North Macedonia, Malta, Montenegro, Serbia, Slovakia, and Slovenia.

View the proceedings of previous Stata Conferences and Users Group meetings.