
The 19th Italian Stata Conference will take place on 25 September 2025 in Milan. There will also be an optional workshop on 26 September.
Meet researchers from different disciplinary areas, discover new applications that highlight Stata's capabilities for applied research, exchange new community-contributed commands developed for Stata, and interact directly with statisticians from StataCorp.
All times are CEST (UTC+2).
8:30–9:00 | Registration |
9:00–10:25 | Session I: Exploiting the potential of Stata 19, I
Linking frames in Stata Abstract:
This presentation gives an overview of data frames in Stata.
I demonstrate the basics of working with multiple datasets in
Stata. I cover most of the frames suite of commands, touching
on frame creation and management, linking frames, copying
variables from linked frames, alias variables, and working with
a set of frames.
Jeff Pitblado
StataCorp
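As a minimal sketch of the frames workflow outlined above (the dataset and variable names are hypothetical, for illustration only):

```stata
* Create a second frame and load a lookup dataset into it
frame create counties
frame counties: use counties.dta   // hypothetical county-level file

* Link the current (person-level) frame to the county frame
frlink m:1 countyid, frame(counties)

* Copy a variable from the linked frame into the current frame
frget median_income, from(counties)

* Or reference a linked variable without copying it, via an alias
fralias add urban_share, from(counties)
```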
The new cate command: An overview Abstract:
This presentation offers a concise overview of the cate command,
a new tool introduced in Stata 19 for estimating conditional
average treatment effects (CATEs). CATEs quantify how the
impact of a treatment varies across individuals or subgroups
defined by observed characteristics, thus enabling a more nuanced
understanding of treatment-effect heterogeneity and supporting
the design of targeted policy interventions.
Giovanni Cerulli
IRCrES-CNR
10:25–10:45 | Break |
10:45–12:15 | Session II: Community-contributed commands, I
xtbreak: Testing and estimating structural breaks in time-series and panel data in Stata Abstract:
Identifying structural change is a crucial step in analysis of time series
and panel data. The longer the time span, the higher the likelihood
that the model parameters have changed as a result of major
disruptive events, such as the 2007–2008 financial crisis and the 2020
COVID-19 outbreak. Detecting the existence of breaks and dating
them is therefore necessary not only for estimation purposes but also
for understanding drivers of change and their effect on relationships.
This talk introduces a new community-contributed command called
xtbreak, which provides researchers with a complete toolbox for
analyzing multiple structural breaks in time-series and panel data.
xtbreak can detect the existence of breaks, determine their number
and location, and provide break date confidence intervals. A special
emphasis of the talk will be put on Python integration to gain
speed advantages.
Jan Ditzen
Libera Università di Bolzano
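A schematic illustration of the toolbox described above, with options abbreviated from memory rather than the command's documentation (variable names are hypothetical; see `help xtbreak` for the exact syntax):

```stata
* Test for the existence of structural breaks in a panel regression
xtbreak test y x1 x2

* Estimate the number and location of breaks, with confidence
* intervals for the break dates (here, up to 2 breaks assumed)
xtbreak estimate y x1 x2, breaks(2)
```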
Variance components in panel data Abstract:
A preliminary and crucial step in any empirical research on panel
data, whether longitudinal, time-series cross-section, or multilevel, is
to study the nature and relevance of the components that influence
the variability of the variables, particularly the dependent variable.
Each panel dataset can be considered as a set of grouped data,
whether these are temporal observations nested within individuals or
individuals nested within groups and supergroups. The fundamental
steps for guiding the modeling strategies to be adopted are as follows: breaking
down the total variability into variances between and within clusters,
also in terms of percentage shares; assessing whether there are
relevant common factors within clusters and, in the case of temporal
observations, whether these are stationary or not; and comparing the
relevance and significance of group and individual effects depending
on whether they are considered fixed or random.
Maria Elena Bontempi
Università di Bologna
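The decomposition steps described above can be sketched with standard Stata commands (variable names are hypothetical):

```stata
* Declare the panel structure
xtset id year

* Break total variability into between and within components
xtsum y

* Intraclass correlation: share of variance at the cluster level
loneway y id

* Compare random vs. fixed treatment of individual effects:
* random intercept with estimated variance components ...
mixed y x || id:
* ... against a fixed-effects specification
xtreg y x, fe
```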
fffuroot: Implementing unit-root and stationarity tests with smooth breaks approximated by flexible Fourier forms in Stata Abstract:
This work describes the Stata implementation of unit-root and
stationarity tests with flexible Fourier forms as in Enders and Lee
(2012a, 2012b) and Becker, Enders, and Lee (2006).
Giovanni Bruno
Università Bocconi
12:15–13:00 | Session III: Stata tips and tricks
xtplot2 Abstract:
The xtplot2 command investigates the structure of panel datasets
with respect to unbalancedness and values using heat plots. It
allows the researcher a quick and efficient way to gain insights
into the structure.
Jan Ditzen
Libera Università di Bolzano
Automating episode splitting: Introducing the splitting command for Stata Abstract:
Event history analysis (also known as survival analysis) is a well-established
analytical tool in the social sciences and research
more broadly, and it is particularly useful when researchers aim
to estimate the effect of time-varying variables. Survival analysis
is well supported in Stata via numerous built-in commands. In
particular, stsplit facilitates breaking the time axis into episodes
to include time-varying covariates in the analysis. While
stsplit is straightforward to use when the time axis must be split
at the point a change occurs in a dichotomous variable, the
procedure becomes less intuitive when dealing with polytomous
variables.
Davide Bussi
Università degli Studi di Milano-Bicocca
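The built-in episode-splitting workflow the talk builds on can be sketched as follows (variable names and time points are hypothetical):

```stata
* Declare survival-time data: time to event/censoring plus failure flag
stset time, failure(event) id(id)

* Split each subject's spell at fixed time points, creating one
* episode per interval; the new variable records the interval start
stsplit period, at(12 24 36)

* A time-varying covariate can then be defined per episode
gen post_treatment = (period >= 24)   // hypothetical threshold
```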
xtgetpca Abstract:
Extracting principal components from panel data is a common task.
However, no dedicated Stata solution exists. xtgetpca fills
this gap. It allows for different types of standardization, removal of
fixed effects, and unbalanced panels.
Jan Ditzen
Libera Università di Bolzano
13:00–14:00 | Lunch |
14:00–15:20 | Session IV: Exploiting the potential of Stata 19, II
Meta-analysis in Stata Abstract:
Many studies attempt to answer similar research questions. For
instance, you may have results from studies asking, “What is the
association between unemployment and mental health?” Or you
may have results from studies asking, “How does motherhood
affect women’s wages?” The results from different studies may be
inconclusive or conflicting. Meta-analysis is a statistical technique
for combining the results from several similar studies. It allows
us to explore the variation across studies and, when appropriate,
provide a single estimate for the effect size of interest. In this
presentation, I show how to use the meta suite of commands to
perform meta-analysis in Stata.
Gabriela Ortiz
StataCorp
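A minimal sketch of the meta suite workflow described above (variable names are hypothetical):

```stata
* Declare meta-analysis data: effect sizes and their standard errors
meta set effect_size std_err, studylabel(study)

* Random-effects summary of the pooled effect and heterogeneity
meta summarize

* Forest plot of study-level and pooled estimates
meta forestplot

* Funnel plot to inspect small-study effects
meta funnelplot
```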
Consensus clustering in Stata Abstract:
This work considers consensus clustering in Stata, combining
bootstrapped k-means with hierarchical clustering based on a
co-association matrix. The method addresses the possible
inherent instability of partitioning-based clustering by aggregating
results from multiple bootstrap samples, improving robustness
and reproducibility. In this respect, at each iteration, k-means
clustering is applied, and the results are collected in a large-scale
cluster assignment matrix. A consensus matrix is then created
to measure the cooccurrence of observations within the same
cluster across all iterations. This matrix is transformed into a
dissimilarity structure and in this way subjected to hierarchical
clustering in order to obtain a final, stable partition.
This framework shows how consensus clustering can be performed robustly and efficiently in Stata. It uses a combination of Stata routines, bootstrap sampling, and optimized Mata routines to compute the co-association matrix, ensuring computational efficiency. The approach is broadly applicable to clustering tasks in the social sciences, economics, epidemiology, and other fields where cluster stability is critical.
Carlo Drago
Università degli Studi Niccolò Cusano
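A hedged sketch of the final aggregation step described above, assuming the co-association matrix `C` (observation-by-observation co-occurrence proportions) has already been built in Mata from the bootstrap loop:

```stata
* Convert co-occurrence proportions into dissimilarities in Mata
mata: D = 1 :- C                  // C assumed built from the bootstrap loop
mata: st_matrix("D", D)

* Hierarchical (Ward) clustering directly on the dissimilarity matrix
clustermat wardslinkage D, name(consensus) add

* Cut the dendrogram into the desired number of final clusters
cluster generate group = groups(4), name(consensus)
```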
15:20–15:35 | Break |
15:35–16:35 | Session V: Community-contributed commands, II
outdetect: Outlier detection for inequality and poverty analysis Abstract:
Extreme values are common in survey data and represent a
recurring threat to the reliability of both poverty and inequality
estimates. The adoption of a consistent criterion for outlier
detection is useful in many practical applications, particularly
when international and intertemporal comparisons are involved. In
this talk, I discuss a simple univariate detection procedure to
flag outliers. I present outdetect, a command that implements
the procedure and provides useful diagnostic tools. The output
of outdetect compares statistics obtained before and after the
exclusion of outliers, with a focus on inequality and poverty
measures. Finally, I carry out an extensive sensitivity exercise
where the same outlier detection method is applied consistently
to per capita expenditure across more than 30 household budget
surveys. The results are clear and provide a sense of the influence
of extreme values on poverty and inequality estimates.
Giulia Mancini
Università degli Studi di Sassari
rdlasso: A Stata command for high-dimensional regression discontinuity designs Abstract:
The rdlasso command implements regression discontinuity
designs (RDD) with high-dimensional covariates in Stata.
The procedure is based on the methodology developed by
Kreiss and Rothe (2023), and extends it to both sharp and
fuzzy designs. Covariate selection is performed through a
lasso-based local estimation, ensuring valid inference under
approximate sparsity.
The command is built using Stata’s Python integration via the SFI module and automates all steps of the estimation process—from covariate selection to bandwidth choice and bias-corrected treatment-effect estimation. The syntax allows for flexible user control while remaining fully embedded in the Stata environment. rdlasso enables Stata users to apply machine learning techniques for causal inference without requiring programming in external platforms such as R or Python. The command generates output variables that can be used for further postestimation analysis within the same session. An option automatically distinguishes between sharp and fuzzy designs, making the tool both user-friendly and methodologically complete. The implementation is illustrated through a step-by-step example and an empirical application. The command contributes to the growing set of tools for modern causal analysis in Stata, particularly in high-dimensional settings.
Marianna Nitt
Sapienza – Università di Roma
16:35–17:40 | Session VI: Exploiting the potential of Stata 19, III
Automated data extraction from unstructured text using LLMs: A scalable workflow for Stata users Abstract:
In several data-rich domains such as finance, medicine, law,
and scientific publishing, most of the valuable information is
embedded in unstructured textual formats, from clinical notes
and legal briefs to financial statements and research papers.
These sources are rarely available in structured formats suitable
for immediate quantitative analysis. This presentation introduces
a scalable and fully integrated workflow that employs large
language models (LLMs), specifically ChatGPT 4.0 via API, in
conjunction with Python and Stata to extract structured variables
from unstructured documents and make them ready for further
statistical processing in Stata.
As a representative use case, I demonstrate the extraction of information from a SOAP clinical note, treated as a typical example of unstructured medical documentation. The process begins with a single PDF and extends to an automated pipeline capable of batch-processing multiple documents, highlighting the scalability of this approach. The workflow involves PDF parsing and text preprocessing using Python, followed by prompt engineering designed to optimize the performance of the LLM. In particular, the temperature parameter is tuned to a low value (for example, 0.0–0.3) to promote deterministic and concise extraction, minimizing variation across similar documents and ensuring consistency in output structure.
Once the LLM returns structured data, typically in JSON or CSV format, it is seamlessly imported into Stata using custom .do scripts that handle parsing (insheet), transformation (split, reshape), and data cleaning. The final dataset is used for exploratory or inferential analysis, with visualization and summary statistics executed entirely within Stata.
The presentation also addresses critical considerations, including the computational cost of using commercial LLM APIs (token-based billing), privacy and compliance risks when processing sensitive data (such as patient records), and the potential for bias or hallucination inherent to generative models. To assess the reliability of the extraction process, I report evaluation metrics such as cosine similarity (for text alignment and summarization accuracy) and F1 score (for evaluating named-entity and numerical-field extraction). By bridging the capabilities of LLMs with Stata's powerful analysis tools, this workflow equips researchers and analysts with an accessible method to unlock structured insights from complex unstructured sources, extending the reach of empirical research into previously inaccessible text-heavy datasets.
Loreta Isaraj
IRCrES-CNR
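On the Stata side, importing the pipeline's structured output might look like the following sketch (file and variable names are hypothetical; the abstract mentions insheet, while import delimited is its modern equivalent):

```stata
* Import the CSV returned by the LLM extraction pipeline
import delimited using extracted_notes.csv, clear varnames(1)

* Split a semicolon-delimited field into separate string variables
split diagnoses, parse(";") gen(dx)

* Reshape one-row-per-document data to long form for analysis
reshape long dx, i(note_id) j(dx_num)

* Basic cleaning and summary entirely within Stata
drop if missing(dx)
tabulate dx, sort
```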
Text mining and hierarchical clustering in Stata: An applied approach for real-time policy monitoring, forecasting, and literature mapping Abstract:
This presentation shows an applied framework for text mining and
clustering in the Stata environment and provides practical tools
for policy-relevant research in economics and health economics.
With the growing amount of unstructured textual data—from
financial news and analyst reports to scientific publications—
there is an increasing demand for scalable methods to classify
and interpret such information for evidence-based policy and
forecasting.
The first part illustrates Stata's capacity to integrate with Python, used here to implement hierarchical clustering from scratch with TF-IDF vectorization and cosine distance. This technique is applied to economic text sources, such as headlines or institutional communications, with the aim of segmenting documents into a fixed or silhouette-optimized number of clusters. The approach allows researchers to identify patterns in the data, uncover latent themes, and organize information for macroeconomic forecasting, sentiment analysis, or real-time policy monitoring.
In the second part, I focus on literature mapping in health economics. Using a curated corpus of article titles related to telemedicine and diabetes, I apply a native Stata pipeline based on text normalization and clustering to identify thematic areas within the literature. The approach promotes organized reviews in health technology assessment and policy evaluation and makes evidence synthesis more accessible. By combining native Stata capabilities with Python-enhanced workflows, I provide applied researchers with an accessible and policy-relevant toolkit for unsupervised text classification in multiple domains.
Carlo Drago
Università degli Studi Niccolò Cusano
|
17:40–18:00 | Open panel discussion with Stata developers
Contribute to the Stata community by sharing your feedback with StataCorp's developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for our users.
20:00 | Conference social dinner (optional) |
Workshop information forthcoming
Conference fees include breaks, lunch, and course materials.
Conference fees (VAT not incl.) | Student | Other
---|---|---
Conference only | €70 | €110
Conference + workshop | €262 | €420
Registration deadline is 15 September 2025.
Visit the official conference page for more information.
TStat is delighted to sponsor, via our project “Investing in Young Researchers”, two (2) full-time PhD students from any of the countries for which TStat is the official Stata distributor. Sponsorship covers both the first day of the conference and the workshop. Travel expenses are to be paid by the participant. To apply for sponsorship, please send your curriculum vitae to [email protected].
The logistics organizer for the 2025 Italian Stata Conference is TStat S.r.l., the distributor of Stata for Italy, Albania, Bosnia and Herzegovina, Greece, Kosovo, North Macedonia, Malta, Montenegro, Serbia, Slovakia, and Slovenia.
View the proceedings of previous Stata Conferences and Users Group meetings.