Home  /  Stata Conferences  /  2022 Italy


Session I: Exploiting the potential of Stata 17, I

Custom estimation tables
Abstract: This presentation illustrates how to construct custom tables from one or more estimation commands.
I demonstrate how to add custom labels for significant coefficients and make targeted style edits to cells in the table using the following commands:
  • collect get
  • collect dir
  • collect dims
  • collect levelsof
  • collect label list
  • collect label values
  • collect layout
  • collect query header
  • collect style header
  • collect style showbase
  • collect style row
  • collect style cell
  • collect query column
  • collect style column
  • collect style stars
  • collect preview
  • etable
I begin with a description of what constitutes a collection and how items (numeric and string results) in a collection are tagged (identified) and conclude with a simple workflow to enable users to build their own custom tables from estimation commands. This presentation motivates the construction of estimation tables and concludes with the convenience command etable.


Jeff Pitblado
Session II: Community-contributed, I

Machine learning using Stata/Python
Abstract: Two related Stata modules, r_ml_stata and c_ml_stata, are presented for fitting popular machine learning (ML) methods in both regression and classification settings.
Using the Stata/Python integration platform introduced in Stata 16, these commands tune hyperparameters optimally via K-fold cross-validation with grid search. More specifically, they use the Python scikit-learn API to carry out both cross-validation and outcome/label prediction.
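
The tuning strategy described above, K-fold cross-validation over a grid of hyperparameter values, can be sketched in a few lines of plain Python. This is only an illustration of the idea and not the code of r_ml_stata or c_ml_stata: the learner (a one-dimensional k-nearest-neighbors regressor), the function names, the toy data, and the grid are all assumptions made for this example.

```python
import random
from statistics import mean

def knn_predict(train, x, k):
    """Predict y at x as the mean outcome of the k nearest training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return mean(y for _, y in nearest)

def kfold_indices(n, n_folds, seed=0):
    """Shuffle 0..n-1 and deal the indices into n_folds disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::n_folds] for i in range(n_folds)]

def cv_mse(data, k, n_folds=5, seed=0):
    """Average held-out mean squared error of k-NN across the folds."""
    fold_errors = []
    for fold in kfold_indices(len(data), n_folds, seed):
        held_out = set(fold)
        train = [p for i, p in enumerate(data) if i not in held_out]
        test = [data[i] for i in fold]
        fold_errors.append(
            mean((y - knn_predict(train, x, k)) ** 2 for x, y in test)
        )
    return mean(fold_errors)

def grid_search(data, grid, n_folds=5, seed=0):
    """Return the grid value with the lowest CV error, plus all scores."""
    scores = {k: cv_mse(data, k, n_folds, seed) for k in grid}
    return min(scores, key=scores.get), scores

# Hypothetical toy data: a noisy quadratic in one regressor.
rng = random.Random(1)
data = [(x / 10, (x / 10) ** 2 + rng.gauss(0, 0.5)) for x in range(100)]
grid = [1, 3, 5, 10, 25]
best_k, scores = grid_search(data, grid)
print("selected k:", best_k)
```

Every grid value is scored on the same folds, so the comparison between candidate values is not affected by how the data happen to be split.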


Giovanni Cerulli
IRCrES, Rome

A Stata routine for estimating the blocking with regression adjustment
Abstract: The psreg command implements the blocking with regression adjustment estimator proposed by Imbens (Journal of Human Resources 2015).
It relies on an estimate of the propensity score and runs regressions within subclasses (blocks) of the propensity score. The ATT is obtained by averaging the within-block estimates, weighted by the number of treated units in each block. For the ATE, the within-block estimates are weighted by the total number of units (treated and untreated) in each block.
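
The two weighting schemes can be made concrete with a small Python sketch. It assumes the within-block treatment-effect estimates have already been obtained from the block-level regressions; the function name, the dictionary keys, and the numbers are hypothetical and are not taken from psreg.

```python
def blocked_effects(blocks):
    """Combine within-block estimates into ATT and ATE.

    blocks: list of dicts with keys 'effect' (within-block estimate),
    'n_treated' and 'n_control' (unit counts in the block).
    """
    total_treated = sum(b["n_treated"] for b in blocks)
    total_units = sum(b["n_treated"] + b["n_control"] for b in blocks)
    # ATT: weight each block by its number of treated units.
    att = sum(b["effect"] * b["n_treated"] for b in blocks) / total_treated
    # ATE: weight each block by all of its units, treated and untreated.
    ate = sum(
        b["effect"] * (b["n_treated"] + b["n_control"]) for b in blocks
    ) / total_units
    return att, ate

# Hypothetical example with two propensity-score blocks.
blocks = [
    {"effect": 1.0, "n_treated": 10, "n_control": 40},
    {"effect": 2.0, "n_treated": 30, "n_control": 20},
]
att, ate = blocked_effects(blocks)
print(att, ate)
```

In this example the second block holds most of the treated units, so the ATT is pulled toward its estimate, while the ATE weights the two equally sized blocks equally.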


Martina Bazzoli
Session III: Community-contributed, II

A Stata package for cluster-weighted modeling
Abstract: The cluster-weighted model (CWM) is a member of the family of mixtures of regression models and is also referred to in the literature as the mixture of regressions with random covariates.
These models extend finite mixture models by allowing the researcher to model the marginal distribution of the regression covariates along with the conditional distribution. Interest in CWMs is increasing; however, software for estimating these models is available to R users but not to Stata users. The aim of this presentation is therefore to introduce the Stata package cwmglm. This package extends the capabilities of fmm by introducing more advanced mixture models based on maximum likelihood estimation and the expectation-maximization (EM) algorithm.

cwmglm allows users to fit CWMs based on the most common generalized linear models (GLMs) with random covariates. The supported GLM families are Gaussian, Poisson, and binomial, while the allowed marginal distributions for the covariates are multivariate normal, multinomial, binomial, and Poisson. cwmglm extends current capabilities for estimating CWMs by letting users evaluate model fit through generalized determination coefficients and by incorporating bootstrap-based inference. These features are not available in the current version of the R package for CWMs.

Furthermore, cwmglm allows one to estimate parsimonious models of Gaussian distributions. This approach rests on assumptions about the correlation structure between concomitants within multivariate Gaussian mixture components and about the equality or inequality of variance–covariance matrices between components. Fourteen parsimonious models are possible by exploiting the eigenvalue decomposition of the variance–covariance matrix. Parsimonious mixtures of multivariate Gaussian distributions can be used to model random covariates within CWM-GLM or as stand-alone models (mixtures of multivariate Gaussians with a defined covariance matrix). This feature is completely new for Stata users because it is not available in gsem or fmm. Last, the flexibility of cwmglm allows one to estimate the “canonical” finite mixture of regressions.
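
For reference, the eigenvalue decomposition usually used to define such parsimonious Gaussian families can be written as follows. The notation is an assumption of this note (the abstract does not spell it out): K is the number of components, p the dimension, λ_k a positive scalar controlling the volume of component k, D_k the orthogonal matrix of eigenvectors (orientation), and A_k a diagonal matrix with unit determinant (shape).

```latex
\Sigma_k = \lambda_k \, D_k A_k D_k^{\top},
\qquad \lambda_k = |\Sigma_k|^{1/p},
\qquad |A_k| = 1,
\qquad k = 1, \dots, K.
```

Constraining volume, shape, and orientation to be equal or free across components, together with diagonal and spherical special cases, is what produces a family of fourteen covariance structures.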


Daniele Spinelli
University of Milan–Bicocca

Stacking generalization and machine learning in Stata
Abstract: pystacked implements stacked generalization (Wolpert 1992) for regression and binary classification via Python’s scikit-learn.
Stacking combines multiple supervised machine learners (the “base” or “level-0” learners) into a single learner. The currently supported base learners include regularized regression, random forest, gradient boosting, support vector machines, and feed-forward neural nets (multilayer perceptron). pystacked can also be used as a “regular” machine-learning program to fit a single base learner and thus provides an easy-to-use API for scikit-learn’s machine-learning algorithms.
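
The idea of stacking can be illustrated with a deliberately small Python sketch that uses only the standard library. It is not pystacked's implementation: the two base learners (a constant-mean predictor and a one-regressor least-squares line), the fold assignment, and the rule for the final weights (least squares with weights restricted to be nonnegative and to sum to one) are assumptions chosen to keep the example short.

```python
from statistics import mean

def fit_mean(train):
    """Base learner A: always predict the training mean of y."""
    m = mean(y for _, y in train)
    return lambda x: m

def fit_line(train):
    """Base learner B: least-squares line in a single regressor."""
    xbar = mean(x for x, _ in train)
    ybar = mean(y for _, y in train)
    sxx = sum((x - xbar) ** 2 for x, _ in train)
    slope = sum((x - xbar) * (y - ybar) for x, y in train) / sxx
    return lambda x: ybar + slope * (x - xbar)

def out_of_fold_predictions(data, fitters, n_folds=5):
    """Predict every observation with models that never saw it."""
    n = len(data)
    preds = [[None] * n for _ in fitters]
    for f in range(n_folds):
        test_idx = range(f, n, n_folds)
        held_out = set(test_idx)
        train = [p for i, p in enumerate(data) if i not in held_out]
        for j, fit in enumerate(fitters):
            model = fit(train)
            for i in test_idx:
                preds[j][i] = model(data[i][0])
    return preds

def stack_two(data, fit_a, fit_b, n_folds=5):
    """Return the stacked predictor and the weight placed on learner B."""
    pa, pb = out_of_fold_predictions(data, [fit_a, fit_b], n_folds)
    ys = [y for _, y in data]
    # Minimize sum (y - (1 - w) * a - w * b)^2 over w, then clip to [0, 1].
    num = sum((y - a) * (b - a) for y, a, b in zip(ys, pa, pb))
    den = sum((b - a) ** 2 for a, b in zip(pa, pb))
    w_b = min(1.0, max(0.0, num / den)) if den > 0 else 0.5
    # Refit both base learners on all data for the final predictor.
    model_a, model_b = fit_a(data), fit_b(data)
    return (lambda x: (1 - w_b) * model_a(x) + w_b * model_b(x)), w_b

# Hypothetical toy data that are exactly linear.
data = [(float(x), 2.0 * x + 1.0) for x in range(20)]
stacked, w_line = stack_two(data, fit_mean, fit_line)
print(w_line, stacked(30.0))
```

Because the toy data are exactly linear, essentially all of the weight goes to the line learner. Computing the weights from out-of-fold predictions is what keeps the final learner from simply rewarding the base learner that overfits the training data most.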


Achim Ahrens
ETH Zürich

Double/debiased machine learning in Stata
Abstract: ddml implements algorithms for causal inference aided by supervised machine learning as proposed in "Double/debiased machine learning for treatment and structural parameters" (Econometrics Journal 2018).
Five different models are supported, allowing for binary or continuous treatment variables and endogeneity. ddml supports a variety of different ML programs, including lassopack and pystacked.
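
As an illustration of the kind of model involved, the partially linear model from the cited paper and its cross-fitted "partialling-out" estimator are shown below. That this is one of the five supported models is an assumption of this note; the abstract does not list them. Here Y is the outcome, D the treatment, X the controls, and the nuisance functions m and ℓ are estimated by machine learning on folds that exclude observation i.

```latex
\begin{aligned}
Y &= \theta D + g(X) + U, & \mathbb{E}[U \mid D, X] &= 0,\\
D &= m(X) + V, & \mathbb{E}[V \mid X] &= 0,\\
\hat{\theta} &= \frac{\sum_{i} \bigl(D_i - \hat{m}(X_i)\bigr)\bigl(Y_i - \hat{\ell}(X_i)\bigr)}
                     {\sum_{i} \bigl(D_i - \hat{m}(X_i)\bigr)^{2}},
  & \ell(X) &= \mathbb{E}[Y \mid X].
\end{aligned}
```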


Achim Ahrens
ETH Zürich
Session IV: Exploiting the potential of Stata 17, II

Treatment-effects estimation using lasso
Abstract: One can use treatment-effects estimators to draw causal inferences from observational data.
You can use lasso when you want to control for many potential covariates. With standard treatment-effects models, there is an intrinsic conflict between two required assumptions. The conditional independence assumption is likely to be satisfied with many variables in the model, while the overlap assumption is likely to be satisfied with fewer variables in the model. This presentation shows how to overcome this conflict by using Stata 17’s telasso command.

telasso estimates average treatment effects with high-dimensional controls while using lasso for model selection. This estimator is robust to model-selection mistakes. Moreover, it is doubly robust, so only one of the outcome and treatment models needs to be correctly specified.
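
Double robustness can be illustrated with the textbook augmented inverse-probability-weighted (AIPW) form of the ATE. The sketch below takes fitted outcome-model values and propensity scores as given; it performs no lasso selection and is not telasso's code. The function name, argument names, and numbers are hypothetical.

```python
def aipw_ate(y, t, mu1, mu0, ps):
    """AIPW estimate of the average treatment effect.

    y: observed outcomes; t: treatment indicators (0 or 1);
    mu1, mu0: fitted outcomes under treatment and under control;
    ps: fitted propensity scores, strictly between 0 and 1.
    """
    total = 0.0
    for yi, ti, m1, m0, p in zip(y, t, mu1, mu0, ps):
        total += (
            (m1 - m0)                          # outcome-model contrast
            + ti * (yi - m1) / p               # correction from treated units
            - (1 - ti) * (yi - m0) / (1 - p)   # correction from control units
        )
    return total / len(y)

# Hypothetical data: four units, half of them treated.
y = [3.0, 1.0, 4.0, 2.0]
t = [1, 0, 1, 0]
ps = [0.5, 0.5, 0.5, 0.5]

# Case 1: outcome models that fit the observed outcomes exactly.
ate_good_outcome = aipw_ate(y, t, [3.0, 3.0, 4.0, 4.0], [1.0, 1.0, 2.0, 2.0], ps)

# Case 2: useless outcome models (all zeros); the weighting terms take over.
ate_bad_outcome = aipw_ate(y, t, [0.0] * 4, [0.0] * 4, ps)
print(ate_good_outcome, ate_bad_outcome)
```

In this toy example both calls return the same estimate. In the second call the outcome models carry no information, and the estimate comes entirely from the inverse-probability-weighting terms.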


Di Liu
Session V: Community-contributed, III

rbiprobit: Recursive bivariate probit estimation and decomposition of marginal effects
Abstract: This presentation describes a new Stata command, rbiprobit, for fitting recursive bivariate probit models, which differ from bivariate probit models in allowing the first dependent variable to appear on the right-hand side of the equation for the second dependent variable.
Although the estimation of model parameters does not differ from the bivariate case, the existing commands biprobit and cmp do not consider the structural model’s recursive nature for postestimation commands. rbiprobit estimates the model parameters, computes treatment effects of the first dependent variable, and gives the marginal effects of independent variables. In addition, marginal effects can be decomposed into direct and indirect effects if covariates appear in both equations. Moreover, the postestimation commands incorporate the two community-contributed goodness-of-fit tests scoregof and bphltest. Dependent variables of the recursive probit model may be binary, ordinal, or a mixture of both. I present and explain the rbiprobit command and the available postestimation commands using data from the European Social Survey.
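
For the case of two binary outcomes, the recursive structure can be written as below. The notation is an assumption of this note rather than the command's documentation: x_1 and x_2 are covariate vectors, α is the coefficient on the first outcome in the second equation, and ρ is the correlation of the errors.

```latex
\begin{aligned}
y_1^{*} &= x_1'\beta_1 + u_1, & y_1 &= \mathbf{1}[y_1^{*} > 0],\\
y_2^{*} &= x_2'\beta_2 + \alpha\, y_1 + u_2, & y_2 &= \mathbf{1}[y_2^{*} > 0],\\
(u_1, u_2)' &\sim \mathcal{N}\!\left(
  \begin{pmatrix} 0 \\ 0 \end{pmatrix},
  \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}
\right).
\end{aligned}
```

A covariate that appears in both x_1 and x_2 affects y_2 directly through β_2 and indirectly through its effect on y_1, which is the distinction behind the direct and indirect marginal effects mentioned above.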


Mustafa Coban
Institute for Employment Research

A Stata package to handle metadata
Abstract: In this presentation, I offer a brief tour of mdata, a community-contributed Stata package that provides a set of tools to help users handle metadata in large and complex datasets.
The package uses an Excel file to store all metadata related to a dataset. This is particularly useful for editing metadata outside of Stata and for dealing with datasets stored in non-Stata formats. The presentation will focus on the most important features of the package, namely, how to extract metadata from data in memory, perform consistency checks on the metadata, apply metadata to data in memory, and compare and combine metadata from two datasets.


Additional information:
Italy22_Iglésias (https:)

Gustavo Iglésias
Microdata Research Laboratory, Banco de Portugal
4:15–4:45 nwxtregress: Network regressions in Stata
Abstract: In this presentation, I introduce nwxtregress, a new community-contributed routine to estimate network regressions.
It uses MCMC estimation methods (LeSage and Pace 2009) to produce estimates of endogenous peer effects, as well as own-node (direct) and cross-node (indirect) partial effects, where nodes correspond to cross-sectional units of observation. nwxtregress is designed to handle unbalanced panels of economic and social networks as in Grieser et al. (2021). Networks can be directed or undirected with weighted or unweighted edges, and they can be imported in a list format that does not require a shapefile or a Stata spatial weight matrix set by spmatrix. Finally, the command allows for the inclusion or exclusion of contextual effects. To improve speed, the command transforms the spatial weighting matrix into a sparse matrix. Future work will be targeted toward improving sparse matrix routines, as well as introducing a framework that allows for multiple networks.
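
The step from a network in list format to a sparse weighting matrix can be sketched in Python as follows. This is an illustration only, not the command's internal representation: the dictionary-of-dictionaries storage, the row normalization, the function names, and the toy network are assumptions made for this example.

```python
from collections import defaultdict

def sparse_weights(edges, directed=True):
    """Build a row-normalized sparse weight matrix from an edge list.

    edges: iterable of (source, target, weight) triples.
    Returns {source: {target: weight}}, storing only nonzero entries.
    """
    rows = defaultdict(dict)
    for i, j, w in edges:
        rows[i][j] = rows[i].get(j, 0.0) + w
        if not directed and i != j:
            # An undirected edge links both nodes to each other.
            rows[j][i] = rows[j].get(i, 0.0) + w
    normalized = {}
    for i, row in rows.items():
        total = sum(row.values())
        normalized[i] = {j: w / total for j, w in row.items()} if total else {}
    return normalized

def network_lag(W, y):
    """Weighted average of neighbors' outcomes, using stored entries only."""
    return {i: sum(w * y[j] for j, w in row.items()) for i, row in W.items()}

# Hypothetical directed, weighted network with three nodes.
edges = [("a", "b", 1.0), ("a", "c", 3.0), ("b", "a", 2.0)]
W = sparse_weights(edges)
lag = network_lag(W, {"a": 1.0, "b": 2.0, "c": 4.0})
print(W, lag)
```

Storing only the nonzero links means the cost of forming the network lag grows with the number of edges rather than with the square of the number of nodes, which is the usual motivation for a sparse representation.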


Jan Ditzen
Free University of Bozen-Bolzano
Session VI: Application study using Stata

Modeling the risk of multimorbidity: An application of multistate models to the Swedish National March Cohort
Abstract: Chronic diseases, defined as health problems requiring ongoing management over a period of years or decades, currently represent the predominant burden of healthcare. I use the term "multimorbidity" for the coexistence of two or more diseases or conditions.
When combined, chronic diseases create additional challenges to patient care because clinical trials usually exclude patients with coexisting conditions; therefore, most guidelines do not provide recommendations for patients presenting with multiple diseases. With worldwide life expectancy having increased from 45.7 years in 1950 to 72.6 years in 2019, and with 20% of people in Europe aged 65 years or older in 2019, understanding the patterns and risk factors of multimorbidity has become highly relevant for public health. Multistate models are a well-suited statistical framework to address this problem.


Giulia Peveri
University of Milan

Net Promoter Score–Beyond the measure: A statistical approach based on generalized ordered logit models implemented by Stata to conduct an NPS key drivers’ analysis
Abstract: The Net Promoter Score (NPS) index is a popular satisfaction measure that allows one to gauge customer loyalty (CL) at most large and medium-sized firms in different fields.
Because of its impact on a company’s growth, line managers are strongly interested in knowing which factors can increase NPS by increasing promoters and decreasing detractors. NPS key drivers’ analysis (NPS KDA) can be a suitable tool for this task. A KDA may be conducted by implementing different statistical approaches for identifying those factors or drivers with a significant impact on a specific outcome variable. In the context of NPS KDA, the regression models for ordinal outcomes represent a statistical approach for identifying those significant customer experience (CX) attributes that can drive customer status (CS) from detractors to promoters, leading companies to design appropriate improvement strategies, involving those facets of product or service with the highest improvement priority.

In this presentation, the NPS KDA has been conducted by implementing in Stata two special cases of the generalized ordered logit models, the proportional odds model (POM) and the partial proportional odds model (PPOM), where the dependent variable CS was modeled as a function of different CX attributes.
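
For readers unfamiliar with the index, the following Python sketch shows how the NPS and the three-level customer status are conventionally derived from a 0 to 10 "likelihood to recommend" rating. The cutoffs (detractors 0 to 6, passives 7 and 8, promoters 9 and 10) are the standard NPS definition and are not stated in the abstract; the function names and ratings are hypothetical.

```python
def customer_status(rating):
    """Map a 0-10 rating to the conventional NPS category."""
    if rating >= 9:
        return "promoter"
    if rating <= 6:
        return "detractor"
    return "passive"

def net_promoter_score(ratings):
    """NPS = percentage of promoters minus percentage of detractors."""
    statuses = [customer_status(r) for r in ratings]
    promoters = statuses.count("promoter")
    detractors = statuses.count("detractor")
    return 100.0 * (promoters - detractors) / len(ratings)

# Hypothetical survey responses.
ratings = [10, 9, 8, 7, 6, 3, 10, 9, 5, 8]
nps = net_promoter_score(ratings)
print(nps)
```

The ordered status variable produced by customer_status corresponds to the kind of ordinal outcome (CS) that the ordered logit models above take as the dependent variable.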


Debora Giovannelli

Absences from work and climate change: An empirical analysis
Abstract: This research aims to identify the Italian regions with the most absences from work and to verify whether there is a relationship between these absences and climate change.
I used the INPS database on employees for the period 2009–2018 and examined the variable credit difference, which measures the salary that workers did not receive because of absence from work.

Then, the existence of geographical influence between Italian regions was checked by creating maps in Stata. From the other available variables, a new variable was created that measures the number of worker absences in each region. The maps show the Italian regions where workers are absent most often. When only the sectors most affected by climate change are considered, the results vary.

Finally, only absences due to sickness and injury were considered, because these could be caused by climate change and extreme weather events. Examining the outlying values of the variable that measures absences from work, we found that extreme weather events had indeed occurred in the month and region in which each value far from the average was recorded.


Grazia Errichiello
Università degli Studi di Napoli Parthenope
6:00–6:15 Open panel discussion with Stata developers
Contribute to the Stata community by sharing your feedback with StataCorp's developers. From feature improvements to bug fixes and new ways to analyze data, we want to hear how Stata can be made better for our users.

Workshop: Maximizing the potential of Stata’s new Python capabilities

by Giovanni Cerulli, IRCrES, Rome

Date & time

20 May from 9:00 a.m. to 4:30 p.m.


Python integration is one of the most interesting features recently incorporated into Stata because it allows users to draw on the wide range of open-source Python packages to process, visualize, and explore data within the Stata environment, or to embed Python code directly in Stata do-files.

This workshop offers participants an opportunity to acquire the programming skills needed to integrate Python's capabilities into Stata 17, through a series of examples that highlight when, and why, you should take advantage of the connectivity between Python and Stata in your own research.

The goal is to offer an overview of the applicability of the Python programming language within Stata.


Participants need an operational knowledge of Stata. Knowledge of Python is not required, although it would be an advantage.

Scientific committee

Una-Louise Bell
TStat – TStat Training
Rino Bellocco
University of Milano-Bicocca
Giovanni Capelli
University of Cassino and Southern Lazio
Maurizio Pisati
University of Milano-Bicocca

Logistics organizer

The logistics organizer for the 2024 Italian Stata Conference is TStat S.r.l., the distributor of Stata for Italy, Albania, Bosnia and Herzegovina, Greece, Kosovo, North Macedonia, Malta, Montenegro, Serbia, Slovakia, and Slovenia.

View the proceedings of previous Stata Conferences and Users Group meetings.