The Italian Stata Users Group Meeting was held on 15 November 2018 at the I Portici Hotel. There was also an optional course on 16 November. You can view the program and presentation slides below.


Session I: Invited Speaker

Multistate survival analysis in Stata
Abstract: Multistate models are increasingly being used to model complex disease profiles. By modeling transitions between disease states, accounting for competing events at each transition, we can gain a much richer understanding of patient trajectories and of how risk factors act over the entire disease pathway. In this presentation, I will introduce some new Stata commands for the analysis of multistate survival data. msset is a data preparation tool that converts a dataset from wide format (one observation per subject, multiple time and status variables) to long format (one observation for each transition for which a subject is at risk). msaj calculates the nonparametric Aalen–Johansen estimates of transition probabilities. msboxes creates a descriptive plot of the multistate process based on the transition matrix and the numbers at risk. stms fits joint transition-specific survival models, which allow each transition to have a different parametric model while the models are maximized jointly, so that covariate effects can be shared across transitions. predictms calculates a variety of predictions from a multistate survival model, including transition probabilities, length of stay (restricted mean time in each state), the probability of ever visiting each state, and more. Predictions are made at user-specified covariate patterns, and differences and ratios of predictions across covariate patterns can be calculated. Standardized (population-averaged) predictions can also be obtained, and confidence intervals are available for all quantities. All quantities can be calculated either by simulation or with the Aalen–Johansen estimator. For complete flexibility, user-defined predictions can be calculated by supplying a user-written Mata function. predictms can be used with a general transition matrix (cyclic or acyclic) and allows transition-specific timescales. I will illustrate the software using a dataset of patients with primary breast cancer.
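
For readers new to the Aalen–Johansen estimator mentioned above, the following is a minimal Python sketch of its textbook product-integral form, P(0, t) = ∏ (I + dÂ(u)) over event times u ≤ t, where each off-diagonal increment dÂ_hj(u) is the number of h → j transitions at u divided by the number at risk in state h. It is not the msaj implementation; the input layout (precomputed at-risk and transition counts per event time) and the function names are assumptions made for illustration. Exact fractions keep the arithmetic easy to check by hand.

```python
from fractions import Fraction


def identity(n):
    """n x n identity matrix with exact Fraction entries."""
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    """Multiply two square matrices stored as lists of lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def aalen_johansen(n_states, event_times):
    """Hypothetical helper: Aalen-Johansen estimate of P(0, t) at each event time.

    event_times: list of (t, at_risk, transitions), sorted by t, where
      at_risk[h]  = number of subjects in state h just before t, and
      transitions = {(h, j): d}, the number of h -> j transitions at t.
    Returns a list of (t, P) with P[h][j] estimating Pr(in j at t | in h at 0).
    """
    p = identity(n_states)
    estimates = []
    for t, at_risk, transitions in event_times:
        step = identity(n_states)  # build I + dA(t)
        for (h, j), d in transitions.items():
            increment = Fraction(d, at_risk[h])  # Nelson-Aalen increment for h -> j
            step[h][j] += increment
            step[h][h] -= increment  # keeps each row of I + dA(t) summing to 1
        p = mat_mul(p, step)
        estimates.append((t, p))
    return estimates


# Hypothetical two-state example (0 = alive, 1 = dead): three subjects die at t = 1, 2, 3
data = [(1, [3, 0], {(0, 1): 1}), (2, [2, 1], {(0, 1): 1}), (3, [1, 2], {(0, 1): 1})]
for t, p in aalen_johansen(2, data):
    print(t, p[0][0])
```

With a single transition (alive to dead), the estimator reduces to the Kaplan–Meier estimate: in the example, the probability of still being alive is 2/3, 1/3, and 0 after times 1, 2, and 3.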

Michael J. Crowther
University of Leicester

Session II: Community-contributed, I

The Stata module CUB for fitting mixture models for ordinal data
Abstract: We present CUB, a Stata module for modeling ordinal data via a class of finite mixture distributions that accounts for both the uncertainty and the feeling components of an ordered decision process. The routine can also model overdispersion, inflated categories, and cases of large heterogeneity. Model parameters are estimated by maximum likelihood. We explore various features of the CUB package, including its simulation routines.

Christopher F. Baum
Boston College
Giovanni Cerulli
Francesca di Iorio
Domenico Piccolo
Rosaria Simone
Università degli Studi di Napoli Federico II

Two-part models, hurdle models, and zero-inflated models are robust to endogeneity among the parts
Abstract: There are many models for an outcome that has a mass point at a boundary value and is continuously or discretely distributed over a large number of off-boundary values. Two-part models (TPMs), hurdle models (HMs), and zero-inflated models (ZIMs) use different approaches to combine distinct models for boundary and off-boundary values. Except for a few "cake debate" papers whose assertions were not accepted, the vast majority of the literature has assumed either that the process determining whether the outcome is on or off the boundary is exogenous or that any endogeneity must be modeled. Drukker (2017) showed that, contrary to conventional belief, TPMs are robust to endogeneity of the on-off boundary process in that they identify the mean of the outcome conditional on covariates.

In this presentation, I cover the following points:

  • I review this literature and the argument that shows that TPMs are robust to the endogeneity of the on-off boundary process;
  • I discuss new results showing that HMs and ZIMs are also robust to the endogeneity of the on-off boundary process; and
  • I present a new Stata command that estimates the parameters of interest in endogenous TPMs and HMs, and I show how to use margins subsequently to estimate the effects of interest.
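
For a nonnegative outcome whose boundary value is zero, the conditional mean that this abstract refers to combines the two parts of a TPM as follows (notation ours; in general the boundary need not be zero):

```latex
E(y \mid \mathbf{x}) = \Pr(y > 0 \mid \mathbf{x}) \times E(y \mid y > 0, \mathbf{x})
```

The first factor comes from the model for being off the boundary and the second from the model for off-boundary values. The identity follows from the law of total expectation, because y = 0 whenever it is on the boundary, and it does not require the two parts to be independent; the formal argument about endogeneity is the subject of the talk.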

David M. Drukker

Risk-adjustment procedures and graphical representations of outcome rates for institutional comparisons
Abstract: An overriding goal of outcomes research is measuring and comparing hospital performance using readily available administrative data. Risk-adjustment techniques build on conventional logistic regression analysis, but they must account for the positive correlation between observations from the same hospital. Suitable approaches include generalized estimating equations, which are available in Stata as xtgee. Two effective graphs that illustrate outcome measures across different providers and incorporate sample-size information are the caterpillar plot and the funnel plot, which can be obtained using the eclplot and funnelcompar packages, respectively.
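
The funnel-plot idea can be sketched as follows: each provider's observed rate is plotted against its number of cases, and control limits around a benchmark rate narrow as the number of cases grows. The Python sketch below uses the common normal approximation to binomial limits for a crude proportion. This is an assumption on our part: the abstract does not say which limits funnelcompar computes, and risk-adjusted versions often compare observed with expected counts instead. The function names and the example data are hypothetical.

```python
import math


def funnel_limits(p0, n, z=1.96):
    """Approximate funnel-plot control limits for a proportion based on n cases.

    Normal approximation: p0 +/- z * sqrt(p0 * (1 - p0) / n), clipped to [0, 1].
    z = 1.96 gives roughly 95% limits; z = 3.09 roughly 99.8% limits.
    """
    se = math.sqrt(p0 * (1 - p0) / n)
    return max(0.0, p0 - z * se), min(1.0, p0 + z * se)


def flag_outliers(providers, p0, z=1.96):
    """Return the names of providers whose observed rate lies outside the limits.

    providers: iterable of (name, events, cases) tuples.
    """
    flagged = []
    for name, events, cases in providers:
        lower, upper = funnel_limits(p0, cases, z)
        rate = events / cases
        if rate < lower or rate > upper:
            flagged.append(name)
    return flagged


# Made-up data: three hypothetical hospitals with 100 cases each, benchmark rate 10%
hospitals = [("A", 20, 100), ("B", 12, 100), ("C", 2, 100)]
print(flag_outliers(hospitals, p0=0.10))
```

With 100 cases and a 10% benchmark, the approximate 95% limits are 4.12% and 15.88%, so hospitals A (20%) and C (2%) fall outside the funnel while B (12%) does not.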

Jacopo Lenzi
Università degli Studi di Bologna

Session III: Exploiting the potential of Stata 15, I

Efficient dynamic documents using Stata
Abstract: Stata 15 includes three new commands for producing dynamic documents: dyndoc, putdocx, and putpdf. These commands have generated much interest in the user community and have led to a large amount of community-contributed software. In this presentation, I'll give some tips on how to use the commands efficiently, both with official Stata software and with some of these community-contributed tools.

Bill Rising

Session IV: Community-contributed, II

Recurrent-event analysis with Stata: Methods and applications
Abstract: In medical studies, the event of interest can often recur in the same patient over time. Although time-to-first-event analysis and Poisson regression are still possible, they do not use the data to its full potential. If the outcome can occur more than once, failure times are correlated within subjects, and methods that account for this lack of independence are needed. Different statistical models are available for analyzing such data, including the Andersen and Gill (AG) model, the Prentice, Williams, and Peterson Total Time (PWP-TT) model, and frailty models. I will review statistical techniques for multiple-failure survival data and show how to implement them in Stata.

Francesca Ghilotti
Rino Bellocco
Karolinska Institutet, Università degli Studi di Milano-Bicocca

The determinants of the referendum vote on 4 December 2016
Abstract: This presentation offers new evidence on the socioeconomic and demographic determinants of the referendum of 4 December 2016 through an analysis of the vote in Italian municipalities. The results indicate a strongly ideological vote: the political orientation expressed in the 2014 elections significantly influences the direction of the referendum vote. Moreover, the rejection of the reform was determined less by the youth vote than by social unease, summarized through the unemployment rate and the share of commuters in each municipality. As a consequence, genuinely territorial variables have reduced explanatory power. Overall, the vote was determined mainly by political affiliation, then by the self-assessment of one's socioeconomic status, and only residually by personal opinion on the contents of the reform. If democracy is built up through the exercise of the "electoral profession", then reviving civic education, updated to include the economic implications of good institutions, can be an incentive for citizens to actually express their preferences on the basic rules.

Mariano Bella
Giovanni Graziano
Ufficio Studi Confcommercio

Session V: Tricks and tips

Simple tools for saving time
Abstract: This brief talk will show some simple tools for saving time when working with Stata. It will be a hodgepodge of items whose goal is to reduce the amount of thought, coordination, and human memory required for common tasks in a complex work environment while greatly speeding up those tasks.

Bill Rising

Calling external routines in Stata
Abstract: One of the lesser-known features of Stata is the possibility of calling external routines, written in other software, to perform specific tasks within Stata. I offer some insights on how to develop a Stata ado-file that embeds an external software routine to be executed in Stata, using as an example the Stata command stree, written to allow users to run regression trees (a machine learning technique currently unavailable in Stata).

Giovanni Cerulli
Antonio Zinilli

Finding data embedded in text files: Using fileread() and basic string functions to extract spatial coordinates from Google Maps or counts in preformatted documents
Abstract: Since Stata 13, Stata supports the long string storage type strL. One can use the programming function fileread() to load an entire text or binary file, stored in a local directory or downloaded from a webpage, into a Stata long string variable. These long strings can then be searched to extract specific numeric or categorical data. I illustrate the use of the fileread() programming function, coupled with the string functions strpos() and substr(), to solve the following problems: i) extracting spatial coordinates for a database of individual addresses using Google Maps API calls; and ii) extracting count data from the nonanonymous versions of multiple structured Word files and automatically rebuilding an anonymous PDF version through LaTeX.
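
The string-search technique described in this abstract amounts to reading the whole document into one string, finding the position of a known marker, and cutting out the characters that follow it. A minimal Python analogue is sketched below; the talk itself uses Stata, and the function name, the delimiter list, and the sample text are hypothetical. Note that Python's str.find() is 0-based and returns -1 when the marker is missing, whereas Stata's strpos() is 1-based and returns 0.

```python
def extract_number_after(text, marker, stops=(",", "}", "\n")):
    """Return the number that follows `marker` in `text`, or None if `marker` is absent.

    Mirrors the Stata pattern strpos() + substr(): locate the marker, then slice out
    the characters up to the next delimiter in `stops`.
    """
    start = text.find(marker)  # like strpos(), but 0-based and -1 (not 0) when absent
    if start == -1:
        return None
    start += len(marker)
    ends = [pos for pos in (text.find(s, start) for s in stops) if pos != -1]
    end = min(ends) if ends else len(text)
    return float(text[start:end].strip())  # like substr() followed by real()


# Made-up text; in practice the whole file would first be read into one string,
# as fileread() does in Stata.
sample = '{"lat": 45.0, "lng": 9.5}'
print(extract_number_after(sample, '"lat":'), extract_number_after(sample, '"lng":'))
```

For well-formed JSON, such as a geocoding response, a proper JSON parser is more robust; plain string search is most useful for loosely structured text such as exported word-processor documents.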

Giovanni Capelli
Università degli Studi di Cassino e del Lazio Meridionale

Session VI: Exploiting the potential of Stata 15, II

Estimating the average causal effect on an ordinal outcome of an endogenously assigned treatment from an endogenously selected sample
Abstract: I discuss the average causal effect (ACE) of an endogenous binary treatment on an ordinal outcome when the sample is subject to endogenous selection. I show how to estimate the ACE using an extended regression model (ERM) command in Stata. I illustrate how to do regression adjustment in Stata and discuss standard errors for sample-averaged treatment effects and population-averaged treatment effects.

David M. Drukker

Session VII

Wishes and grumbles
Abstract: Stata developers present will carefully and cautiously consider wishes and grumbles from Stata users in the audience. Questions, and possibly answers, may concern reports of present bugs and limitations or requests for new features in future releases of the software.
StataCorp personnel

8:15 Optional dinner at C'era Una Volta
(Via Massimo D'Azeglio, 9)

Course: Joint Modeling of Longitudinal and Survival Data

by Michael J. Crowther, Department of Health Sciences, University of Leicester


The joint modeling of longitudinal and survival data has been an area of growing interest in recent years, with the benefits of the approach being recognized in ever-widening fields of study. The models provide an effective way either to analyze a survival endpoint (for example, time to death) influenced by a time-varying covariate measured with error or, alternatively, to correct for nonrandom dropout in the analysis of a longitudinal outcome (for example, a biomarker such as blood pressure). This one-day course will provide an introduction to joint modeling through real applications to both clinical trial data and electronic health records, using examples in cancer and liver cirrhosis. We will study the methodological framework, underlying assumptions, estimation, model building, and predictions. We will also consider current developments in the field, looking at some of the many extensions of the standard framework, such as the ability to model multiple biomarkers and competing risks. The course will consist of lectures and computing exercises making use of the stjm and merlin packages in Stata, written by the course lecturer.

  • Introductions
  • Lecture 1: Survival analysis, longitudinal analysis, and their combination
  • Lecture 2: Joint modeling of longitudinal and survival data
  • Lecture 3: Extended association structures and predictions
  • Lecture 4: Further topics in joint modeling
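
To fix ideas about the course topic, one common formulation of the basic joint model links a linear mixed-effects model for the biomarker to a proportional hazards model through the current value of the error-free trajectory m_i(t). The notation below is ours and serves only as orientation; the course, stjm, and merlin support other association structures and parameterizations.

```latex
\begin{aligned}
y_{ij} &= m_i(t_{ij}) + e_{ij}, \qquad e_{ij} \sim N(0, \sigma_e^2) \\
m_i(t) &= \mathbf{x}_i^\top(t)\,\boldsymbol{\beta} + \mathbf{z}_i^\top(t)\,\mathbf{b}_i, \qquad \mathbf{b}_i \sim N(\mathbf{0}, \boldsymbol{\Sigma}) \\
h_i(t) &= h_0(t) \exp\{\boldsymbol{\phi}^\top \mathbf{v}_i + \alpha\, m_i(t)\}
\end{aligned}
```

Here y_ij is the biomarker measured on subject i at time t_ij, b_i are subject-specific random effects, v_i are baseline covariates, and α quantifies the association between the current underlying biomarker value and the hazard of the event. Because the hazard depends on m_i(t) rather than on the error-prone measurements y_ij, the model accounts for measurement error; because both submodels share b_i, it can also account for informative dropout.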

Target audience

This one-day workshop is of particular interest to biostatisticians, epidemiologists, applied statisticians, and researchers or professionals working in economics, the social sciences, or public health wishing to carry out survival analysis on longitudinal and panel data in their applied research. The number of participants is limited to 15.


Participants should be familiar with Stata. A working knowledge of survival analysis and an introductory knowledge of panel data are required.

Scientific committee

Una-Louise Bell
TStat S.r.l.

Rino Bellocco
Università degli Studi di Milano-Bicocca

Giovanni Capelli
Università degli Studi di Cassino

Maurizio Pisati
Università degli Studi di Milano-Bicocca

Logistics organizer

The logistics organizer for the 2018 Italian Stata Users Group meeting is TStat S.r.l., the distributor of Stata in Italy.
