The Italian Stata Users Group Meeting was held on 15 November 2018 at the I Portici Hotel. There was also an optional course on 16 November. You can view the program and presentation slides below.
Session I: Invited Speaker
Multistate survival analysis in Stata
Abstract: Multistate models are increasingly being used to model complex disease profiles. By modeling transitions between disease states and accounting for competing events at each transition, we can gain a much richer understanding of patient trajectories and of how risk factors act across the entire disease pathway. In this presentation, I will introduce some new Stata commands for the analysis of multistate survival data. msset is a data preparation tool that converts a dataset from wide (one observation per subject, multiple time and status variables) to long (one observation for each transition for which a subject is at risk). msaj calculates the nonparametric Aalen–Johansen estimates of transition probabilities. msboxes creates a descriptive plot of the multistate process through the transition matrix and numbers at risk. stms fits joint transition-specific survival models, which allow each transition to have a different parametric model while the models are maximized jointly to enable sharing of covariate effects across transitions. predictms calculates a variety of predictions from a multistate survival model, including transition probabilities, length of stay (restricted mean time in each state), the probability of ever visiting each state, and more. Predictions are made at user-specified covariate patterns. One can calculate differences and ratios of predictions across covariate patterns. Standardized (population-averaged) predictions can be obtained. Confidence intervals for all quantities are available. One can use simulation or the Aalen–Johansen estimator to calculate all quantities. One can calculate user-defined predictions by providing a community-contributed Mata function, which gives complete flexibility. predictms can be used with a general transition matrix (cyclic or acyclic) and allows the use of transition-specific timescales. I will illustrate the software using a dataset of patients with primary breast cancer.
Michael J. Crowther
University of Leicester
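The wide-to-long conversion described in the abstract above can be sketched in a few lines. This is only an illustration in Python, not the msset implementation: the illness-death model, the field names (ill_time, ill_status, dead_time, dead_status), and the output layout are all assumptions made for the example.

```python
# Hypothetical illness-death model: state 1 = healthy, 2 = ill, 3 = dead.
# Transitions: 1 (healthy -> ill), 2 (healthy -> dead), 3 (ill -> dead).
# Field names and output layout are assumptions, not msset's actual output.

def wide_to_long(subject):
    """Expand one wide record into one row per transition the subject is at risk for."""
    sid = subject["id"]
    t_ill, d_ill = subject["ill_time"], subject["ill_status"]
    t_dead, d_dead = subject["dead_time"], subject["dead_status"]

    # Time at which the subject leaves the healthy state (or is censored in it).
    exit_healthy = t_ill if d_ill == 1 else t_dead

    rows = [
        # At risk of becoming ill from time 0 until leaving the healthy state.
        {"id": sid, "trans": 1, "start": 0.0, "stop": exit_healthy, "status": d_ill},
        # At risk of dying without illness over the same interval.
        {"id": sid, "trans": 2, "start": 0.0, "stop": exit_healthy,
         "status": 1 if (d_ill == 0 and d_dead == 1) else 0},
    ]
    if d_ill == 1:
        # Only subjects who became ill are ever at risk for transition 3.
        rows.append({"id": sid, "trans": 3, "start": t_ill, "stop": t_dead,
                     "status": d_dead})
    return rows


if __name__ == "__main__":
    patient = {"id": 1, "ill_time": 2.0, "ill_status": 1,
               "dead_time": 5.0, "dead_status": 1}
    for row in wide_to_long(patient):
        print(row)
```

A subject who becomes ill contributes three rows, while a subject who never becomes ill contributes only the two rows for the transitions out of the healthy state.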
Session II: Community-contributed, I
The Stata module CUB for fitting mixture models for ordinal data
Abstract: We present CUB, a Stata module for modeling ordinal data via a class of finite mixture distributions accounting for both uncertainty and feeling components of an ordered decisional process. The routine also allows for modeling overdispersion, inflated categories, and cases of large heterogeneity. Model parameters are estimated by maximum likelihood. We explore various features of the package CUB, including simulation routines.
Christopher F. Baum
Francesca di Iorio
Università degli Studi di Napoli Federico II
Two-part models, hurdle models, and zero-inflated models are robust to endogeneity among the parts
Abstract: There are many models for an outcome that has a mass point at a boundary value and is continuously or discretely distributed over a large number of off-boundary values. Two-part models (TPMs), hurdle models (HMs), and zero-inflated models (ZIMs) use different approaches to combine distinct models for boundary and off-boundary values. Except for a few "cake debate" papers whose assertions were not accepted, the vast majority of the literature has assumed either that the process determining when the outcome is on or off the boundary must be exogenous or that any endogeneity must be modeled. Drukker (2017) showed that, contrary to conventional belief, TPMs are robust to endogeneity of the on-off boundary process in that they identify the mean of the outcome conditional on covariates.
In this presentation, I cover the following points:
David M. Drukker
Risk-adjustment procedures and graphical representations of outcome rates for institutional comparisons
Abstract: An overriding goal of outcomes research is measuring and comparing hospital performance using readily available administrative data. Risk-adjustment techniques build on conventional logistic regression analysis, but precautions must be taken to account for the positive correlation between observations from within the same hospital. These include the use of generalized estimating equations, which are available in Stata as xtgee. Two effective graphs that illustrate outcome measures across different providers and incorporate sample size information are the caterpillar plot and the funnel plot, which can be obtained using the eclplot and funnelcompar packages, respectively.
Università degli Studi di Bologna
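To make the funnel plot idea from the abstract above concrete, here is a minimal Python sketch of how control limits can depend on provider size. It assumes unadjusted rates and a normal approximation to the binomial; it is not the method implemented in funnelcompar, and the function names and data are hypothetical.

```python
import math

def funnel_limits(overall_rate, n, z=1.96):
    """Approximate control limits for a provider with n cases.

    Assumption: normal approximation to the binomial around the overall rate.
    """
    se = math.sqrt(overall_rate * (1.0 - overall_rate) / n)
    lower = max(0.0, overall_rate - z * se)
    upper = min(1.0, overall_rate + z * se)
    return lower, upper

def flag_outliers(providers, z=1.96):
    """providers: list of (name, events, cases). Return names outside the limits."""
    total_events = sum(events for _, events, _ in providers)
    total_cases = sum(cases for _, _, cases in providers)
    overall = total_events / total_cases
    flagged = []
    for name, events, cases in providers:
        lower, upper = funnel_limits(overall, cases, z)
        rate = events / cases
        if rate < lower or rate > upper:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    # Hypothetical data: (provider, events, cases).
    data = [("A", 100, 1000), ("B", 105, 1000), ("C", 30, 100)]
    print(flag_outliers(data))
```

Because the standard error shrinks as the number of cases grows, the limits narrow for large providers, which gives the plot its funnel shape.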
Session III: Exploiting the potential of Stata 15, I
Efficient dynamic documents using Stata
Abstract: Stata 15 includes three new commands for producing dynamic documents: dyndoc, putdocx, and putpdf. These commands have generated much interest in the user community; this has led to a large amount of community-contributed software. In this presentation, I'll give some tips about how to use the commands efficiently both with official Stata software and with some of these community-contributed tools.
Session IV: Community-contributed, II
Recurrent-event analysis with Stata: Methods and applications
Abstract: In medical studies, the event of interest can often recur in the same patient over time. Although time-to-first-event analysis and Poisson regression are still possible, they do not use the data to their full potential. If the outcome can occur more than once, failure times are correlated within subject and methods accounting for lack of independence are needed. Different statistical models are available for analyzing such data, including the Andersen and Gill (AG) model, the Prentice, Williams, and Peterson Total Time (PWP-TT) model, and frailty models. I will review statistical techniques for multiple-failure survival data and show how to implement them in Stata.
Karolinska Institutet, Università degli Studi di Milano Bicocca
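Models such as AG and PWP-TT, mentioned in the abstract above, are commonly fit to data in counting-process form, with one (start, stop] interval per subject per event. The Python sketch below shows that layout under stated assumptions: the function name and field names are hypothetical, and the stratum column simply records the event order, which a PWP-type model would stratify on and an AG-type model would ignore.

```python
def counting_process_rows(subject_id, event_times, end_of_followup):
    """Split one subject's follow-up into (start, stop] intervals, one per event.

    Assumption: time is measured from study entry (total time scale).
    """
    rows = []
    start = 0.0
    for order, t in enumerate(sorted(event_times), start=1):
        rows.append({"id": subject_id, "start": start, "stop": t,
                     "event": 1, "stratum": order})
        start = t
    if end_of_followup > start:
        # Final censored interval after the last observed event.
        rows.append({"id": subject_id, "start": start, "stop": end_of_followup,
                     "event": 0, "stratum": len(event_times) + 1})
    return rows


if __name__ == "__main__":
    # Hypothetical patient with events at times 3 and 7, followed until time 10.
    for row in counting_process_rows(1, [3.0, 7.0], 10.0):
        print(row)
```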
The determinants of the referendum vote on 4 December 2016
Abstract: This presentation offers new evidence on the socioeconomic and demographic determinants of the referendum of 4 December 2016 through the analysis of the vote in Italian municipalities. The results indicate a strongly ideological vote, in that the political orientation expressed in the 2014 elections significantly influenced its outcome. Moreover, it is not so much the youth vote that determined the rejection of the reform as social unease, summarized through the unemployment rate and the share of commuters in the municipalities. This also implies a reduced explanatory capacity of the genuinely territorial variables. Overall, the vote was determined mainly by political affiliation, then by the self-assessment of one's socioeconomic status and, only residually, by personal opinion on the contents of the reform. If democracy is a result built up by the exercise of "electoral profession", then the revival of civic education, updated in the direction of enriching it with the economic implications of good institutions, can be an incentive to actually express preferences on the basic rules.
Ufficio Studi Confcommercio
Session V: Tricks and tips
Simple tools for saving time
Abstract: This brief talk will show some simple tools for saving time when working with Stata. This will be a hodgepodge of items whose goal is to reduce the amount of thought, coordination, and human memory required for common tasks in a complex work environment while speeding up such tasks greatly.
Calling external routines in Stata
Abstract: One of the lesser-known features of Stata is the possibility of calling external routines, written in other software, to perform specific tasks within Stata. I offer some insights on how to develop a Stata ado-file that embeds an external software routine for execution in Stata, illustrated with the Stata command stree, written to allow users to run regression trees (a machine learning technique currently unavailable in Stata).
Finding data embedded in text files: Using fileread() and basic string functions to extract spatial coordinates from Google Maps or counts in preformatted documents
Abstract: From Stata 13 on, Stata supports the long string format strL. One can use the programming function fileread() to load an entire text or binary file, found in a local directory or retrieved from a webpage, into a Stata long string variable. These long string variables can then be searched to extract specific numeric or categorical data. I illustrate the use of the fileread() programming function, coupled with the string functions strpos() and substr(), to solve the following issues: i) extract spatial coordinates from a database of individual addresses using Google Maps API calls; and ii) extract count data from a nonanonymous version of multiple structured Word files and automatically rebuild an anonymous PDF version of the file through LaTeX.
Università degli Studi di Cassino e del Lazio Meridionale
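The search-and-slice pattern described in the abstract above (read a whole file into one string, locate a marker, cut out the value that follows) can be sketched as below. This is a Python analogue rather than Stata code: str.find and slicing stand in for strpos() and substr(), with the difference that str.find is 0-based and returns -1 when the marker is absent. The sample text and its "lat"/"lng" markers are hypothetical and do not reproduce any actual API response.

```python
def read_whole_file(path):
    """Read an entire text file into one string (the role fileread() plays in Stata)."""
    with open(path, encoding="utf-8") as handle:
        return handle.read()

def extract_between(text, left, right):
    """Return the text between the first `left` marker and the next `right` marker.

    Returns None if either marker is missing.
    """
    i = text.find(left)
    if i == -1:
        return None
    i += len(left)
    j = text.find(right, i)
    if j == -1:
        return None
    return text[i:j]

def extract_coordinates(text):
    """Pull latitude and longitude out of a text blob with hypothetical markers."""
    lat = extract_between(text, '"lat": ', ",")
    lng = extract_between(text, '"lng": ', "}")
    if lat is None or lng is None:
        return None
    return float(lat), float(lng)


if __name__ == "__main__":
    # Hypothetical response fragment, used only to exercise the functions.
    sample = '{"location": {"lat": 44.4949, "lng": 11.3426}}'
    print(extract_coordinates(sample))
```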
Session VI: Exploiting the potential of Stata 15, II
Estimating the average causal effect on an ordinal outcome of an endogenously assigned treatment from an endogenously selected sample
Abstract: I discuss the average causal effect (ACE) of an endogenous binary treatment on an ordinal outcome when the sample is subject to endogenous selection. I show how to estimate the ACE using an extended regression model (ERM) command in Stata. I illustrate how to do regression adjustment in Stata and discuss standard errors for sample-averaged treatment effects and population-averaged treatment effects.
Wishes and grumbles
Abstract: Stata developers present will carefully and cautiously consider wishes and grumbles from Stata users in the audience. Questions, and possibly answers, may concern reports of present bugs and limitations or requests for new features in future releases of the software.
Optional dinner at C'era Una Volta
(Via Massimo D'Azeglio, 9)