Last updated: 13 January 2009

2008 Italian Stata Users Group meeting

Monday, 20 October 2008

Jolly Machiavelli Hotel
Via Lazzaretto 5
Milan, Italy


Confidence bands for the survival function

Enzo Coviello
Azienda USL BA/1
In several medical reports, the survival function is graphed along with the confidence intervals. The endpoints of the confidence intervals are usually connected to draw an area where the entire survival curve is contained with a given confidence.

Confidence intervals are pointwise: they refer to the survival probability at a single time and are not valid simultaneously for the entire survival curve. For that purpose, the appropriate tool is a confidence band, which is not yet available in Stata.
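To make the pointwise idea concrete, the following minimal Python sketch computes the Kaplan-Meier estimate with linear (Greenwood) pointwise limits on a small made-up dataset. The function name and the data are hypothetical and are not taken from the talk. Each interval is valid only at its own time point. The Hall–Wellner and EP bands would replace the normal quantile `z` with critical values from tabulated distributions, which this sketch does not attempt.

```python
import math


def kaplan_meier_pointwise(times, events, z=1.96):
    """Kaplan-Meier estimate with linear (Greenwood) pointwise limits.

    times  -- follow-up time for each subject
    events -- 1 if the event was observed, 0 if the time is censored
    Returns (time, survival, lower, upper) for each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    greenwood_sum = 0.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather every subject whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk
            # The Greenwood term is undefined when everyone at risk fails.
            if n_at_risk > deaths:
                greenwood_sum += deaths / (n_at_risk * (n_at_risk - deaths))
            se = surv * math.sqrt(greenwood_sum)
            rows.append((t, surv, max(0.0, surv - z * se), min(1.0, surv + z * se)))
        n_at_risk -= leaving
    return rows


# Made-up illustrative data: six subjects, two of them censored.
rows = kaplan_meier_pointwise([1, 2, 2, 3, 4, 5], [1, 1, 0, 1, 0, 1])
for t, s, lo, hi in rows:
    print(f"t={t}  S(t)={s:.3f}  pointwise limits=({lo:.3f}, {hi:.3f})")
```

Connecting these limits across time points gives the shaded region often seen in reports, but that region does not carry a simultaneous coverage guarantee for the whole curve.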

Two methods are usually employed to construct these confidence bands. The first was proposed by Hall and Wellner (1980), and the second was proposed by Nair (1984). The latter produces the so-called equal precision (EP) confidence bands. For both methods, log-minus-log and arcsine square-root transformed versions have been proposed.

stcband is a new Stata command that graphs the survival function together with confidence bands constructed according to the Hall–Wellner and EP methods. Its options allow the user to a) specify the lower and upper limits of the time interval over which the bands are estimated; b) choose the linear, log, or arcsine transformation; c) set the confidence level at 90, 95, or 99%; d) save the estimates; and e) control the appearance of the graph. A further option estimates confidence bands for the cumulative hazard function.

Using an example, I illustrate
  • the results obtained by using stcband and the corresponding R function;
  • the use of the command and the differences between its confidence bands and the usual pointwise confidence intervals.
Finally, I will compare the coverage probabilities of the confidence bands estimated according to the above-mentioned approaches, using simulated data with various survival distributions. I will perform the simulations using Stata’s bootstrap command.

The new command also has an accompanying help file from which the user can run an example, taken from the second edition of Klein and Moeschberger’s Survival Analysis: Techniques for Censored and Truncated Data (pp. 109–117), by clicking in the Viewer window. stcband is available for download from the SSC Archive.

I am grateful to Maarten Buis for his advice in constructing the simulations comparing the coverage probabilities of the confidence bands.

New wine in new bottles: Visualizing the progression over time of the epidemics of tobacco smoking and obesity through the use of modified population pyramids

Giovanni Capelli
Università di Cassino
Bruno Federico
Università di Cassino
Giuseppe Costa
Università di Torino
Tobacco smoking and obesity greatly contribute to premature death and disease in developed countries. In order to measure the extent to which these risk factors affect a population, as well as to describe the progression of these epidemics over time, routine surveillance of the prevalence of obesity and smoking is carried out by international organizations, national departments of health, and statistical offices. To this end, summary measures such as age-standardized rates, along with tabular and graphical representations such as maps, are used.

In this study, we argue that population pyramids, a widely used demographic tool, may be easily adapted to provide relevant visual information for public health purposes. By means of two juxtaposed histograms, one for each gender, population pyramids show either the proportion or the actual number of subjects in each age and gender subgroup. We suggest that stratifying each bar of the two histograms according to ordinal categories of the health condition or risk factor examined may provide useful details on the relationship between this condition or factor and key demographic variables like age and gender. In addition, the actual number of exposed subjects can be immediately read from the graph.

We therefore built a statistical routine in Stata to create modified population-pyramid plots, separately for overweight/obesity and for current/former smoking. Data were derived from five National Health Interview Surveys carried out in Italy between 1983 and 2005. For each survey, data on age, gender, smoking status, height, and weight were extracted for subjects aged 20–99. Age- and gender-specific prevalence rates of overweight/obesity and of current/former/never smoking were computed and applied to population estimates produced by the Italian national statistical institute (ISTAT). The resulting estimated numbers of underweight/normal weight/overweight/obese individuals and of former/current/never smokers were used to create the modified population pyramids.
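The step of applying prevalence rates to population estimates can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the dictionary layout, and all numbers are made up, and the sketch does not reproduce the authors' Stata routine. Each age-by-gender population total is split into category counts, which are the segment lengths of one stacked bar in the modified pyramid.

```python
def stratified_counts(population, prevalence):
    """Split each age-by-gender population total into category counts.

    population -- {(age_group, gender): number of residents}
    prevalence -- {(age_group, gender): {category: proportion}},
                  with the proportions in each cell summing to 1
    Returns {(age_group, gender): {category: estimated count}}.
    """
    counts = {}
    for cell, total in population.items():
        shares = prevalence[cell]
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError(f"prevalences for {cell} do not sum to 1")
        counts[cell] = {category: total * p for category, p in shares.items()}
    return counts


# Made-up illustrative numbers, not survey or ISTAT figures.
population = {("40-49", "men"): 1000, ("40-49", "women"): 1200}
prevalence = {
    ("40-49", "men"): {"never": 0.40, "former": 0.25, "current": 0.35},
    ("40-49", "women"): {"never": 0.60, "former": 0.15, "current": 0.25},
}
counts = stratified_counts(population, prevalence)
for cell, by_category in counts.items():
    print(cell, by_category)
```

Because the counts in each cell add up to the population total, the overall outline of the pyramid is unchanged and only the internal shading of each bar carries the risk-factor information.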

In conclusion, modified population pyramids may contribute to assessing the impact of risk factors on a population in absolute terms, to evaluating how these risk factors are distributed by age and gender, and to assessing how the age and gender distribution of these risk factors changes over time.

Ordered probit models with anchoring vignette

Claudio Rossetti
Università di Roma “Tor Vergata”
This paper presents a new Stata command for the estimation of ordered probit models with individual-specific thresholds, where anchoring vignettes are used to correct for differences in response scales. The analysis of ordered response data is very common in many research areas. Surveys in the social sciences very often have questions on individuals’ subjective evaluations of their own situation or what they think about a certain aspect of society. Nevertheless, when respondents use the ordinal response categories of standard survey questions in different ways, analyses based on the resulting data can be biased. Anchoring vignettes are a survey-design technique that may be used to position self-reported responses on a common, interpersonally comparable scale.

The model I present here is a parametric ordered probit model for the self-assessments, where the individual-specific thresholds depend on the same set of covariates as in the ordered probit model for the responses to the vignettes (King et al. 2004). Furthermore, I allow for the possibility of controlling for unobserved heterogeneity in response scales by including a random individual effect in the thresholds. The model is estimated by maximum likelihood.

The new Stata command presented here takes advantage of the new technology available in Stata 10. Specifically, the maximization routine is written in Mata, the matrix programming language of Stata, and the new Mata function optimize() is employed to maximize the likelihood function; this results in very fast convergence. After a brief description of ordered probit models with individual-specific thresholds and anchoring vignettes, I describe the new Stata command for fitting such models and present an empirical application.
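The role of the thresholds can be illustrated with a minimal Python sketch. It is not the command presented in the talk: the function names and the numbers are hypothetical, and the thresholds are supplied directly rather than modelled as functions of covariates. It shows that two respondents with the same latent index but different response scales have different probabilities of choosing each category, which is the problem anchoring vignettes are meant to correct.

```python
import math


def normal_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def category_probabilities(linear_index, thresholds):
    """Ordered probit probabilities P(y = k) for k = 1, ..., len(thresholds) + 1.

    linear_index -- value of the latent index for one respondent
    thresholds   -- strictly increasing cut points for that respondent
    """
    if any(b <= a for a, b in zip(thresholds, thresholds[1:])):
        raise ValueError("thresholds must be strictly increasing")
    cdf = [0.0] + [normal_cdf(t - linear_index) for t in thresholds] + [1.0]
    return [upper - lower for lower, upper in zip(cdf, cdf[1:])]


# Same latent index, different (made-up) response scales.
p_a = category_probabilities(0.5, [-1.0, 0.0, 1.0])
p_b = category_probabilities(0.5, [-0.5, 0.5, 1.5])
print([round(p, 3) for p in p_a])
print([round(p, 3) for p in p_b])
```

In a full model the thresholds would themselves be parameterized and estimated jointly with the index, using the vignette responses to identify them.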

Reproducible research: Weaving with Stata

William Rising
Reproducible research is one of many names for the same concept: writing a single report document that contains both the report and the commands needed to produce the results and graphics contained in the report. It is called reproducible research because any interested researcher can then reproduce the entire report from the one document. (Programmers call this same concept “literate programming”.) The utility of reproducible research documents extends far beyond research or programming. They allow rapid updates should there be additional data. They can also be used in teaching for generating differing examples or test questions, because different parameters will generate different examples. In this presentation, I will show you how to use a third-party application to embed Stata code, as well as its output, in either LaTeX or OpenOffice documents. I will also use example documents (including the talk itself) to show how you can update a report, its results, and its graphics by using new data or changing parameters.

Parametric and semiparametric estimation of ordered response models with sample selection and individual-specific thresholds

Giuseppe De Luca
Valeria Perotti
Claudio Rossetti
Università di Roma “Tor Vergata”
This paper provides a set of new Stata commands for parametric and semiparametric estimation of an extended version of ordered response models that accounts for both sample selection problems and heterogeneity in the thresholds for the latent variable. The standard estimator of ordered response models is therefore generalized in three directions. First, we account for the presence of endogenous selectivity effects that may lead to inconsistent estimates of the model parameters. Second, we control for both observed and unobserved heterogeneity in response scales by allowing the thresholds to depend on a set of covariates and a random individual effect. Finally, we consider two alternative specifications of the model, one parametric and one semiparametric. In the former, the error terms are assumed to follow a multivariate Gaussian distribution and the model parameters are estimated via maximum likelihood. In the latter, the distribution function of the error terms is instead approximated by following Gallant and Nychka (1987), and the model parameters are estimated via pseudo–maximum likelihood. After discussing identification and estimation issues, we present an empirical application using the second wave of the Survey of Health, Ageing and Retirement in Europe (SHARE). Specifically, we estimate an ordered response model for self-reported health on different domains by accounting for both sample selection bias due to survey nonresponse and reporting bias in the self-assessments of health.

A simulation-based sensitivity analysis for matching estimators

Tommaso Nannicini
Universidad Carlos III de Madrid
In this paper, I present a Stata program (sensatt) that implements the sensitivity analysis for matching estimators proposed by Ichino, Mealli, and Nannicini (2008). The analysis simulates a potential confounder to assess the robustness of the estimated treatment effects with respect to deviations from the Conditional Independence Assumption (CIA). The program uses the commands for propensity-score matching (att*) developed by Becker and Ichino (2002). I provide an example using the National Supported Work (NSW) demonstration, widely known in the program evaluation literature.


Becker, S. O. and A. Ichino. (2002). Estimation of average treatment effects based on propensity scores. The Stata Journal 2: 358–377.

Ichino, A., F. Mealli, and T. Nannicini. (2008). From temporary help jobs to permanent employment: What can we learn from matching estimators and their sensitivity? Journal of Applied Econometrics 23: 305–327.

Estimating multiway error-components models with correlated effects in Stata

Giovanni Bruno
Università Bocconi
I derive new estimators and tests of correlated effects for the (possibly) unbalanced multiway error-components model (ECM), extending existing results in various respects. The results on specification tests by Kang (1985), who extended Hausman and Taylor (1981) and Mundlak (1978) to the two-way balanced model, emerge as special cases of the present analysis. Davis (2002), who extends the analysis of Wansbeek and Kapteyn (1989) to the multiway unbalanced model, does not consider either correlated effects or specification tests. I also uncover new algebraic properties of the multiway ECM covariance matrix that prove useful for both computational and analytical purposes. Finally, I provide some examples using Stata.


Davis, P. (2002). Estimating multi-way error components models with unbalanced data structures. Journal of Econometrics 106: 67–95.

Hausman, J. A. and W. E. Taylor. (1981). Panel data and unobservable individual effects. Econometrica 49: 1377–1398.

Kang, S. (1985). A note on the equivalence of specification tests in the two-factor multivariate variance components model. Journal of Econometrics 28: 193–203.

Mundlak, Y. (1978). On the pooling of time series and cross section data. Econometrica 46: 69–85.

Wansbeek, T. and A. Kapteyn. (1989). Estimation of the error-components model with incomplete panels. Journal of Econometrics 41: 341–361.

A seasonal root test with Stata

Domenico Depalo
Università di Roma “Tor Vergata”
Many economic time series exhibit important systematic fluctuations within the year, i.e., seasonality. Contrary to usual practice, we argue that using the original data should always be considered, although the process underlying unadjusted data is more complicated than that of seasonally adjusted data. The motivation for using unadjusted data comes from the information contained in their peaks and troughs and from economic theory. One major complication is the presence of unit roots at seasonal frequencies. In this paper, we tackle this complication by implementing a test to identify the source of seasonality. In particular, we follow Hylleberg et al. (1993) for quarterly data. A practical example based on the permanent income hypothesis illustrates the usefulness of the command with macroeconomic time series.

The use of Stata in biostat teaching

Rino Bellocco
Università di Milano–Bicocca and Karolinska Institutet
Stata is a widely used software package whose utility is increasingly recognized. This is leading to its growing use worldwide in major departments of epidemiology and biostatistics, for both research and teaching purposes. The ability to use it at various levels of sophistication makes it an ideal package for introductory courses, where one is most likely to encounter novice users, as well as for researchers, who tend to be more experienced and more demanding in their requests for esoteric calculations.

The purpose of this talk is to describe how the teaching of biostatistics at basic and intermediate levels, especially in epidemiological courses, has been facilitated over the years through the use of Stata. I will cover both how the package has grown and how this growth has affected what can reasonably be taught in these courses. That is not to say that there is no room for improvement; I will also discuss some potential areas for progress and expansion.

Scientific organizers

Una-Louise Bell, TStat S.r.l.

Rino Bellocco, Karolinska Institutet

Giovanni Capelli, Università degli Studi di Cassino

Marcello Pagano, Harvard School of Public Health

Maurizio Pisati, Università degli Studi di Milano–Bicocca

Logistics organizers

TStat S.r.l., the official distributor of Stata in Italy.