2008 Italian Stata Users Group meeting: Abstracts
Monday, October 20
Confidence bands for the survival function
Azienda USL BA/1
In several medical reports, the survival function is graphed along with the
confidence intervals. The endpoints of the confidence intervals are usually
connected to draw an area where the entire survival curve is contained with
a given confidence.
Confidence intervals are pointwise, i.e., they refer to the survival
probability at a single time, but they are not valid simultaneously for all
the estimates along the entire survival curve. For that purpose, the
appropriate tool is a confidence band, which is not yet available within Stata.
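
To make the distinction concrete, the following is a minimal Python sketch (it is not stcband and not Stata code) of the usual pointwise intervals: the Kaplan-Meier estimate with Greenwood's variance and a plain linear normal-approximation interval at each event time. The function name and the toy data are hypothetical. Each interval covers the survival probability at its own time only; a Hall–Wellner or EP band would replace the fixed normal quantile with a critical value chosen for simultaneous coverage, which this sketch does not compute.

```python
import math


def kaplan_meier_pointwise(times, events, z=1.96):
    """Return (time, survival, lower, upper) at each distinct event time.

    times  : observed follow-up times
    events : 1 if the event occurred at that time, 0 if censored
    z      : normal quantile for the pointwise interval (1.96 for 95%)
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    greenwood_sum = 0.0  # running sum of d / (n * (n - d))
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Collect all subjects sharing the same observed time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
            if at_risk > deaths:
                greenwood_sum += deaths / (at_risk * (at_risk - deaths))
            se = surv * math.sqrt(greenwood_sum)
            # Linear interval, clipped to the unit interval.
            rows.append((t, surv, max(0.0, surv - z * se), min(1.0, surv + z * se)))
        at_risk -= removed
    return rows


# Toy data, for illustration only.
for row in kaplan_meier_pointwise([1, 2, 3, 4, 5], [1, 0, 1, 1, 0]):
    print(row)
```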
Two methods are usually employed to construct these confidence bands. The
first was proposed by Hall and Wellner (1980), and the second was proposed
by Nair (1984). The latter produces the so-called equal precision (EP)
confidence bands. For both methods, log-minus-log and arcsine square-root
transformed versions have been proposed.
stcband is a new Stata command that allows the user to graph the
survival function, together with the confidence bands constructed according
to the Hall–Wellner and EP methodologies. The available options allow
the user to a) specify the lower and upper limits of the time where the
bands are to be estimated; b) choose the linear, log, or arcsine
transformation; c) set the confidence level at 90, 95, or 99%; d) save the
estimates; and e) customize the appearance of the graph. A further option allows
the user to estimate the confidence bands of the cumulative hazard function.
Using an example, I illustrate
- the results obtained by using stcband and the corresponding R
- the use of the command and the differences of the estimates of its
confidence bands versus the usual pointwise confidence intervals.
Finally, I will compare the coverage probabilities of the confidence bands
estimated according to the above-mentioned approaches by using simulated
data with various survival distributions. I will perform simulations using
The new command also has an accompanying help file in which the user is able
to run an example, taken from the second edition of Klein and
Moeschberger’s Survival Analysis: Techniques for Censored and Truncated Data
(pp. 109–117), by clicking in the Viewer window. stcband is available for
download from the SSC Archives.
I am grateful to Maarten Buis for his advice in constructing the
simulations comparing the coverage probabilities of the confidence bands.
New wine in new bottles: Visualizing the progression over
time of the epidemics of tobacco smoking and obesity through the use of
modified population pyramids
Università di Cassino
Università di Cassino
Università di Torino
Tobacco smoking and obesity greatly contribute to premature death and
disease in developed countries. In order to measure the extent to which these
risk factors affect a population, as well as to describe the progression of
these epidemics over time, routine surveillance of the prevalence of obesity
and smoking is carried out by international organizations, national
departments of health, and statistical offices. To this end, summary
measures (such as age-standardized rates) and tabular and graphical
representations (such as maps) are used.
In this study, we argue that population pyramids, a widely used demographic
tool, may be easily adapted to provide relevant visual information for
public health purposes. By means of two juxtaposed histograms, one for each
gender, population pyramids show either the proportion or the actual number
of subjects in each age and gender subgroup. We suggest that stratifying
each bar of the two histograms according to ordinal categories of the health
condition or risk factor examined may provide useful details on the
relationship between this condition or factor and key demographic variables
like age and gender. In addition, the actual number of exposed subjects can
be immediately read from the graph.
We therefore built a Stata routine to create modified population-pyramid
plots, separately for weight status (overweight/obese) and smoking status
(current/former smoker). Data were derived from five National Health Interview
Surveys carried out in Italy between 1983 and 2005. For each survey, data on
age, gender, smoking status, height, and weight were extracted for subjects
aged 20–99. Age- and gender-specific prevalence rates of overweight/obese,
and of current/former/never smoker were computed and applied to population
estimates produced by the Italian national statistical institute (ISTAT). The
resulting estimated numbers of underweight/normal weight/overweight/obese
individuals and of former/current/never smokers were used to create the
modified population pyramids.
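
The step of applying prevalence rates to population estimates can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the authors' Stata routine; the function name, the age group labels, and all figures below are hypothetical and are not taken from the surveys or from ISTAT.

```python
def stratify_population(population, prevalence):
    """Split each (gender, age group) population count by risk category.

    population : {(gender, age_group): number of residents}
    prevalence : {(gender, age_group): {category: proportion}}
    Returns    : {(gender, age_group): {category: estimated count}}
    """
    counts = {}
    for cell, n in population.items():
        shares = prevalence[cell]
        # The categories must partition the cell.
        if abs(sum(shares.values()) - 1.0) > 1e-9:
            raise ValueError(f"prevalences for {cell} do not sum to 1")
        counts[cell] = {cat: n * p for cat, p in shares.items()}
    return counts


# Hypothetical figures, for illustration only.
population = {("M", "20-29"): 1000, ("F", "20-29"): 1200}
prevalence = {
    ("M", "20-29"): {"never": 0.5, "former": 0.2, "current": 0.3},
    ("F", "20-29"): {"never": 0.6, "former": 0.15, "current": 0.25},
}
counts = stratify_population(population, prevalence)
print(counts[("M", "20-29")])
```

Each inner dictionary corresponds to one stacked bar of the pyramid: the segments add up to the total population of that age and gender cell, so absolute numbers of exposed subjects can be read directly.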
In conclusion, modified population pyramids may contribute to assessing the
impact of risk factors on a population in absolute terms, to evaluating how
these risk factors are distributed by age and gender, and to assessing
how the age and gender distribution of these risk factors changes over time.
Ordered probit models with anchoring vignettes
Università di Roma “Tor Vergata”
This paper presents a new Stata command for the estimation of ordered probit
models with individual-specific thresholds, where anchoring vignettes are used
to correct for differences in response scales. The analysis of ordered
response data is very common in many research areas. Surveys in the social
sciences very often have questions on individuals’ subjective evaluations of
their own situation or what they think about a certain aspect of society.
Nevertheless, when respondents use the ordinal response categories of
standard survey questions in different ways, analyses based on the
resulting data can be biased. Anchoring vignettes is a survey design
technique that may be used to position self-reported responses on a common,
interpersonally comparable scale. The model I present here is a parametric
ordered probit model for the self-assessments, where the individual-specific
thresholds depend on the same set of covariates as in the ordered probit
model for the responses to the vignettes (King et al. 2004). Furthermore,
I allow for the possibility of controlling for unobserved heterogeneity in
response scales by including a random individual effect in the thresholds.
The model is estimated by maximum likelihood. The new Stata command
presented here takes advantage of the new technology available in Stata 10.
Specifically, the maximization routine is written in Mata, the matrix
programming language of Stata, and the new Mata function optimize() is
employed to maximize the likelihood function; this results in very fast
convergence. After a brief description of the ordered probit models with
individual-specific thresholds and anchoring vignettes, I describe the new
Stata command for fitting such models and present an empirical application.
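
As a rough illustration of what "individual-specific thresholds" means, the Python sketch below computes ordered probit category probabilities when the cutpoints depend on covariates. It is not the command presented in the talk. It assumes a threshold parameterization in which the first cutpoint is linear in the covariates and each later cutpoint adds a positive exponential increment (which keeps the cutpoints ordered), a unit-variance normal error, and no random effect; all names and parameter values are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def thresholds(x, gammas):
    """Covariate-dependent cutpoints (assumed parameterization).

    tau_1 = gamma_1'x, and tau_k = tau_{k-1} + exp(gamma_k'x) for k > 1.
    """
    def dot(g):
        return sum(gi * xi for gi, xi in zip(g, x))

    taus = [dot(gammas[0])]
    for g in gammas[1:]:
        taus.append(taus[-1] + math.exp(dot(g)))
    return taus


def category_probs(x, beta, gammas):
    """P(y = k | x) for k = 1..K, with K = len(gammas) + 1 categories."""
    mu = sum(b * xi for b, xi in zip(beta, x))
    cuts = [-math.inf] + thresholds(x, gammas) + [math.inf]
    return [
        norm_cdf(cuts[k + 1] - mu) - norm_cdf(cuts[k] - mu)
        for k in range(len(cuts) - 1)
    ]


# Hypothetical covariates (constant, one regressor) and parameters.
x = [1.0, 0.5]
beta = [0.2, -0.1]
gammas = [[-0.5, 0.1], [0.0, 0.2], [0.1, 0.0]]
print(category_probs(x, beta, gammas))
```

Two respondents with the same latent index but different covariates get different cutpoints, and hence different response probabilities; vignette responses are what allow the threshold parameters to be separated from the index parameters.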
Reproducible research: Weaving with Stata
Reproducible research is one of many names for the same concept: writing a
single report document that contains both the report and the commands
needed to produce the results and graphics contained in the report. It is
called reproducible research because any interested researcher can then
reproduce the entire report from the one document. (Programmers call this
same concept “literate programming”.) The utility of
reproducible research documents extends far beyond research or programming.
They allow rapid updates should there be additional data. They can
also be used in teaching for generating differing examples or test
questions, because different parameters will generate different examples. In
this presentation, I will show you how to use a third-party application to embed
Stata code, as well as its output, in either LaTeX or OpenOffice documents.
I will also use example documents (including the talk itself) to show how
you can update a report, its results, and its graphics by using new data or
Parametric and semiparametric estimation of ordered
response models with sample selection and individual-specific thresholds
Giuseppe De Luca
Università di Roma “Tor Vergata”
This paper provides a set of new Stata commands for parametric and
semiparametric estimation of an extended version of ordered response
models that accounts for both sample selection problems and heterogeneity
in the thresholds for the latent variable. The standard estimator of ordered
response models is therefore generalized along three directions. First, we
account for the presence of endogenous selectivity effects that may lead to
inconsistent estimates of the model parameters. Second, we control for
both observed and unobserved heterogeneity in response scales by allowing
the thresholds to depend on a set of covariates and a random individual
effect. Finally, we consider two alternative specifications of the model, one
parametric and one semiparametric. In the former, the error terms are
assumed to follow a multivariate Gaussian distribution and the model
parameters are estimated via maximum likelihood. In the latter, the
distribution function of the error terms is instead approximated by following
Gallant and Nychka (1987), and the model parameters are estimated via
pseudo–maximum likelihood. After discussing identification and estimation
issues, we present an empirical application using the second wave of the
Survey of Health, Ageing and Retirement in Europe (SHARE). Specifically,
we estimate an ordered response model for self-reported health on different
domains by accounting for both sample selection bias due to survey
nonresponse and reporting bias in the self-assessments of health.
A simulation-based sensitivity analysis for matching
Universidad Carlos III de Madrid
In this paper, I present a Stata program (sensatt) that implements the
sensitivity analysis for matching estimators proposed by Ichino, Mealli, and
Nannicini (2008). The analysis simulates a potential confounder
to assess the robustness of the estimated treatment effects with respect to
deviations from the Conditional Independence Assumption (CIA). The program
uses the commands for propensity-score matching (att*)
developed by Becker and Ichino (2002). I provide an example using the
National Supported Work (NSW) demonstration, widely known in the program
Becker, S. O. and A. Ichino. (2002). Estimation of average treatment effects
based on propensity scores. Stata Journal 2: 358–377.
Ichino, A., F. Mealli, and T. Nannicini. (2008). From temporary help jobs to
permanent employment: What can we learn from matching estimators and their
sensitivity? Journal of Applied Econometrics
Estimating multiway error-components models with correlated
effects in Stata
I derive new estimators and tests of correlated effects for the (possibly)
unbalanced multiway error component model (ECM), extending existing results
in several respects. The results on specification tests by Kang (1985), who
extended Hausman and Taylor (1981) and Mundlak (1978) to the two-way
balanced model, emerge as particular cases of the present analysis. Davis
(2002), who extends the analysis of Wansbeek and Kapteyn (1989) to the
multiway unbalanced model, does not consider the cases of either correlated
effects or specification tests. I also uncover new algebraic properties of
the multiway ECM covariance matrix that prove useful for both computational
and analytical purposes. Finally, I provide some examples using Stata.
Davis, P. (2002). Estimating multi-way error components models with
unbalanced data structures. Journal of Econometrics
Hausman, J. A. and W. E. Taylor. (1981). Panel data and unobservable
individual effects. Econometrica
Kang, S. (1985). A note on the equivalence of specification tests in the
two-factor multivariate variance components model. Journal of
Mundlak, Y. (1978). On the pooling of time series and cross section data.
Wansbeek, T. and A. Kapteyn. (1989). Estimation of the error-components model
with incomplete panels. Journal of Econometrics
A seasonal root test with Stata
Università di Roma “Tor Vergata”
Many economic time series exhibit important systematic fluctuations within
the year, i.e., seasonality. Contrary to usual practice, we argue that
using the original data should always be considered, although the process
generating unadjusted data is more complicated than that of seasonally
adjusted data. The motivation for using unadjusted data comes from the
information contained in their peaks and troughs and from economic theory.
One major complication is the presence of unit roots at seasonal
frequencies. In this paper, we tackle this complication by implementing a
test to identify the source of seasonality. In particular, we follow
Hylleberg et al. (1993) for quarterly data. A practical example based on
the permanent income hypothesis illustrates the usefulness of
the command with macroeconomic time series.
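
For readers unfamiliar with seasonal unit roots, the Python sketch below builds the filtered series commonly used in tests of this kind for quarterly data. It assumes the test is of the HEGY type (Hylleberg, Engle, Granger, and Yoo); the abstract does not spell out the construction, so this is an illustration of that standard approach rather than of the command itself. The function name and the toy series are hypothetical.

```python
def hegy_filters(y):
    """Return (y1, y2, y3, y4) for each t >= 4 of a quarterly series y.

    y1 = (1 + L + L^2 + L^3) y   keeps the zero-frequency (long-run) root
    y2 = -(1 - L + L^2 - L^3) y  keeps the semiannual root
    y3 = -(1 - L^2) y            keeps the annual pair of roots
    y4 = (1 - L^4) y             seasonal difference, the dependent variable
    """
    rows = []
    for t in range(4, len(y)):
        y1 = y[t] + y[t - 1] + y[t - 2] + y[t - 3]
        y2 = -(y[t] - y[t - 1] + y[t - 2] - y[t - 3])
        y3 = -(y[t] - y[t - 2])
        y4 = y[t] - y[t - 4]
        rows.append((y1, y2, y3, y4))
    return rows


# Toy series with a purely deterministic quarterly pattern.
series = [1, 2, 3, 4] * 3
for row in hegy_filters(series):
    print(row)
```

In the auxiliary regression, y4 at time t is regressed on y1 and y2 lagged once and on y3 lagged once and twice; t and F statistics on those coefficients are then compared with nonstandard critical values to decide which of the roots are present.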
The use of Stata in biostat teaching
Università di Milano–Bicocca and Karolinska Institutet
Stata is a software package that is currently widely used, and its utility
is being recognized. This is leading to its increasing use worldwide in
major departments of epidemiology and biostatistics, for both research and
teaching purposes. The ability to use it at various levels of sophistication
makes it an ideal package for introductory courses, where one is most likely
to encounter novice users, as well as for researchers, who tend to be more
experienced and more demanding in their requests for esoteric calculations.
The purpose of this talk is to describe how teaching at basic and
intermediate levels of biostatistics, especially in epidemiological courses,
has been facilitated over the years through the use of Stata, covering both
how the package has grown and how this growth has affected what can
reasonably be taught in these courses. That is not to say that there is no
room for improvement;
I will also discuss some potential areas for progress and expansion.