Enzo Coviello

Azienda USL BA/1

In many medical reports, the survival function is plotted together with its
confidence intervals. The endpoints of the intervals are usually connected to
draw an area that is then read as containing the entire survival curve with a
given confidence.

Confidence intervals are pointwise, i.e., each refers to the survival probability at a single time, and they are not simultaneously valid for all the estimates along the survival curve. For that purpose, the appropriate tool is a confidence band, which is not yet available in Stata.
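
To make the pointwise nature of these intervals concrete, here is a minimal Python sketch (not part of **stcband**) of the Kaplan–Meier estimator with Greenwood's variance and the usual linear pointwise interval. The function names `kaplan_meier` and `pointwise_ci` are illustrative only. Each interval is built from the variance at its own time point, so its nominal coverage holds at that time alone.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate with Greenwood's variance.

    times  : follow-up times
    events : 1 if the event was observed, 0 if the time is censored
    Returns (t, S(t), sigma2) at each distinct event time, where sigma2 is the
    Greenwood sum  sum d_i / (Y_i * (Y_i - d_i)), so Var[S(t)] ~ S(t)**2 * sigma2.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv, greenwood = 1.0, 0.0
    rows = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = censored = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            if data[i][1]:
                deaths += 1
            else:
                censored += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            if deaths < at_risk:
                greenwood += deaths / (at_risk * (at_risk - deaths))
            else:
                greenwood = math.inf  # S(t) reaches 0; variance is undefined
            rows.append((t, surv, greenwood))
        # Units censored at t are still at risk for the deaths at t.
        at_risk -= deaths + censored
    return rows


def pointwise_ci(rows, z=1.959964):
    """Linear 95% pointwise CI, S +/- z * S * sqrt(sigma2), truncated to [0, 1]."""
    out = []
    for t, s, sigma2 in rows:
        half = z * s * math.sqrt(sigma2) if s > 0 else 0.0
        out.append((t, max(0.0, s - half), min(1.0, s + half)))
    return out
```

With 95% intervals at, say, 20 event times, the chance that all of them cover the true curve at once is generally below 95%. That gap is what confidence bands are designed to close.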

Two methods are usually employed to construct these confidence bands. The first was proposed by Hall and Wellner (1980), and the second was proposed by Nair (1984). The latter produces the so-called equal precision (EP) confidence bands. For both methods, log-minus-log and arcsine square-root transformed versions have been proposed.
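
As a sketch of how the two bands differ, the Python functions below compute the untransformed (linear) Hall–Wellner and EP bands in the form commonly given in survival-analysis textbooks. They take rows of `(t, S(t), sigma2)`, where `sigma2` is the Greenwood sum Σ d_i/[Y_i(Y_i − d_i)]. The critical values `e_alpha` and `k_alpha` depend on the band limits through `a_L` and `a_U` and must be read from published tables. The code does not compute them, and the values used in the test are placeholders rather than real table entries. Function names are hypothetical.

```python
import math


def band_limits(n, sigma2_lower, sigma2_upper):
    """a_L and a_U, the arguments used to look up tabled critical values.

    sigma2_lower / sigma2_upper are the Greenwood sums at the band's time
    limits t_L and t_U; n is the sample size.
    """
    a_l = n * sigma2_lower / (1.0 + n * sigma2_lower)
    a_u = n * sigma2_upper / (1.0 + n * sigma2_upper)
    return a_l, a_u


def ep_band(rows, e_alpha):
    """Linear equal-precision (Nair) band: S +/- e_alpha * sigma * S."""
    out = []
    for t, s, sigma2 in rows:
        half = e_alpha * math.sqrt(sigma2) * s
        out.append((t, max(0.0, s - half), min(1.0, s + half)))
    return out


def hall_wellner_band(rows, n, k_alpha):
    """Linear Hall-Wellner band: S +/- k_alpha * (1 + n * sigma2) * S / sqrt(n)."""
    out = []
    for t, s, sigma2 in rows:
        half = k_alpha * (1.0 + n * sigma2) * s / math.sqrt(n)
        out.append((t, max(0.0, s - half), min(1.0, s + half)))
    return out
```

The EP half-width is proportional to the linear pointwise half-width, with the normal quantile replaced by `e_alpha`. The Hall–Wellner half-width is driven by 1 + nσ²(t) instead of σ(t). Rows where S(t) = 0 should be excluded, because the Greenwood sum is then undefined.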

**stcband** is a new Stata command that allows the user to graph the
survival function, together with the confidence bands constructed according
to the Hall–Wellner and EP methodologies. The available options allow
the user to a) specify the lower and upper limits of the time interval over
which the bands are estimated; b) choose the linear, log, or arcsine
transformation; c) set the confidence level at 90%, 95%, or 99%; d) save the
estimates; and e) control the appearance of the graph. A further option allows
the user to estimate confidence bands for the cumulative hazard function.
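
The log (log-minus-log) and arcsine options correspond to computing the interval or band on a transformed scale and mapping it back. Below is a minimal Python sketch of the two transformations, assuming the standard delta-method forms. Here `c_sigma` is the critical value times σ(t), with σ(t) the square root of the Greenwood sum. The same helpers therefore serve pointwise intervals (z·σ), EP bands (e_alpha·σ), and Hall–Wellner bands (k_alpha·(1 + nσ²)/√n). These are illustrative helpers, not **stcband**'s internals.

```python
import math


def loglog_interval(s, c_sigma):
    """Log-minus-log limits [S**(1/theta), S**theta], theta = exp(c_sigma / ln S).

    c_sigma: z*sigma (pointwise), e_alpha*sigma (EP), or
             k_alpha*(1 + n*sigma**2)/sqrt(n) (Hall-Wellner). Requires 0 < s < 1.
    """
    theta = math.exp(c_sigma / math.log(s))  # ln S < 0, so theta <= 1
    return s ** (1.0 / theta), s ** theta


def arcsine_interval(s, c_sigma):
    """Arcsine-square-root limits, clipped to [0, pi/2] before back-transforming."""
    centre = math.asin(math.sqrt(s))
    half = 0.5 * c_sigma * math.sqrt(s / (1.0 - s))  # delta-method SD times c
    lower = math.sin(max(0.0, centre - half)) ** 2
    upper = math.sin(min(math.pi / 2, centre + half)) ** 2
    return lower, upper
```

Both transformations keep the limits inside [0, 1], which the linear form achieves only by truncation. The estimate must satisfy 0 < S(t) < 1.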

Using an example, I illustrate

- the results obtained by using **stcband** and the corresponding R function;
- the use of the command and the differences between its confidence bands and the usual pointwise confidence intervals;
- … **bootstrap** command.

The new command also has an accompanying help file from which the user can run an example, taken from the second edition of Klein and Moeschberger’s *Survival Analysis: Techniques for Censored and Truncated Data* (pp. 109–117), by clicking in the Viewer window. **stcband** is available for download from the SSC archive.

I am grateful to Maarten Buis for his advice in constructing the simulations comparing the coverage probabilities of the confidence bands.

**Additional information**

coviello_2008.pdf

Giovanni Capelli

Università di Cassino

Bruno Federico

Università di Cassino

Giuseppe Costa

Università di Torino

Tobacco smoking and obesity contribute greatly to premature death and
disease in developed countries. To measure the extent to which these
risk factors affect a population, and to describe the progression of
these epidemics over time, international organizations, national
departments of health, and statistical offices carry out routine surveillance
of the prevalence of obesity and smoking. To this end, they use summary
measures such as age-standardized rates, together with tabular and graphical
representations such as maps.

In this study, we argue that population pyramids, a widely used demographic tool, may be easily adapted to provide relevant visual information for public health purposes. By means of two juxtaposed histograms, one for each gender, population pyramids show either the proportion or the actual number of subjects in each age and gender subgroup. We suggest that stratifying each bar of the two histograms according to ordinal categories of the health condition or risk factor examined may provide useful details on the relationship between this condition or factor and key demographic variables like age and gender. In addition, the actual number of exposed subjects can be immediately read from the graph.
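
The layout described above can be sketched as bar-segment geometry. In this hypothetical Python helper, males extend to the left of zero and females to the right. The ordinal categories are stacked outward from the centre line in the order given. That ordering, like the function name, is an assumption for illustration rather than the authors' Stata routine.

```python
def pyramid_segments(counts, categories):
    """Stacked-bar geometry for a modified population pyramid.

    counts[(age_group, sex, category)] -> number of people, sex in {"M", "F"}.
    Males are drawn left of zero, females right of zero; within each bar the
    ordinal categories are stacked outward from the centre in the given order.
    Returns {(age_group, sex, category): (start, end)}.
    """
    segments = {}
    age_groups = sorted({key[0] for key in counts})
    for age in age_groups:
        for sex, direction in (("M", -1), ("F", 1)):
            edge = 0.0
            for cat in categories:
                width = counts.get((age, sex, cat), 0)  # missing cells -> 0
                segments[(age, sex, cat)] = (edge, edge + direction * width)
                edge += direction * width
    return segments
```

The `(start, end)` pairs map directly onto a horizontal bar chart, for example matplotlib's `barh` with `left=start` and `width=end - start`.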

We therefore wrote a Stata routine to create modified population-pyramid plots, separately for overweight/obesity and for current/former smoking. Data were derived from five National Health Interview Surveys carried out in Italy between 1983 and 2005. For each survey, data on age, gender, smoking status, height, and weight were extracted for subjects aged 20–99. Age- and gender-specific prevalence rates of overweight/obesity and of current/former/never smoking were computed and applied to population estimates produced by the Italian national statistical institute (ISTAT). The resulting estimated numbers of underweight/normal-weight/overweight/obese individuals and of current/former/never smokers were used to create the modified population pyramids.
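
Applying age- and gender-specific prevalence rates to population estimates is simple arithmetic: the estimated count equals prevalence × population in the same stratum. The Python sketch below assumes unweighted respondent counts for brevity. An analysis of real survey data would normally use the survey weights. All names are illustrative.

```python
def estimated_counts(survey, population):
    """Apply stratum-specific prevalence rates to population estimates.

    survey[(age_group, sex)]     -> {category: number of respondents}
    population[(age_group, sex)] -> estimated resident population
    Returns {(age_group, sex, category): estimated number of people}.
    """
    result = {}
    for stratum, by_cat in survey.items():
        total = sum(by_cat.values())
        for cat, n in by_cat.items():
            prevalence = n / total  # age- and gender-specific prevalence
            result[stratum + (cat,)] = prevalence * population[stratum]
    return result
```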

In conclusion, modified population pyramids may contribute to assessing the impact of risk factors on a population in absolute terms, to evaluating how these risk factors are distributed by age and gender, and to assessing how the age and gender distribution of these risk factors changes over time.

**Additional information**

capelli_2008.pdf

Claudio Rossetti

Università di Roma “Tor Vergata”

This paper presents a new Stata command for the estimation of ordered probit
models with individual-specific thresholds, where anchoring vignettes are used
to correct for differences in response scales. The analysis of ordered
response data is very common in many research areas. Surveys in the social
sciences very often have questions on individuals’ subjective evaluations of
their own situation or what they think about a certain aspect of society.
Nevertheless, when respondents use the ordinal response categories of
standard survey questions in different ways, analyses based on the resulting
data can be biased. Anchoring vignettes are a survey design technique that
can be used to place self-reported responses on a common,
interpersonally comparable scale. The model I present here is a parametric
ordered probit model for the self-assessments, where the individual-specific
thresholds depend on the same set of covariates as in the ordered probit
model for the responses to the vignettes (King et al. 2004). Furthermore,
I allow for the possibility of controlling for unobserved heterogeneity in
response scales by including a random individual effect in the thresholds.
The model is estimated by maximum likelihood. The new Stata command
presented here takes advantage of the new technology available in Stata 10.
Specifically, the maximization routine is written in Mata, the matrix
programming language of Stata, and the new Mata function **optimize()** is
employed to maximize the likelihood function; this results in very fast
convergence. After a brief description of ordered probit models with
individual-specific thresholds and anchoring vignettes, I describe the new
Stata command for fitting such models and present an empirical application.
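
The Python sketch below illustrates the structure of the self-assessment part of such a model. It computes category probabilities and the log-likelihood of an ordered probit whose cutpoints depend on covariates v. It assumes the parametrization τ_1 = v'γ_1, τ_j = τ_{j−1} + exp(v'γ_j), a form commonly used with anchoring vignettes that guarantees ordered thresholds. The sketch omits the vignette equations that identify the γ's, the random threshold effect, and the maximization, which the command performs in Mata with **optimize()**. The function names are hypothetical.

```python
import math


def norm_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def _dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def thresholds(v, gammas):
    """Individual-specific cutpoints: tau_1 = v.g1, tau_j = tau_{j-1} + exp(v.gj)."""
    taus = [_dot(v, gammas[0])]
    for g in gammas[1:]:
        taus.append(taus[-1] + math.exp(_dot(v, g)))  # exp keeps them ordered
    return taus


def category_probs(xb, taus):
    """P(y = j), j = 1..J, for latent y* = xb + e with e ~ N(0, 1)."""
    cuts = [-math.inf] + taus + [math.inf]
    return [norm_cdf(cuts[j + 1] - xb) - norm_cdf(cuts[j] - xb)
            for j in range(len(cuts) - 1)]


def log_likelihood(data, beta, gammas):
    """data: list of (y, x, v) with y in 1..J and covariate lists x, v."""
    ll = 0.0
    for y, x, v in data:
        p = category_probs(_dot(x, beta), thresholds(v, gammas))
        ll += math.log(p[y - 1])
    return ll
```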

William Rising

StataCorp

Reproducible research is one of many names for the same concept: writing a
single document that contains both the text of a report and the commands
needed to produce the results and graphics in it. It is
called reproducible research because any interested researcher can then
reproduce the entire report from that one document. (Programmers call the
same concept “literate programming”.) The usefulness of
reproducible research documents extends far beyond research or programming.
They allow rapid updates when additional data become available. They can
also be used in teaching to generate varied examples or test
questions, because different parameters produce different examples. In
this presentation, I will show how to use a third-party application to embed
Stata code, together with its output, in either LaTeX or OpenOffice documents.
I will also use example documents (including the talk itself) to show how
you can update a report, its results, and its graphics by using new data or
changing parameters.
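
The core mechanism of such tools, often called weaving, can be sketched in a few lines of Python. Code chunks embedded in the text are executed and replaced by their output, with parameters supplied from outside. The `<<code>>` … `@` chunk delimiters are a hypothetical, noweb-inspired convention for this sketch, not the syntax of the application presented in the talk.

```python
import contextlib
import io


def weave(document, params):
    """Replace each <<code>> ... @ chunk with the text its code prints.

    The chunk syntax is a hypothetical convention for this sketch. All chunks
    share one namespace, seeded with params, so later chunks see earlier results.
    """
    out_lines, chunk, in_chunk = [], [], False
    namespace = dict(params)
    for line in document.splitlines():
        if line.strip() == "<<code>>":
            in_chunk, chunk = True, []
        elif line.strip() == "@" and in_chunk:
            buffer = io.StringIO()
            with contextlib.redirect_stdout(buffer):  # capture printed output
                exec("\n".join(chunk), namespace)
            out_lines.append(buffer.getvalue().rstrip("\n"))
            in_chunk = False
        elif in_chunk:
            chunk.append(line)
        else:
            out_lines.append(line)  # ordinary report text passes through
    return "\n".join(out_lines)
```

Re-running `weave` with different `params` regenerates the document with different results. This is the property exploited for updated reports and varied teaching examples. Because `exec` runs arbitrary code, this approach is only appropriate for trusted documents.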

**Additional information**

rising_2008.pdf

ooexample.zip

Giuseppe De Luca

ISFOL

Valeria Perotti

ISFOL

Claudio Rossetti

Università di Roma “Tor Vergata”

This paper provides a set of new Stata commands for parametric and
semiparametric estimation of an extended version of ordered response
models that accounts for both sample selection problems and heterogeneity
in the thresholds for the latent variable. The standard estimator of ordered
response models is therefore generalized in three directions. First, we
account for the presence of endogenous selectivity effects that may lead to
inconsistent estimates of the model parameters. Second, we control for
both observed and unobserved heterogeneity in response scales by allowing
the thresholds to depend on a set of covariates and a random individual
effect. Finally, we consider two alternative specifications of the model, one
parametric and one semiparametric. In the former, the error terms are
assumed to follow a multivariate Gaussian distribution and the model
parameters are estimated via maximum likelihood. In the latter, the
distribution function of the error terms is instead approximated by following
Gallant and Nychka (1987), and the model parameters are estimated via
pseudo–maximum likelihood. After discussing identification and estimation
issues, we present an empirical application using the second wave of the
Survey on Health, Ageing and Retirement in Europe (SHARE). Specifically,
we estimate an ordered response model for self-reported health on different
domains by accounting for both sample selection bias due to survey
nonresponse and reporting bias in the self-assessments of health.
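
The semiparametric specification approximates an unknown error density by a squared polynomial times a normal density, in the spirit of Gallant and Nychka. The Python sketch below is univariate and rests on simplifying assumptions. The normalizing constant uses the moments of the standard normal. The location and scale restrictions that an actual estimator imposes for identification are omitted, as is the multivariate case needed with sample selection. Names are illustrative.

```python
import math


def normal_moment(r):
    """E[Z**r] for standard normal Z: 0 if r is odd, (r - 1)!! if r is even."""
    if r % 2:
        return 0.0
    result = 1.0
    for k in range(r - 1, 0, -2):
        result *= k
    return result


def snp_density(u, coeffs):
    """Univariate SNP density: P(u)**2 * phi(u) / E[P(Z)**2].

    coeffs = [a0, a1, ..., aK] are the polynomial coefficients of P.
    """
    poly = sum(a * u ** k for k, a in enumerate(coeffs))
    phi = math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)
    # E[P(Z)^2] = sum_i sum_j a_i a_j E[Z^(i+j)] makes the density integrate to 1.
    norm = sum(
        ai * aj * normal_moment(i + j)
        for i, ai in enumerate(coeffs)
        for j, aj in enumerate(coeffs)
    )
    return poly * poly * phi / norm
```

Setting `coeffs = [1.0]` recovers the standard normal density, so the Gaussian model is nested in the semiparametric one.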

**Additional information**

de_luca_2008.pdf

Tommaso Nannicini

Universidad Carlos III de Madrid

In this paper, I present a Stata program (**sensatt**) that implements the
sensitivity analysis for matching estimators proposed by Ichino, Mealli, and
Nannicini (2008). The analysis simulates a potential confounder
to assess the robustness of the estimated treatment effects with respect to
deviations from the Conditional Independence Assumption (CIA). The program
uses the commands for propensity-score matching (**att***)
developed by Becker and Ichino (2002). I provide an example using the
National Supported Work (NSW) demonstration, which is widely known in the
program evaluation literature.
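
The simulation idea can be sketched as follows. For binary treatment T and outcome Y, a binary confounder U is drawn with Pr(U = 1 | T = i, Y = j) = p_ij. The ATT is then re-estimated with U added to the matching covariates, and the estimates are averaged over replications. To stay self-contained, this Python sketch swaps in exact stratification on a discrete covariate for the propensity-score matching that **sensatt** actually uses. All function names are hypothetical.

```python
import random


def simulate_confounder(treat, outcome, p, rng):
    """Draw binary U with Pr(U = 1 | T = i, Y = j) = p[(i, j)] (binary T and Y)."""
    return [1 if rng.random() < p[(t, y)] else 0 for t, y in zip(treat, outcome)]


def att_stratified(treat, outcome, strata):
    """ATT by exact stratification, a stand-in for propensity-score matching.

    Treated units in strata without controls are dropped (common support).
    """
    cells = {}
    for t, y, s in zip(treat, outcome, strata):
        cells.setdefault(s, {0: [], 1: []})[t].append(y)
    effect_sum, n_treated = 0.0, 0
    for groups in cells.values():
        if groups[1] and groups[0]:
            diff = sum(groups[1]) / len(groups[1]) - sum(groups[0]) / len(groups[0])
            effect_sum += diff * len(groups[1])  # weight by treated in stratum
            n_treated += len(groups[1])
    if n_treated == 0:
        raise ValueError("no treated unit has a control in its stratum")
    return effect_sum / n_treated


def sensitivity_att(treat, outcome, x, p, reps=200, seed=1):
    """Average ATT over simulated confounders, stratifying on (x, U)."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(reps):
        u = simulate_confounder(treat, outcome, p, rng)
        estimates.append(att_stratified(treat, outcome, list(zip(x, u))))
    return sum(estimates) / len(estimates)
```

Comparing `sensitivity_att` with the baseline `att_stratified(treat, outcome, x)` for a grid of `p` values indicates how strong a confounder would have to be to overturn the estimate.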

**References**

Becker, S. O. and A. Ichino. (2002). Estimation of average treatment effects based on propensity scores. *The Stata Journal* 2: 358–377.

Ichino, A., F. Mealli, and T. Nannicini. (2008). From temporary help jobs to permanent employment: What can we learn from matching estimators and their sensitivity? *Journal of Applied Econometrics* 23: 305–327.

**Additional information**

nannicini_2008.pdf

Giovanni Bruno

Università Bocconi

I derive new estimators and tests of correlated effects for the (possibly)
unbalanced multiway error component model (ECM), extending existing results
in various respects. The specification tests of Kang (1985), who extended
Hausman and Taylor (1981) and Mundlak (1978) to the two-way balanced model,
emerge as special cases of the present analysis. Davis (2002), who extends
the analysis of Wansbeek and Kapteyn (1989) to the multiway unbalanced
model, considers neither correlated effects nor specification tests. I also
uncover new algebraic properties of the multiway ECM covariance matrix that
prove useful for both computational and analytical purposes. Finally, I
provide some examples using Stata.
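
For orientation, the covariance matrix of a multiway ECM with unbalanced data can be written Ω = σ_ε² I + Σ_k σ_k² D_k D_k', where D_k is the matrix of dummies for factor k. Entry (i, j) therefore collects σ_k² for every factor on which observations i and j share a level. The Python sketch below builds Ω directly from that rule. It is a generic illustration, not the new algebraic results of the paper, and the function name is hypothetical.

```python
def ecm_covariance(factors, sigma2_factors, sigma2_eps):
    """Covariance matrix of an (unbalanced) multiway error component model.

    factors[k][i]     : level of factor k (e.g. firm, year) for observation i
    sigma2_factors[k] : variance of factor k's random effect
    Omega[i][j] = sigma2_eps * [i == j] + sum_k sigma2_k * [same level of k],
    i.e. Omega = sigma2_eps * I + sum_k sigma2_k * D_k D_k'.
    """
    n = len(factors[0])
    omega = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            value = sigma2_eps if i == j else 0.0
            for levels, s2 in zip(factors, sigma2_factors):
                if levels[i] == levels[j]:  # (D_k D_k')_{ij} = 1
                    value += s2
            omega[i][j] = value
    return omega
```

Unbalancedness needs no special handling here. A missing firm-year cell simply means that no observation carries that combination of levels.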

**References**

Davis, P. (2002). Estimating multi-way error components models with unbalanced data structures. *Journal of Econometrics* 106: 67–95.

Hausman, J. A. and W. E. Taylor. (1981). Panel data and unobservable individual effects. *Econometrica* 49: 1377–1398.

Kang, S. (1985). A note on the equivalence of specification tests in the two-factor multivariate variance components model. *Journal of Econometrics* 28: 193–203.

Mundlak, Y. (1978). On the pooling of time series and cross section data. *Econometrica* 46: 69–85.

Wansbeek, T. and A. Kapteyn. (1989). Estimation of the error-components model with incomplete panels. *Journal of Econometrics* 41: 341–361.

**Additional information**

bruno_2008.pdf

Domenico Depalo

Università di Roma “Tor Vergata”

Many economic time series exhibit important systematic fluctuations within
the year, i.e., seasonality. Contrary to common practice, we argue that the
use of unadjusted data should always be considered, even though the
unadjusted data process is more complicated than that of seasonally adjusted
data. The motivation for using unadjusted data comes from the information
contained in their peaks and troughs and from economic theory. One major
complication is the presence of unit roots at seasonal frequencies. In this
paper, we tackle this complication by implementing a test to identify the
source of seasonality. In particular, we follow Hylleberg et al. (1993) for
quarterly data. A practical example based on the permanent income hypothesis
illustrates the usefulness of the command with macroeconomic time series.
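
For quarterly data, seasonal unit-root tests of this kind are typically based on the HEGY auxiliary regression. It factors the fourth difference as (1 − L⁴) = (1 − L)(1 + L)(1 + L²), with the factors associated with the zero, semiannual, and annual frequencies. The Python sketch below only builds the transformed variables in their standard form. Deterministic terms, lag augmentation, the OLS fit, and the nonstandard critical values are omitted. It is an assumption, not something confirmed by the abstract, that the command uses this exact construction. The function name is hypothetical.

```python
def hegy_regressors(y):
    """Quarterly HEGY transformations of the series y (a list of numbers).

    Returns rows (y4_t, y1_{t-1}, y2_{t-1}, y3_{t-2}, y3_{t-1}) for the regression
    y4_t = pi1*y1_{t-1} + pi2*y2_{t-1} + pi3*y3_{t-2} + pi4*y3_{t-1} + e_t.
    """
    def y1(s):  # (1 + L + L^2 + L^3) y_s: keeps the zero-frequency root
        return y[s] + y[s - 1] + y[s - 2] + y[s - 3]

    def y2(s):  # -(1 - L + L^2 - L^3) y_s: keeps the semiannual root
        return -(y[s] - y[s - 1] + y[s - 2] - y[s - 3])

    def y3(s):  # -(1 - L^2) y_s: keeps the annual-frequency roots
        return -(y[s] - y[s - 2])

    def y4(s):  # (1 - L^4) y_s: the seasonal difference (dependent variable)
        return y[s] - y[s - 4]

    # The first usable t is the fifth observation (index 4): y1_{t-1} needs y_{t-4}.
    return [(y4(t), y1(t - 1), y2(t - 1), y3(t - 2), y3(t - 1))
            for t in range(4, len(y))]
```

Loosely, a significant π1 argues against a unit root at the zero frequency, a significant π2 against one at the semiannual frequency, and π3 and π4 jointly against the annual pair.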

**Additional information**

depalo_2008.pdf

Rino Bellocco

Università di Milano–Bicocca and Karolinska Institutet

Stata is a widely used software package whose usefulness is increasingly
recognized. This is leading to its growing use worldwide in
major departments of epidemiology and biostatistics, for both research and
teaching purposes. The ability to use it at various levels of sophistication
makes it an ideal package for introductory courses, where one is most likely
to encounter naïve users, as well as for researchers, who tend to be more
experienced and to make more demanding requests for esoteric calculations.

The purpose of this talk is to describe how Stata has facilitated the teaching of basic and intermediate biostatistics over the years, especially in epidemiology courses: both how the package has grown and how this growth has affected what can reasonably be taught in these courses. That is not to say that there is no room for improvement; I will also discuss some potential areas for progress and expansion.

**Additional information**

bellocco_2008.pdf
