Alexander Zlotnik

Hospital Universitario Ramón y Cajal, Unidad de Bioestadística, IRYCIS, Universidad Politécnica de Madrid, Dpto. Ingeniería Electrónica

Víctor Abraira

Hospital Universitario Ramón y Cajal, Unidad de Bioestadística, IRYCIS, CIBERESP

We have developed the **nomolog** program for the generation of
logistic regression nomograms in Stata. It has recently been
accepted for publication in the *Stata Journal* and will soon be
published on the SSC repository. Some of the challenges we
encountered during its development were i) inclusion of main
effects and interaction terms, ii) continuous # continuous
interactions, and iii) development of an automated testing environment.
We present the solutions to these, the most relevant implementation
details, and the development practices, which may benefit those
interested in building their own programs based on Stata.
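For readers new to nomograms, the point-scaling idea behind a logistic regression nomogram can be sketched in Python. This is not the **nomolog** code. The coefficients, the predictor ranges, and the names `COEFS`, `RANGES`, `points()` and `probability_from_points()` are hypothetical, and interactions are not handled. The convention of giving 100 points to the predictor with the widest linear-predictor span is one common choice, assumed here.

```python
import math

# Hypothetical logistic model (illustrative values only).
INTERCEPT = -3.0
COEFS = {"age": 0.04, "sbp": 0.02, "smoker": 0.7}
# Assumed observed range (min, max) of each predictor.
RANGES = {"age": (20, 80), "sbp": (100, 200), "smoker": (0, 1)}


def baseline(name):
    """End of the range that contributes least to the linear predictor."""
    lo, hi = RANGES[name]
    return lo if COEFS[name] >= 0 else hi


def contribution(name, value):
    """Non-negative linear-predictor contribution relative to the baseline."""
    return COEFS[name] * (value - baseline(name))


# The predictor with the widest contribution span is scaled to 100 points.
MAX_SPAN = max(abs(COEFS[n]) * (RANGES[n][1] - RANGES[n][0]) for n in COEFS)


def points(name, value):
    return 100.0 * contribution(name, value) / MAX_SPAN


def probability_from_points(total_points):
    """Convert total points back to the linear predictor, then to a probability."""
    lp0 = INTERCEPT + sum(COEFS[n] * baseline(n) for n in COEFS)
    lp = lp0 + total_points * MAX_SPAN / 100.0
    return 1.0 / (1.0 + math.exp(-lp))


patient = {"age": 60, "sbp": 150, "smoker": 1}
total = sum(points(n, v) for n, v in patient.items())
print(round(total, 1), round(probability_from_points(total), 3))
```

The points are a linear rescaling of the linear predictor. The probability read from the total-points axis therefore matches the one computed directly from the model. Interaction terms, one of the challenges listed above, would need additional axes.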

During the development of this program, we were independently
contacted by several researchers interested in the generation
of Cox regression nomograms with Stata. We discuss the differences
and similarities between logistic and Cox regression nomograms as
well as the limitations and expected capabilities of a modification of
**nomolog** that will introduce this feature.

**Additional information**

es14_zlotnik_abraira.pdf

Further explanations, examples and download links for nomogram generators for logistic and Cox regressions are available at: www.zlotnik.net/stata/nomograms

Enrique Pinzón

StataCorp

The **margins** command in Stata allows us to obtain a wide array of
results from coefficient estimates. I will illustrate the use
of **margins** in some commonly used models. I will then present
a new result: I will show how we can use **margins** to obtain,
after fixed-effects panel-data estimation, average marginal effects
and average treatment effects that incorporate the effect of the unobserved
time-invariant component.
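The quantities that **margins** averages can be illustrated for an ordinary logit model with a minimal Python sketch. The coefficients and the five observations are made up, and the helper names are hypothetical. The fixed-effects extension described in the talk is not reproduced. The sketch only shows that:

- an average marginal effect is the sample mean of observation-level derivatives;
- an average treatment effect is the mean of observation-level differences in predicted probabilities.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# Made-up logit coefficients: xb = b0 + b_x * x + b_d * d, with d a 0/1 treatment
b0, b_x, b_d = -1.0, 0.5, 0.8
data = [(0.0, 0), (1.0, 1), (2.0, 0), (1.5, 1), (-0.5, 0)]  # (x, d) per observation


def prob(x, d):
    return logistic(b0 + b_x * x + b_d * d)


# AME of continuous x: average over observations of dP/dx = b_x * p * (1 - p)
ame_x = sum(b_x * prob(x, d) * (1 - prob(x, d)) for x, d in data) / len(data)

# ATE of d: average of P(d = 1) - P(d = 0), each observation keeping its own x
ate_d = sum(prob(x, 1) - prob(x, 0) for x, _ in data) / len(data)

print(round(ame_x, 4), round(ate_d, 4))
```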

**Additional information**

es14_pinzon.pdf

Modesto Escobar

Universidad de Salamanca, Dpto. de Sociología y Comunicación

This talk introduces coincidence analysis, a new framework for
studying data structures that combines statistical analysis and
social network analysis. The purpose of this procedure is to
ascertain the most frequent events in a given set of scenarios
and to study the relationships between them. Under this procedure,
the concurrence of persons, objects, attributes, characteristics,
or events within the same temporally or spatially limited set can
be classified as follows:

a) simple, if both occur at least once in the same set;

b) likely, where the level of concurrence must be more than a single coincidence and more probable than a concurrence produced by mere chance; and

c) statistically probable, that is, in cases where samples of events are the subject of analysis, a confidence interval should be established to determine the statistical significance of the combination of events.
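The three levels above can be illustrated with a minimal Python sketch for pairs of events observed across scenarios. It is not the **coin** implementation. The following are all assumptions made for illustration:

- the toy scenarios and the function names;
- the use of the expected count under independence for the "likely" level;
- a one-sided hypergeometric test at 5% for the "statistically probable" level, used in place of the confidence interval mentioned above.

```python
from itertools import combinations
from math import comb

# Toy scenarios (e.g., photographs), each the set of people who appear in it.
scenarios = [
    {"A", "B"}, {"A", "B", "C"}, {"A", "B"}, {"C"}, {"A", "B", "D"},
    {"D"}, {"C", "D"}, {"A"}, {"B", "C"}, {"A", "B"},
]
N = len(scenarios)


def hypergeom_upper_tail(k, n_a, n_b, total):
    """P(X >= k): chance that n_b randomly chosen scenarios include at least
    k of the n_a scenarios containing event a (independence assumption)."""
    upper = min(n_a, n_b)
    hits = sum(comb(n_a, j) * comb(total - n_a, n_b - j) for j in range(k, upper + 1))
    return hits / comb(total, n_b)


def classify(a, b, alpha=0.05):
    n_a = sum(a in s for s in scenarios)
    n_b = sum(b in s for s in scenarios)
    n_ab = sum(a in s and b in s for s in scenarios)
    expected = n_a * n_b / N  # expected co-occurrences under independence
    if n_ab == 0:
        return "no coincidence"
    if n_ab == 1 or n_ab <= expected:
        return "simple"
    if hypergeom_upper_tail(n_ab, n_a, n_b, N) < alpha:
        return "statistically probable"
    return "likely"


events = sorted(set().union(*scenarios))
for a, b in combinations(events, 2):
    print(a, b, classify(a, b))
```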

This mode of analysis can be applied to the exploratory analysis of questionnaires, the study of textual networks, the review of the content of databases, and the comparison of different statistical analyses of interdependence, because the following techniques can be used with the same data: multidimensional scaling, principal component analysis, correspondence analysis, biplot representations, agglomeration techniques, and network analysis algorithms.

The statistical bases of this analysis are described, as is
the Stata program (**coin**) that allows the analysis
to be executed. As an example of its use, the photograph albums
of the following people, who were famous in the early twentieth
century, are described: Miguel de Unamuno (1864–1936), Rafael
Masó (1880–1935), Joaquín Turina (1882–1949), and
Antonia Mercé (1890–1936), known by the stage name La Argentina.

**Additional information**

es14_escobar.pdf

Elisa Sicuri, Sergio Alonso

CRESIB

Malaria is one of the leading causes of death in sub-Saharan
Africa. Artemisinin-based combination therapies (ACTs) are used as
first-line drugs for treatment, but their market is far from
competitive. Important supply issues include limited availability
and low quality, while on the demand side, market failures are
more related to the lack of information and limited access to
treatment.

To estimate the actual willingness to pay (WTP) for ACTs for children with malaria in rural Mozambique, researchers conducted a survey among patients at a district hospital. Data collected through the survey were merged with demographic surveillance data and with data from the hospital passive case detection systems in place in the area. A negative binomial (NB) regression was used to identify the determinants of the demand for ACTs.

Results showed that WTP is negatively associated with the
number of malaria episodes the child has previously suffered
during the same malaria season and with socio-economic
position. The age and occupation of the family head were
positively correlated with WTP. The study also discusses
the appropriateness of using contingent valuation methods for
estimating WTP. Respondents stated a higher willingness to pay than
expected, but they revealed a much more realistic demand price when
asked about their ability to pay. These results provide evidence that ACT
subsidies to the private sector are needed to improve access to
malaria treatment in rural Mozambique.

**Additional information**

es14_alonso.pdf

Cristian Tebé

Agencia de Calidad y Evaluación Sanitaria de Cataluña (AQuAS)

Variations in medical practice are defined as systematic variations
(not due to chance) in adjusted rates of clinical procedures for a
given level of aggregation of the population. The aim of this talk
is to explore and describe variations in different clinical conditions
and surgical procedures from a population perspective, to offer a richer
view for the assessment of health services in a complex public
health care environment. The basic strategy of analysis is to compare
rates of activity (numerator: hospital admissions)
of inhabitants of a territory (denominator: basic health area).
Results are presented in tables of standardized rates (using **dstdize**)
and ratios of activity using small-area analysis (calling R from
Stata). Most results are presented in maps (using **spmap**) for better
visualization. Variation analysis can be a good monitoring tool for
any health system. Published atlases have received attention from both
clinical and healthcare audiences.
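Direct standardization, which **dstdize** performs in Stata, weights an area's stratum-specific rates by a standard population. A minimal Python sketch of that arithmetic follows. The age strata, counts, and standard population are invented, not AQuAS data, and the function names are hypothetical.

```python
# Invented strata for one basic health area and an assumed standard population.
area_admissions = {"0-44": 30, "45-64": 60, "65+": 90}
area_population = {"0-44": 10000, "45-64": 5000, "65+": 3000}
standard_population = {"0-44": 600000, "45-64": 250000, "65+": 150000}


def crude_rate(admissions, population):
    return sum(admissions.values()) / sum(population.values())


def direct_standardized_rate(admissions, population, standard):
    """Stratum-specific rates of the area, weighted by the standard population."""
    weighted = sum(admissions[s] / population[s] * standard[s] for s in standard)
    return weighted / sum(standard.values())


per_1000 = 1000
print("crude rate per 1,000:",
      round(crude_rate(area_admissions, area_population) * per_1000, 2))
print("standardized rate per 1,000:",
      round(direct_standardized_rate(area_admissions, area_population,
                                     standard_population) * per_1000, 2))
```

In this invented example the crude rate is 10 per 1,000 but the standardized rate is 9.3 per 1,000. The area's population is older than the assumed standard, which inflates its crude rate.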

Xavier Bosch-Capblanc

Centro Suizo para la Salud Internacional, Salud Pública

Data requirements by governments, donors, and the international
community to measure health and development achievements have
increased in the last decade. Datasets produced in surveys
conducted in several countries and years are often combined to
analyze time trends and geographical patterns of demographic
and health-related indicators. However, because not all datasets
have the same structure, variable definitions, and codes, they
have to be harmonized prior to submitting them to statistical
analyses. Manually searching, renaming, and recoding variables is
extremely tedious and error-prone when the
number of datasets and variables is large. This article presents
an automated approach to harmonizing variable names across several
datasets, which optimizes the search for variables, minimizes manual
input, and reduces the risk of error.

Results:

Three consecutive algorithms are applied iteratively to search
for each variable of interest for the analyses in all datasets.
The first search (A) captures particular cases that could not be
solved in an automated way in the search iterations; the second
search (B) is run if search A produced no hits and identifies
variables whose labels contain certain key terms defined
by the user. If this search produces no hits, a third one (C)
is run to retrieve variables that have been identified in other
surveys. For each variable of interest, the searches can produce
one of three outputs: 1, a single best-matching
variable is found; 2, more than one matching variable is found;
or 3, no matching variables are found. Output 2 is resolved by
user judgement. Examples using 4 variables are presented and show
that the searches achieve 100% sensitivity and specificity after a
second iteration.
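The A, B, C cascade and its three possible outputs can be sketched in Python. The following are hypothetical, chosen only to make the control flow concrete:

- the data structures and the function name `search_variable`;
- the rule that search B requires all key terms to appear in the label;
- the example variable names and labels.

```python
def search_variable(target, dataset, variables, special_cases, key_terms, known_names):
    """Search one dataset for one variable of interest.

    variables:     dict, variable name -> variable label in this dataset
    special_cases: dict, (dataset, target) -> variable name fixed by hand (search A)
    key_terms:     dict, target -> lowercase terms that must all appear in the label (search B)
    known_names:   dict, target -> names already matched in other surveys (search C)
    Returns (output, matches): 1 single match, 2 several matches, 3 no match.
    """
    forced = special_cases.get((dataset, target))
    if forced in variables:                                   # search A
        matches = [forced]
    else:
        terms = key_terms.get(target, [])
        matches = [name for name, label in variables.items()  # search B
                   if terms and all(t in label.lower() for t in terms)]
        if not matches:                                       # search C
            matches = [name for name in variables
                       if name in known_names.get(target, set())]
    if len(matches) == 1:
        return 1, matches
    if matches:
        return 2, matches  # to be resolved by user judgement
    return 3, matches


survey = {
    "v012": "Current age of respondent",
    "v106": "Highest educational level",
    "hw1": "Child's age in months",
}
print(search_variable("age", "survey1", survey, {}, {"age": ["age", "respondent"]}, {}))
print(search_variable("age", "survey1", survey, {}, {"age": ["age"]}, {}))
print(search_variable("educ", "survey1", survey, {}, {}, {"educ": {"v106"}}))
```

Between iterations, names confirmed in one survey could be added to `known_names`. This is one way a second pass can let search C resolve cases that failed the first time.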

**Additional information**

es14_bosch.pdf

Isabel Cañette

StataCorp

Much has been said about presenting and interpreting results
from binary models. Policy makers are usually interested in
population effects, while health providers are mostly interested
in individual predicted effects.

This presentation has two aims. First, I will discuss
different measures of interest for these kinds of models, such
as probabilities, odds ratios, risk ratios, and marginal effects,
and how they relate to each other. Second, I will
show different ways to use Stata resources to interpret and
present results from regression models in general. These
approaches can also be useful in teaching.
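The relationships among these measures for a binary exposure can be made concrete with a short Python sketch. The probabilities are arbitrary and the function name is hypothetical. The sketch illustrates the identity OR = RR * (1 - p0) / (1 - p1), which shows why odds ratios approximate risk ratios only when the outcome is rare.

```python
def compare_risks(p1, p0):
    """Effect measures for event probabilities p1 (exposed) and p0 (unexposed)."""
    def odds(p):
        return p / (1 - p)

    return {
        "risk_difference": p1 - p0,  # discrete change in probability
        "risk_ratio": p1 / p0,
        "odds_ratio": odds(p1) / odds(p0),
    }


common = compare_risks(0.20, 0.10)  # frequent outcome
rare = compare_risks(0.02, 0.01)    # rare outcome
print({k: round(v, 3) for k, v in common.items()})
print({k: round(v, 3) for k, v in rare.items()})
```

Both scenarios have a risk ratio of 2. The odds ratio is 2.25 when the baseline risk is 10%, but only about 2.02 when it is 1%.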

**Additional information**

es14_canette.pdf

(Package of commands for statistics and epidemiology users)

Josep M. Domenech-Massons, Roberto Sesma-Morales

Universidad Autónoma de Barcelona, Laboratorio de Estadística

For several decades, the postgraduate program in “Diseño y Estadística en Ciencias de la Salud” (Design and Statistics in Health Sciences) taught its courses with SPSS Statistics. This required programming a series of macros and scripts to implement analyses needed for teaching and research that were not available in that package.

We have recently finished converting all courses to Stata. We have also transformed the SPSS macros and scripts that perform procedures not available in Stata into user-written commands, with their corresponding dialog boxes and immediate versions.

**Additional information**

es14_domenech.pdf

Alexander Zlotnik, Juan Manuel Montero

Universidad Politécnica de Madrid, Dpto. Ingeniería Electrónica

Ascensión Gallardo-Antolín

Universidad Carlos III de Madrid, Dpto. Teoría de la Señal y Comunicaciones

Statistical software packages are frequently developed in
general-purpose programming languages (such as Java, C, and C++)
that do not include statistical operations in their core
libraries. Software developers are therefore forced to create
their own statistical subroutines, use third-party libraries, or
follow a hybrid approach. This produces a fairly rich variety
of implementations even for the simplest operations, such as the
estimation of percentiles. In most cases, there is no gold
standard and different approaches, which may yield different
results with identical inputs, are acceptable. These differences
are often not obvious and usually not documented, and references
to alternative approaches are most often omitted. Although this
is widely known among people with some experience in statistical
software development, most users of statistical software are unaware of
these subtle differences and may spend considerable
time comparing results from seemingly identical operations in
different software packages. This can become especially daunting
when the comparison is made between custom-developed software
and an industry-standard statistical software package, such as Stata.
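Percentile estimation is a simple illustration of the point. The following Python sketch uses the standard library only (Python 3.8 or later). It computes the quartiles of the values 1 to 10 under three common definitions and obtains three different answers. Which definition a given package uses, Stata included, should be checked in that package's own documentation. The helper `nearest_rank` is a hypothetical name.

```python
import math
import statistics

data = list(range(1, 11))  # illustrative sample: 1, 2, ..., 10


def nearest_rank(values, p):
    """Nearest-rank percentile: smallest value with at least p*100% of data at or below it."""
    s = sorted(values)
    k = max(1, math.ceil(p * len(s)))
    return s[k - 1]


inclusive = statistics.quantiles(data, n=4, method="inclusive")
exclusive = statistics.quantiles(data, n=4, method="exclusive")
nearest = [nearest_rank(data, p) for p in (0.25, 0.5, 0.75)]

print("inclusive:", inclusive)   # [3.25, 5.5, 7.75]
print("exclusive:", exclusive)   # [2.75, 5.5, 8.25]
print("nearest rank:", nearest)  # [3, 5, 8]
```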

In this presentation, we explain a formal methodology for comparing
final and partial results of statistical operations
between Stata and other software packages. As a case study, we
discuss the differences in the calculation of logistic
regression coefficients, Hosmer–Lemeshow “deciles of risk”, and
null hypothesis testing for the comparison between observed and
expected deciles performed with Stata and with a custom-developed Java program.
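A minimal Python sketch of the Hosmer–Lemeshow statistic shows where implementations can diverge. The grouping rule used here is only one of several conventions: equal-size contiguous groups after sorting by predicted probability, without keeping ties together. It is not claimed to match either Stata or the authors' Java program. The p-value helper uses the closed-form chi-square tail, which is valid only for even degrees of freedom such as the usual 10 - 2 = 8. Function names are hypothetical.

```python
import math


def hosmer_lemeshow(y, p, groups=10):
    """Hosmer-Lemeshow chi-square from outcomes y (0/1) and predicted probabilities p.

    Observations are sorted by p and cut into `groups` contiguous blocks of
    (nearly) equal size; ties in p are NOT kept together (one possible convention).
    """
    if len(y) != len(p) or len(y) < groups:
        raise ValueError("need equal-length inputs and at least one observation per group")
    pairs = sorted(zip(p, y), key=lambda pair: pair[0])  # stable: ties keep input order
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        block = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(block)
        observed = sum(yi for _, yi in block)
        expected = sum(pi for pi, _ in block)
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    return stat


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability; this closed form holds only for even df."""
    if df % 2:
        raise ValueError("this sketch handles even degrees of freedom only")
    half = x / 2.0
    term = total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Usage: statistic and p-value with groups - 2 = 8 degrees of freedom
y = [0, 1] * 10
p = [0.5] * 20
stat = hosmer_lemeshow(y, p)
print(stat, chi2_sf_even_df(stat, 8))
```

Changing the tie handling or the group boundaries changes the groups. That alone can make two correct programs report different statistics for the same data.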

**Additional information**

es14_zlotnik_montero.pdf

Marcela Marinelli, Cristian Tebé

Agencia de Calidad y Evaluación Sanitaria de Cataluña (AQuAS)

The Catalan arthroplasty register (RACat) produces annual
clinical reports per center (52 hospitals). These reports
are typically generated manually by analysts.
Stata and LaTeX integration permits the automation of such reports.
LaTeX can directly execute a Stata do-file that uses different
commands. The aim of the present study was to produce automatic
reports in an integrated Stata–LaTeX system, using the **foreach**
command to generate the structure of the 52 hospital reports;
the **listtex** and **tabout** commands to produce tables of
the characteristics of each hospital's operated patients,
types of surgical procedures, and prosthesis survival at
1, 3, and 5 years after primary surgery;
and the **graph2tex** command to generate LaTeX code for graphs to
be included in a LaTeX file.

Hospital risks of implant revision after knee and
hip arthroplasty were estimated with Fine and Gray’s
competing-risks model using **stcrreg** (with death as a competing event)
and were compared using funnel plots. Stata–LaTeX integration
makes the do-file dynamic and saves a lot of time
when changes in the analysis are necessary.
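A funnel plot compares each hospital's estimate with control limits that narrow as volume grows. The Python sketch below uses crude revision proportions and normal-approximation binomial limits around the pooled proportion. This simplification is swapped in for illustration. It is not the Fine–Gray/**stcrreg** estimate used in the study. The hospital counts and function names are invented.

```python
import math


def funnel_limits(p0, n, z):
    """Normal-approximation limits for a proportion p0 observed in n cases."""
    half_width = z * math.sqrt(p0 * (1 - p0) / n)
    return max(0.0, p0 - half_width), min(1.0, p0 + half_width)


def flag(p_hat, p0, n):
    lo998, hi998 = funnel_limits(p0, n, 3.09)  # about 99.8% two-sided
    lo95, hi95 = funnel_limits(p0, n, 1.96)    # about 95% two-sided
    if not lo998 <= p_hat <= hi998:
        return "outside 99.8% limits"
    if not lo95 <= p_hat <= hi95:
        return "outside 95% limits"
    return "within limits"


# Invented hospitals: name -> (number of primary implants, number of revisions)
hospitals = {"H01": (120, 3), "H02": (800, 30), "H03": (45, 5), "H04": (400, 8)}
total_n = sum(n for n, _ in hospitals.values())
p0 = sum(r for _, r in hospitals.values()) / total_n  # pooled revision proportion

for name, (n, r) in hospitals.items():
    print(name, n, round(r / n, 3), flag(r / n, p0, n))
```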

**Additional information**

es14_marinelli.pptx

Santiago Pérez-Hoyos

Instituto de Investigación Vall d’Hebrón, Unidad de Bioestadística y Bioinformática

In survival analysis involving data from clinical or epidemiological
studies, there is increasing interest in transforming a continuous
variable into a categorical one, usually binary. The main objective
of this transformation is to build a predictive score for a follow-up
event. We present a combination of Stata and ad hoc
functions based on profile likelihood comparisons. Results are
presented in HTML format, including the top ten cutpoints, an optimal
cutpoint, a Kaplan–Meier estimation in graphical and list output, a
likelihood and hazard ratio profile, and a Cox regression
model. Results are compared with those obtained with the R package maxstat
in real data examples. By changing some initial parameters, users can extend
the process to other regression models.
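The idea of choosing a cutpoint by profile likelihood can be sketched in Python. To stay self-contained, the sketch replaces the Cox model with an exponential survival model, whose maximized log-likelihood has a closed form. The function names, the minimum group size, and the toy data are assumptions. The sketch is neither the authors' code nor the maxstat algorithm.

```python
import math


def exp_loglik(events, exposure):
    """Maximized log-likelihood of a constant-rate (exponential) model."""
    if events == 0:
        return 0.0  # MLE rate is 0; events*log(rate) - rate*exposure -> 0
    rate = events / exposure
    return events * math.log(rate) - rate * exposure


def scan_cutpoints(x, time, event, min_group=2):
    """Likelihood-ratio statistic for every admissible cutpoint, best first.

    Subjects with x <= c form one group and x > c the other; each group gets
    its own exponential rate, compared with a single common rate.
    """
    ll_null = exp_loglik(sum(event), sum(time))
    results = []
    for c in sorted(set(x))[:-1]:
        low = [i for i, xi in enumerate(x) if xi <= c]
        high = [i for i, xi in enumerate(x) if xi > c]
        if len(low) < min_group or len(high) < min_group:
            continue
        ll_split = sum(
            exp_loglik(sum(event[i] for i in g), sum(time[i] for i in g))
            for g in (low, high)
        )
        results.append((2 * (ll_split - ll_null), c))
    return sorted(results, reverse=True)


x = [1, 2, 3, 4, 5, 6, 7, 8]
time = [10, 9, 11, 10, 2, 3, 2, 3]
event = [1, 1, 1, 1, 1, 1, 1, 1]
ranking = scan_cutpoints(x, time, event)
print(ranking[0])    # optimal cutpoint
print(ranking[:10])  # analogue of a "top ten" list
```

The best cutpoint is selected by searching over many candidates. Its likelihood-ratio statistic is therefore not chi-square distributed under the null hypothesis. maxstat addresses this with adjusted p-values for maximally selected statistics.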

**Additional information**

es14_perez.pdf
