2014 Spanish Stata Users Group meeting

23 October 2014

Barcelona Park

Facultad de Medicina
Universitat de Barcelona
Barcelona, Spain


Development of the nomolog program and its evolution: Toward the implementation of a nomogram generator for the Cox regression

Alexander Zlotnik
Hospital Universitario Ramón y Cajal, Unidad de Bioestadística, IRYCIS, Universidad Politécnica de Madrid, Dpto. Ingeniería Electrónica
Víctor Abraira
Hospital Universitario Ramón y Cajal, Unidad de Bioestadística, IRYCIS, CIBERESP
We have developed the nomolog program for the generation of logistic regression nomograms in Stata. It has been recently accepted for publication in the Stata Journal and will soon be published on the SSC repository. Some of the challenges we encountered during its development were i) inclusion of main effects and interaction factors, ii) continuous # continuous interactions, and iii) development of an automated testing environment. We present the solutions to these, the most relevant implementation details, and the development practices, which may benefit persons interested in building their own programs based on Stata.
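
As a rough illustration of what a logistic regression nomogram encodes, the following Python sketch converts each predictor's contribution to the linear predictor into points on a 0 to 100 scale and maps a total score back to a probability. It is not the nomolog implementation: the coefficients, variable ranges, and function names are hypothetical, and interactions are left out.

```python
import math

# Hypothetical fitted logit model (assumed values, not taken from the talk).
INTERCEPT = -4.0
COEFS = {"age": 0.04, "smoker": 0.9}
# Assumed range of each predictor, used to draw its axis.
RANGES = {"age": (20, 90), "smoker": (0, 1)}


def max_effect():
    """Largest span of beta * x across predictors; this span is worth 100 points."""
    return max(abs(COEFS[v]) * (hi - lo) for v, (lo, hi) in RANGES.items())


def points(var, value):
    """Points for one predictor value, measured from its lowest contribution."""
    lo, hi = RANGES[var]
    beta = COEFS[var]
    base = min(beta * lo, beta * hi)
    return 100.0 * (beta * value - base) / max_effect()


def probability(total_points):
    """Map a total score back to a predicted probability."""
    baseline = INTERCEPT + sum(
        min(COEFS[v] * lo, COEFS[v] * hi) for v, (lo, hi) in RANGES.items()
    )
    xb = baseline + total_points * max_effect() / 100.0
    return 1.0 / (1.0 + math.exp(-xb))


# A 60-year-old smoker: add the points of both axes, then read the probability.
total = points("age", 60) + points("smoker", 1)
print(round(total, 1), round(probability(total), 3))
```

The predictor with the widest effect defines the 100-point axis, so the total score is a rescaled linear predictor and the probability axis is its logistic transform.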

During the development of this program, we were independently contacted by several researchers interested in the generation of Cox regression nomograms with Stata. We discuss the differences and similarities between logistic and Cox regression nomograms as well as the limitations and expected capabilities of a modification of nomolog that will introduce this feature.

Additional information

Further explanations, examples and download links for nomogram generators for logistic and Cox regressions are available at: www.zlotnik.net/stata/nomograms

Margins reloaded

Enrique Pinzón
The margins command in Stata allows us to get a wide array of results using coefficient estimates. I will illustrate the use of margins in some commonly used models. I will then illustrate a new result. I will show how we can use margins to obtain, after fixed-effects panel-data estimation, average marginal effects and average treatment effects that incorporate the effect of the unobserved time-invariant component.
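
To make the idea of an average marginal effect concrete, here is a minimal Python sketch for a simple logit model with one continuous covariate. It only illustrates the quantity itself (the sample average of the derivative of the predicted probability); it does not reproduce the fixed-effects result presented in the talk, and the coefficients and data are invented.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def average_marginal_effect(b0, b1, xs):
    """Sample average of dP/dx = b1 * p * (1 - p) for a logit model."""
    effects = []
    for x in xs:
        p = logistic(b0 + b1 * x)
        effects.append(b1 * p * (1.0 - p))
    return sum(effects) / len(effects)


# Hypothetical coefficients and covariate values.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ame = average_marginal_effect(0.0, 1.0, xs)

# Marginal effect evaluated at the mean of x (here 0), for comparison.
at_mean = 1.0 * logistic(0.0) * (1.0 - logistic(0.0))
print(ame, at_mean)
```

The two numbers differ because the logistic curve is nonlinear: averaging the effect over observations is not the same as evaluating it at the average observation.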

Studying coincidences with network analysis and other statistical tools

Modesto Escobar
Universidad de Salamanca, Dpto. de Sociología y Comunicación
The aim of this talk is to introduce a new framework to study data structures that is based on a combination of statistical and social network analysis and is called coincidence analysis. The purpose of this procedure is to ascertain the most frequent events in a given set of scenarios and to study the relationships between them. In accordance with this procedure, the concurrence of persons, objects, attributes, characteristics, or events within the same temporally or spatially limited set can be classified in the following manner:
a) simple, if both occur at least once in the same set;
b) likely, where the level of concurrence must be more than a single coincidence and more probable than a concurrence produced by mere chance; and
c) statistically probable, that is, in cases where samples of events are the subject of analysis, a confidence interval should be established to determine the statistical meaning of the combination of events.
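
The first two categories can be sketched in a few lines of Python. This is an illustration under simple assumptions, not the coin program: scenarios are represented as sets of event labels, chance concurrence is taken as the count expected under independence, and the confidence interval needed for category c) is omitted. The event labels are placeholders.

```python
from collections import Counter
from itertools import combinations


def coincidences(scenarios):
    """Classify each co-occurring pair of events as 'simple' or 'likely'."""
    n = len(scenarios)
    single = Counter(e for s in scenarios for e in set(s))
    joint = Counter(
        frozenset(pair)
        for s in scenarios
        for pair in combinations(sorted(set(s)), 2)
    )
    result = {}
    for pair, observed in joint.items():
        a, b = sorted(pair)
        # Number of joint occurrences expected if a and b were independent.
        expected = single[a] * single[b] / n
        if observed > 1 and observed > expected:
            label = "likely"
        else:
            label = "simple"
        result[(a, b)] = (observed, expected, label)
    return result


# Hypothetical scenarios (for example, photographs) and the events in each.
scenarios = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"B"}, {"C"}]
for pair, info in sorted(coincidences(scenarios).items()):
    print(pair, info)
```

Pairs that never appear together, such as B and C here, are not coincidences at all and therefore do not appear in the result.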

This mode of analysis can be applied to the exploratory analysis of questionnaires, the study of textual networks, the review of the content of databases, and the comparison of different statistical analyses of interdependence because the following techniques can be used with the same data: multidimensional scaling, principal component analysis, correspondence analysis, biplot representations, agglomeration techniques, and network analysis algorithms.

The statistical bases of this analysis are described, as is the program written in Stata (coin) that allows the analysis to be executed. As an example of its use, the photograph albums of the following people who were famous in the early twentieth century are described: Miguel de Unamuno (1864–1936), Rafael Masó (1880–1935), Joaquín Turina (1882–1949), and Antonia Mercé (1890–1936), stage name la Argentina.

Demand for drugs for childhood malaria in rural Mozambique

Elisa Sicuri, Sergio Alonso
Malaria is one of the leading causes of death in Sub-Saharan Africa. Artemisinin-combination therapies (ACTs) are used as first-line drugs for treatment, but their market is far from competitive. Important supply issues include limited availability and low quality, while on the demand side, market failures are more related to the lack of information and accessibility to the treatment.

To estimate the actual willingness-to-pay (WTP) for ACTs among children with malaria in rural Mozambique, researchers conducted a survey among patients at a district hospital. Data collected through the survey were merged with demographic surveillance data and the hospital passive case detection systems in place in the area. A negative binomial (NB) regression was used to identify the determinants of the demand for ACTs.

Results showed that WTP was negatively associated with the number of malaria episodes the child had previously suffered during the same malaria season and with socioeconomic position, whereas the age and occupation of the family head were positively correlated with WTP. This study also discussed the appropriateness of using contingent valuation methods for estimating WTP. Respondents stated a higher willingness-to-pay than expected, but they revealed a much more realistic demand price when asked about their ability to pay. These results provide evidence that ACT subsidies to the private sector are needed to improve access to malaria treatment in rural Mozambique.

Analysis of variations in medical practice using Stata

Cristian Tebé
Agencia de Calidad y Evaluación Sanitaria de Cataluña (AQuAS)
Variations in medical practice are defined as systematic variations (not due to chance) of adjusted rates of clinical procedures for a given level of aggregation of the population. The aim of this talk is to explore and describe variations in different clinical conditions and surgical procedures from a population perspective to offer a richer perspective for the assessment of health services in a complex public health care environment. The basic strategy of analysis is to make comparisons among rates of activity (numerator: hospital admissions) of inhabitants of a territory (denominator: basic health area). Results are presented in tables of standardized rates (using dstdize) and ratios of activity using small-area analysis (calling R from Stata). Most results are presented in maps (using spmap) for better visualization. Variation analysis can be a good monitoring tool for any health system. Published atlases have received attention from both clinical and healthcare audiences.
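
For readers unfamiliar with standardized rates, the following Python sketch shows the arithmetic of direct standardization: stratum-specific rates are weighted by the share of each stratum in a standard population. It is a generic illustration, not the analysis from the talk, and all counts are invented.

```python
def directly_standardized_rate(events, population, standard_population):
    """Weighted sum of stratum rates, with weights from a standard population."""
    total_std = sum(standard_population)
    rate = 0.0
    for d, n, w in zip(events, population, standard_population):
        rate += (d / n) * (w / total_std)
    return rate


# Hypothetical admissions and inhabitants in three age bands of one area.
events = [10, 30, 60]
population = [1000, 1500, 1000]
# Hypothetical standard population for the same age bands.
standard = [4000, 4000, 2000]

crude = sum(events) / sum(population)
adjusted = directly_standardized_rate(events, population, standard)
print(f"crude {crude * 1000:.1f}, standardized {adjusted * 1000:.1f} per 1,000")
```

Because every area is weighted with the same standard population, differences between standardized rates are not driven by differences in age structure.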

Automated harmonization of variable names and values from several datasets prior to conducting batch statistical analyses

Xavier Bosch-Capblanc
Centro Suizo para la Salud Internacional, Salud Pública
Data requirements by governments, donors, and the international community to measure health and development achievements have increased in the last decade. Datasets produced in surveys conducted in several countries and years are often combined to analyze time trends and geographical patterns of demographic and health-related indicators. However, because not all datasets have the same structure, variable definitions, and codes, they have to be harmonized prior to submitting them to statistical analyses. Manually searching for, renaming, and recoding variables is extremely tedious and prone to errors when the number of datasets and variables is large. This article presents an automated approach to harmonizing variable names across several datasets, which optimizes the search of variables, minimizes manual inputs, and reduces the risk of error.

Three consecutive algorithms are applied iteratively to search for each variable of interest for the analyses in all datasets. The first search (A) captures particular cases that could not be solved in an automated way in the search iterations; the second search (B) is run if search A produced no hits and identifies variables whose labels contain certain key terms defined by the user. If this search produces no hits, a third one (C) is run to retrieve variables that have been identified in other surveys. For each variable of interest, the outputs of these engines can be the following: 1, a single best matching variable is found; 2, more than one matching variable is found; or 3, no matching variables are found. Output 2 is solved by user judgement. Examples using 4 variables are presented and show that the searches have a 100% sensitivity and specificity after a second iteration.
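
The cascade of searches A, B, and C can be sketched as follows in Python. This is one possible reading of the description above, not the author's code: a dataset is represented as a mapping from variable names to labels, search B is assumed to require all key terms in the label, and every variable name and label is hypothetical.

```python
def find_variable(dataset, overrides, key_terms, known_names):
    """Return (output_code, matches) for one variable of interest.

    dataset: mapping of variable name -> variable label.
    overrides: names supplied by the user for particular cases (search A).
    key_terms: terms that must all appear in the label (search B, assumed rule).
    known_names: names already identified in other surveys (search C).
    """
    # Search A: particular cases resolved by hand.
    matches = [v for v in overrides if v in dataset]
    # Search B: run only if A produced no hits.
    if not matches:
        matches = [
            v
            for v, label in dataset.items()
            if all(t.lower() in label.lower() for t in key_terms)
        ]
    # Search C: run only if B produced no hits.
    if not matches:
        matches = [v for v in known_names if v in dataset]
    if len(matches) == 1:
        return 1, matches
    if len(matches) > 1:
        return 2, matches  # left to user judgement
    return 3, []


# Hypothetical survey dataset.
survey = {
    "q1": "Age of respondent",
    "q2": "Highest educational level",
    "c1": "Age of child in months",
}
print(find_variable(survey, [], ["age"], []))
print(find_variable(survey, [], ["age", "respondent"], []))
print(find_variable(survey, [], ["weight"], ["c2", "c1"]))
```

An ambiguous result (output 2) in one iteration can be resolved by adding the chosen name to the overrides, which is then picked up by search A in the next iteration.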

Using Stata features to interpret and visualize regression results with examples for binary models

Isabel Cañette
A lot has been said about presenting and interpreting results from binary models. Policy makers are usually interested in population effects, while health providers are mostly interested in individual predicted effects.

This presentation has two aims. First, I will discuss different measures of interest for these kinds of models, such as probabilities, odds ratios, risk ratios, and marginal effects, and how they relate to each other. Second, I will show different ways to use Stata resources to interpret and present results from regression models in general. These approaches can also be useful in the teaching environment.
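
As a small numerical illustration of how these measures relate, the Python sketch below derives the risk difference, risk ratio, and odds ratio from two predicted probabilities. The probabilities are made up for the example and do not come from the presentation.

```python
def odds(p):
    return p / (1.0 - p)


def compare(p_exposed, p_unexposed):
    """Return risk difference, risk ratio, and odds ratio for two probabilities."""
    risk_difference = p_exposed - p_unexposed
    risk_ratio = p_exposed / p_unexposed
    odds_ratio = odds(p_exposed) / odds(p_unexposed)
    return risk_difference, risk_ratio, odds_ratio


# Hypothetical predicted probabilities: a rare and a common outcome.
rare = compare(0.02, 0.01)
common = compare(0.60, 0.30)
print("rare outcome:  ", rare)
print("common outcome:", common)
```

Both scenarios have the same risk ratio, but the odds ratio is close to it only when the outcome is rare; for the common outcome the odds ratio is much larger, which is why the two measures should not be read interchangeably.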

Paquete de comandos de usuarios para Estadística y Epidemiología
(Package of user-written commands for statistics and epidemiology)

Josep M. Domenech-Massons, Roberto Sesma-Morales
Universidad Autónoma de Barcelona, Laboratorio de Estadística
For several decades, the postgraduate courses in “Diseño y Estadística en Ciencias de la Salud” (Design and Statistics in Health Sciences) have been taught with SPSS Statistics, which entailed programming a series of macros and scripts to implement the analyses required for teaching and research that were not available in that package.

We have recently finished converting all the courses to Stata and have transformed the SPSS macros and scripts that perform procedures not available in Stata into user-written commands with their corresponding dialog boxes and immediate versions.

A formal methodology for the comparison of results from different software packages: A case study of estimation of Hosmer–Lemeshow “deciles of risk” for a logistic regression with Stata and with a custom Java program

Alexander Zlotnik, Juan Manuel Montero
Universidad Politécnica de Madrid, Dpto. Ingeniería Electrónica
Ascensión Gallardo-Antolín
Universidad Carlos III de Madrid. Dpto. Teoría de la Señal y Comunicaciones
Statistical software packages are frequently developed in general-purpose programming languages (such as Java, C, and C++) that do not include statistical operations in their core libraries. Software developers are therefore forced to create their own statistical subroutines, use third-party libraries, or follow a hybrid approach. This produces a fairly rich variety of implementations even for the simplest operations, such as the estimation of percentiles. In most cases, there is no gold standard, and different approaches, which may yield different results with identical inputs, are acceptable. These differences are often not obvious and usually not documented, and references to alternative approaches are most often omitted. Although this is widely known by people with some experience in statistical software development, most users of statistical software are unaware of these subtle differences and may spend considerable time comparing results from seemingly identical operations in different software packages. This may become especially daunting when this comparison is made between custom-developed software and an industry-standard statistical software package, such as Stata.
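
The percentile example is easy to reproduce. The Python sketch below computes quartiles of the same data with the two interpolation methods offered by the standard library function statistics.quantiles; it says nothing about which definition Stata or the Java program in the case study uses, and the data are arbitrary.

```python
import statistics

# Arbitrary illustrative data.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Two accepted definitions of the same quartiles.
exclusive = statistics.quantiles(data, n=4, method="exclusive")
inclusive = statistics.quantiles(data, n=4, method="inclusive")

print("exclusive:", exclusive)
print("inclusive:", inclusive)
print("identical:", exclusive == inclusive)
```

The median agrees under both methods, but the first and third quartiles do not, so any grouping built on such cutpoints (for instance, deciles of risk) can differ between packages even when the inputs are identical.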

In this presentation, we explain a formal methodology for the comparison of final and partial results of statistical operations between Stata and other software packages. As a case study, we discuss the differences between the calculation of logistic regression coefficients, Hosmer–Lemeshow “deciles of risk”, and null hypothesis testing for the comparison between observed and expected deciles performed with Stata and a custom-developed Java program.

Integration between Stata and LaTeX to create hospital reports for the Catalan arthroplasty register (RACat): Summary results for the period 2005–2013

Marcela Marinelli, Cristian Tebé
Agencia de Calidad y Evaluación Sanitaria de Cataluña (AQuAS)
The Catalan arthroplasty register (RACat) produces annual clinical reports per center (52 hospitals). These reports are typically generated manually by analysts. Stata and LaTeX integration permits the automation of such reports. LaTeX can directly execute a Stata do-file that uses different commands. The aim of the present study was to produce automatic reports in an integrated Stata–LaTeX system using the foreach command to generate the structure of the 52 hospital reports; the listtex and tabout commands to produce tables of the hospital characteristics of the operated patients, types of surgical procedures, and prostheses survival at 1, 3, and 5 years after primary surgery; and the graph2tex command to generate LaTeX graph code to be included in a LaTeX file.

Hospital risks of revision of the implant following knee and hip arthroplasty were measured considering Fine and Gray’s model and using stcrreg (death as competing event) and were compared using funnel plot graphs. Stata–LaTeX integration permits a dynamic do-file and saves a lot of time when changes in the analysis are necessary.

Cutpoint determination in continuous predictive variables in survival analysis

Santiago Pérez-Hoyos
Instituto de Investigación Vall d’Hebrón, Unidad de Bioestadística y Bioinformática
In survival analysis involving data from clinical or epidemiological studies, increasing interest is given to transforming a continuous variable into a categorical one, usually binary. The main objective of this transformation is to build a predictive score of a follow-up event. We present a combination of Stata and ad hoc functions based on profile likelihood comparisons. Results are presented in HTML format, including the top ten cutpoints, an optimal cutpoint, a Kaplan–Meier estimation in graphical and list output, a likelihood and hazard ratio profile, and a Cox regression model. Results are compared with those obtained by the R library maxstat in real data examples. By changing some initial parameters, users can extend the process to other regression models.

Scientific organizers

Llorenç Quinto, Barcelona Centre for International Health Research (CRESIB)

Sergi Sanz, Barcelona Centre for International Health Research (CRESIB)

Sergio Alonso, Barcelona Centre for International Health Research (CRESIB)

Elisa Sicuri, Barcelona Centre for International Health Research (CRESIB)

Logistics organizers

Timberlake Consulting S.L., the official distributor of Stata in Spain.