
2015 Spanish Stata Users Group meeting

22 October 2015

Puerta de Europa

Instituto de Empresa
Calle de María de Molina, 13
28006 Madrid
Spain

Proceedings


Revisiting generalized method of moments

Enrique Pinzon
StataCorp
The generalized method of moments (GMM) estimator, an economist's favorite, was introduced in Stata 11. GMM is useful in many other disciplines, however, and we have used it extensively in the treatment-effects commands released in Stata 13 and Stata 14. I will briefly discuss some relevant properties of GMM and then show how it is used in treatment-effects estimation. I will conclude with a simple application of GMM that is new in the literature.

Additional information
spain15_pinzon.pdf

A low CD4/CD8 ratio during effective ART predicts immunosenescence and morbidity/mortality

Sergio Serrano-Villar
University Hospital Ramón y Cajal
Santiago Moreno
University Hospital Ramón y Cajal
Talia Sainz
University Hospital La Paz
April L. Ferre
University of California, Davis
Sulggi A. Lee
University of California, San Francisco
Peter W. Hunt
University of California, San Francisco
Elizabeth Sinclair
University of California, San Francisco
Vivek Jain
University of California, San Francisco
Frederick M. Hecht
University of California, San Francisco
Steven G. Deeks
University of California, San Francisco
A low CD4/CD8 ratio in elderly HIV-uninfected adults is associated with increased mortality. A subset of HIV-infected adults receiving effective antiretroviral therapy (ART) fails to normalize this ratio, even after achieving normal CD4+ T-cell counts. The immunologic and clinical characteristics of this clinical phenotype remain undefined. Using data from four distinct clinical cohorts, we show that a low CD4/CD8 ratio in HIV-infected adults during otherwise effective ART (CD4+ T-cell counts >500 cells/mm³) is associated with a number of immunological abnormalities. Longitudinal changes in CD4+ and CD8+ T-cell counts and in the CD4/CD8 ratio were assessed using linear mixed models with random intercepts. Age, gender, and pre-ART CD4+ T-cell count were included as fixed effects in multivariate analyses. Interaction terms were created to assess whether these changes over time differed significantly between early and later ART initiators, and changes in slopes before and after ART time points were assessed using linear splines. Individuals who initiated ART within 6 months of infection had a greater increase in the CD4/CD8 ratio than later initiators (>2 years). Conditional logistic regression analysis showed that a low CD4/CD8 ratio predicted a higher risk of morbidity and mortality. Hence, this clinically accessible measurement may prove useful in monitoring the response to ART and could identify a unique subset of individuals in need of novel therapeutic interventions.

Additional information
spain15_serrano.pdf

Assessing convergent and discriminant validity in the ADHD-R IV rating scale: User-written commands for average variance extracted (AVE), composite reliability (CR), and heterotrait-monotrait ratio of correlations (HTMT)

David Alarcón Rubio
Universidad Pablo de Olavide
José Antonio Sánchez Medina
Universidad Pablo de Olavide
Convergent and discriminant validity examine, respectively, the extent to which the indicators of a latent variable share variance and the extent to which a latent variable is different from the others in a variance-based SEM. The Fornell-Larcker (1981) criterion has been commonly used to assess the degree of shared variance between the latent variables of the model. According to this criterion, convergent validity can be assessed by composite reliability (CR) and average variance extracted (AVE). CR is a less biased estimate of reliability than Cronbach's alpha; CR values of 0.7 and above are acceptable. AVE measures the amount of variance captured by a construct relative to the amount due to measurement error; values above 0.7 are considered very good, whereas a value of 0.5 is acceptable. Discriminant validity is assessed by comparing the AVE with the squared correlation between two constructs: the square root of each construct's AVE should be greater than its correlations with the other constructs. Recently, the heterotrait-monotrait ratio of correlations (HTMT) approach has been proposed to assess discriminant validity. HTMT is the average of the heterotrait-heteromethod correlations relative to the average of the monotrait-heteromethod correlations. This work presents a series of user-written commands that obtain these indicators of convergent and discriminant validity for confirmatory factor-analysis models and calculate their confidence intervals using the bootstrap method. To demonstrate the use of these commands, we use data from a sample of high school students to whom the ADHD-R IV rating scale was administered.
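
The indicators described in this abstract follow standard formulas. The Python sketch below (not the authors' Stata commands; the function names are hypothetical) computes CR and AVE from standardized loadings, the Fornell-Larcker comparison, and HTMT from an item correlation matrix, using the geometric mean of the two monotrait averages in the denominator. It assumes standardized loadings, uncorrelated measurement errors, and positive item correlations, and it omits the bootstrap confidence intervals.

```python
import math
from itertools import combinations


def composite_reliability(loadings):
    """CR from standardized loadings: (sum l)^2 / ((sum l)^2 + sum(1 - l^2))."""
    total = sum(loadings)
    error_variance = sum(1 - l * l for l in loadings)
    return total * total / (total * total + error_variance)


def average_variance_extracted(loadings):
    """AVE from standardized loadings: the mean squared loading."""
    return sum(l * l for l in loadings) / len(loadings)


def fornell_larcker_ok(ave_a, ave_b, corr_ab):
    """True if sqrt(AVE) of both constructs exceeds their correlation."""
    return min(math.sqrt(ave_a), math.sqrt(ave_b)) > abs(corr_ab)


def _mean(values):
    return sum(values) / len(values)


def htmt(corr, items_a, items_b):
    """HTMT for constructs a and b from an item correlation matrix.

    corr is a list of lists; items_a and items_b are lists of item indices,
    each with at least two items so that monotrait correlations exist.
    """
    # Heterotrait-heteromethod: correlations between items of different constructs.
    hetero = [corr[i][j] for i in items_a for j in items_b]
    # Monotrait-heteromethod: correlations among items of the same construct.
    mono_a = [corr[i][j] for i, j in combinations(items_a, 2)]
    mono_b = [corr[i][j] for i, j in combinations(items_b, 2)]
    return _mean(hetero) / math.sqrt(_mean(mono_a) * _mean(mono_b))
```

For three standardized loadings of 0.8, CR = 5.76/6.84 ≈ 0.84 and AVE = 0.64; the square root of that AVE is 0.8, which passes the Fornell-Larcker check against an interconstruct correlation of 0.5.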

Additional information
spain15_alarcon.pdf

Differences in perinatal health among immigrant and native-origin children: Evidence from differentials in weight at birth in Spain

Hector Cebolla-Boado
Universidad Nacional de Educación a Distancia
Leire Salazar
Universidad Nacional de Educación a Distancia
This presentation explores perinatal inequalities between migrants and natives in Spain and, more specifically, differences in weight at birth. In line with the logic of the "healthy immigrant paradox", the children of immigrant mothers are known to have a lower risk of low birth weight (LBW; <2,500 grams). Using the universe of births in Spain in 2013 (excluding preterm and multiple births), we go beyond the standard approach of using a dichotomous variable to estimate the risk of LBW.

Using Stata, we estimate quantile regressions to explore migrant-native differentials in weight at birth across the range of observed values, and we also concentrate on the impact of migrant status among babies weighing more than 4,000 grams, a threshold that, like LBW, is associated with certain pathological characteristics and with problematic later development.

Our research confirms that the well-known epidemiological regularity of healthier babies among migrants in advanced democracies, namely, an advantage of immigrant-origin babies in avoiding LBW, also applies to Spain. It also finds that at the other extreme, when the baby's weight is above 4,000 grams, migrant-origin babies weigh over 110 grams more than native-origin ones. In sum, we contribute to the literature by showing that the higher average weight of newborns of immigrant mothers is not always a source of perinatal advantage.
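
Quantile regression replaces the squared-error loss of least squares with an asymmetric "check" loss. As a minimal illustration in Python (not the authors' Stata code; the function names and toy numbers are hypothetical), the sketch below finds a sample τ-quantile by minimizing that loss. It then uses the fact that, in a quantile regression of weight on a constant and a 0/1 migrant indicator, the indicator's coefficient at quantile τ equals the difference between the two groups' τ-quantiles. In Stata, such models can be fitted with qreg.

```python
def check_loss(residual, tau):
    """Koenker-Bassett check (pinball) loss: tau*r if r >= 0, (tau - 1)*r if r < 0."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def sample_quantile(values, tau):
    """Sample point minimizing total check loss (the smallest one if several tie)."""
    candidates = sorted(values)
    return min(candidates, key=lambda q: sum(check_loss(y - q, tau) for y in values))


def quantile_gap(group1, group0, tau):
    """Coefficient on a 0/1 group indicator in a quantile regression of y on a
    constant and that indicator: with this saturated model, the objective splits
    by group, so the coefficient is the difference in group tau-quantiles."""
    return sample_quantile(group1, tau) - sample_quantile(group0, tau)
```

For example, `sample_quantile([1, 2, 3, 4, 100], 0.5)` returns 3, while `sample_quantile([1, 2, 3, 4, 100], 0.9)` returns 100, so a group difference can look very different at the median and in the upper tail.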

aries: An implementation of CART in Stata

Ricardo Mora
Universidad Carlos III de Madrid
Tree-structured models use binary trees as predictive models. Tree models in which the target variable can take a finite set of values are called classification trees; decision trees in which the target variable can take continuous values (typically real numbers) are called regression trees. In both classification and regression trees, estimation is trivial if the structure of the tree is known. Otherwise, several algorithms have been proposed, notably the classification and regression trees (CART) algorithm by Breiman et al. (1984), and several software packages implement them (for example, Salford Systems CART, MATLAB, and R). In Stata, the module cart, developed by Wim van Putten, performs a CART analysis but only for failure-time data. In this presentation, I discuss a new module, aries, that performs the basic CART algorithm for both binary and continuous dependent variables.
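
At the core of CART is a greedy search for the binary split that most reduces node impurity. The Python sketch below illustrates that step for a single predictor (it is not the aries code, and the function names are hypothetical): it uses the within-node sum of squares for regression trees and size-weighted Gini impurity for classification trees. A full CART implementation applies this search recursively over all predictors and then prunes the tree, which the sketch omits.

```python
def sse_cost(ys):
    """Regression-tree node cost: sum of squared deviations from the node mean."""
    if not ys:
        return 0.0
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def gini_cost(ys):
    """Classification-tree node cost: node size times Gini impurity."""
    n = len(ys)
    if n == 0:
        return 0.0
    counts = {}
    for y in ys:
        counts[y] = counts.get(y, 0) + 1
    return n * (1.0 - sum((c / n) ** 2 for c in counts.values()))


def best_split(x, y, cost):
    """Find the split 'x <= threshold' minimizing the summed cost of both children.

    Returns (threshold, cost); threshold is None if x has no two distinct values.
    """
    pairs = sorted(zip(x, y))
    best_threshold, best_cost = None, float("inf")
    for k in range(1, len(pairs)):
        if pairs[k - 1][0] == pairs[k][0]:
            continue  # no threshold separates tied x values
        left = [label for _, label in pairs[:k]]
        right = [label for _, label in pairs[k:]]
        total = cost(left) + cost(right)
        if total < best_cost:
            # Place the threshold halfway between adjacent distinct x values.
            best_threshold = (pairs[k - 1][0] + pairs[k][0]) / 2
            best_cost = total
    return best_threshold, best_cost
```

With x = [1, 2, 3, 10, 11, 12], both a continuous target [1, 1, 1, 5, 5, 5] (using `sse_cost`) and a binary target ['a', 'a', 'a', 'b', 'b', 'b'] (using `gini_cost`) give the split x <= 6.5 with zero remaining cost.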

Additional information
spain15_mora.pdf

Stata web services: Toward Stata-based healthcare informatics applications integrated in a service-oriented architecture (SOA)

Alexander Zlotnik
Technical University of Madrid
University Hospital Ramón y Cajal
Modesto Escobar
Universidad de Salamanca
Ascensión Gallardo-Antolín
Universidad Carlos III de Madrid
Juan Manuel Montero Martínez
Technical University of Madrid
Stata has many functions that can be used in decision support systems, forecasting systems, and, more generally, applications that use analytical or modeling functionality. A web interface with an HTML/JS graphical user interface or an XML-based web service is a convenient way of exposing Stata-based programs on public and private computer networks. However, using Stata through a web interface or integrating it into a corporate software environment such as a service-oriented architecture can be challenging. Usually, Stata-based programs need to be reimplemented in a different programming language to be used through these interfaces. Such reimplementations can be problematic, time-consuming, and error-prone.

We describe an approach for using Stata-based applications directly through a web interface, the requirements for such applications, and the limitations of this approach. We then discuss modern software engineering solutions for software integration scenarios in healthcare informatics and their potential use for Stata-based decision support systems in this field.



Additional information
spain15_zlotnik.pdf

Introduction to Markov-switching regression models using the mswitch command

Gustavo Sánchez
StataCorp
Many time series can be characterized by data-generating processes (DGPs) that may be affected by particular events that change their parameters. The new conditions may remain in place for a period of time, until the change is reversed and the process returns to the previous state or until a new event leads to a new state, with a corresponding change in the parameters. In Stata 14, we introduce the mswitch command to model these kinds of time series by characterizing the transitions between unobserved states with a Markov chain. I will briefly introduce the basic concepts of Markov-switching models and use a couple of examples to illustrate the implementation provided by mswitch.
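
The likelihood of a Markov-switching model is commonly evaluated with the Hamilton filter, which alternates a prediction step based on the transition probabilities with an update step based on the density of each new observation. The Python sketch below shows that recursion for a two-state model in which only the mean switches, with the parameters taken as known; it is not the mswitch implementation, and the function names are hypothetical. Estimation would maximize the returned log likelihood over the means, the standard deviation, and the staying probabilities p00 and p11.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def expected_duration(p_stay):
    """Expected number of periods spent in a state with staying probability p_stay."""
    return 1.0 / (1.0 - p_stay)


def hamilton_filter(y, mu, sigma, p00, p11):
    """Filtered state probabilities for a two-state switching-mean model.

    Model: y_t = mu[s_t] + e_t with e_t ~ N(0, sigma^2), where s_t is a Markov
    chain with Pr(s_t=0 | s_{t-1}=0) = p00 and Pr(s_t=1 | s_{t-1}=1) = p11.
    Returns [Pr(s_t=0 | y_1..y_t), Pr(s_t=1 | y_1..y_t)] for each t and the
    log likelihood.
    """
    P = [[p00, 1 - p00], [1 - p11, p11]]
    # Start from the steady-state (ergodic) probabilities of the chain.
    pi0 = (1 - p11) / (2 - p00 - p11)
    prob = [pi0, 1 - pi0]
    filtered, loglik = [], 0.0
    for obs in y:
        # Predict: Pr(s_t = j | y_1..y_{t-1}).
        pred = [prob[0] * P[0][j] + prob[1] * P[1][j] for j in range(2)]
        # Update: weight each state by the density of the new observation.
        joint = [pred[j] * normal_pdf(obs, mu[j], sigma) for j in range(2)]
        lik = joint[0] + joint[1]
        loglik += math.log(lik)
        prob = [joint[0] / lik, joint[1] / lik]
        filtered.append(prob)
    return filtered, loglik
```

With well-separated means (0 and 5, σ = 1), observations near 0 are assigned to state 0 and observations near 5 to state 1 with probability above 0.99; with p00 = 0.9, a regime is expected to last about 1/(1 - 0.9) = 10 periods.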

Additional information
spain15_sanchez.pdf

Modeling multilevel data: The estimated dependent variable approach

Antonio M. Jaime-Castillo
Universidad de Málaga
Multilevel data have become very popular in the social sciences. Several international research projects (such as the European Social Survey, the International Social Survey Programme, and the World Values Survey) have produced a large amount of comparative data in recent decades. The dominant approach to analyzing multilevel data structures uses multilevel models (a mixture of fixed and random effects), and major statistical packages have incorporated routines for estimating these kinds of models. This analytical strategy has several advantages over most naïve pooling strategies. However, it also has some drawbacks on both theoretical and practical grounds: the statistical theory behind multilevel models is still under development, and the computational burden of estimating nonlinear models, as well as convergence issues, can be challenging in some cases. An alternative is the estimated dependent variable (EDV) approach. In the first step, the researcher estimates a separate individual-level model within each level-2 unit. In the second step, the coefficients estimated in the first step become the dependent variables to be explained by a set of aggregate predictors. In this presentation, I focus on the potential applications of this approach using Stata.
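
A minimal Python sketch of the two-step EDV logic follows, assuming one individual-level predictor and one level-2 predictor (the function names and data layout are hypothetical; this is not the author's Stata code). Step 1 fits a separate OLS regression within each level-2 unit; step 2 regresses the estimated slopes on the aggregate predictor. Weighting the second step by the precision of the first-step estimates is a common refinement that the sketch leaves out.

```python
def ols_simple(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def edv_two_step(groups, z):
    """Two-step estimated dependent variable approach.

    groups: dict mapping each level-2 unit to (x_list, y_list) of individual data.
    z: dict mapping each level-2 unit to its aggregate predictor.
    Returns (intercept, slope) of the second-step regression of the
    first-step slopes on z.
    """
    units = sorted(groups)
    # Step 1: one individual-level regression per level-2 unit; keep the slope.
    first_step_slopes = [ols_simple(*groups[u])[1] for u in units]
    # Step 2: the estimated slopes become the dependent variable.
    return ols_simple([z[u] for u in units], first_step_slopes)
```

In a toy example where the within-unit slopes are 1, 3, and 5 for units with z = 0, 1, and 2, the second step recovers an intercept of 1 and a slope of 2.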

Additional information
spain15_jaime.pdf

A simple procedure to correct for measurement errors in survey research

Anna DeCastellarnau
Universitat Pompeu Fabra
Although there is a large literature on the existence of measurement errors, few researchers correct for them in their analyses. In this presentation, I will show that correcting for measurement errors in survey research is not only necessary but also possible and actually rather simple. Using the quality estimates obtained from the free online software Survey Quality Predictor (SQP), researchers can easily correct correlation and covariance matrices and use them as input for their analyses. This procedure was described for Stata, LISREL, and R in the ESS EduNet module "A simple procedure to correct for measurement errors in survey research". This presentation will focus on the correction of measurement errors in regression analysis and causal models using Stata.
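
The core of such a correction can be sketched in Python, assuming the quality of each variable (q², the share of its observed variance explained by the latent variable of interest) is known, for example from SQP. The sketch applies the classic correction for attenuation, dividing each observed correlation by the square root of the product of the two qualities, and then computes standardized regression coefficients for two predictors from the corrected matrix. It is not the code from the ESS EduNet module, the function names are hypothetical, and it ignores correlated method effects between variables measured with the same method, which the full procedure may also need to remove.

```python
import math


def correct_correlations(r, quality):
    """Disattenuate an observed correlation matrix.

    r: list of lists of observed correlations; quality: list of q^2 values in (0, 1].
    Each off-diagonal r_ij is divided by sqrt(q2_i * q2_j); the diagonal stays 1.
    """
    k = len(r)
    return [
        [1.0 if i == j else r[i][j] / math.sqrt(quality[i] * quality[j]) for j in range(k)]
        for i in range(k)
    ]


def standardized_coefs_two(r, y, x1, x2):
    """Standardized OLS coefficients of y on x1 and x2, computed from a correlation
    matrix r as beta = R_xx^{-1} r_xy with a 2x2 predictor block."""
    r12, r1y, r2y = r[x1][x2], r[x1][y], r[x2][y]
    det = 1 - r12 * r12
    b1 = (r1y - r12 * r2y) / det
    b2 = (r2y - r12 * r1y) / det
    return b1, b2
```

For example, an observed correlation of 0.32 between two variables that both have a quality of 0.64 becomes 0.32 / 0.64 = 0.5 after correction.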

Additional information
spain15_decastellarnau.pdf

Content analysis with Stata

Modesto Escobar
Universidad de Salamanca
José L. Alonso Berrocal
Universidad de Salamanca
Content analysis is a technique used in the social sciences for the systematic study of the content of communication. In this presentation, we discuss two useful programs for the statistical analysis of texts. The first (precoin) splits the text into words or groups of words to form an incidence matrix. The second (coin) works with this matrix and produces frequencies, co-occurrences, multivariate statistical measures of centrality and distance, and various types of graphs. As examples of their use, we present an analysis of a sample of tweets and an analysis of open-ended answers to a questionnaire.
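
The incidence matrix and co-occurrence counts described in this abstract can be sketched in a few lines of Python. This is not the precoin or coin code: the function names are hypothetical, and the tokenizer only lowercases and splits on whitespace, whereas real texts such as tweets would also need punctuation handling and stop-word removal. Multiplying the transposed incidence matrix by the matrix itself gives, for each pair of words, the number of documents in which both appear.

```python
def incidence_matrix(texts):
    """Rows are documents, columns are distinct words (sorted); a cell is 1 if
    the word appears in the document and 0 otherwise."""
    docs = [set(text.lower().split()) for text in texts]
    vocab = sorted(set().union(*docs))
    return vocab, [[1 if word in doc else 0 for word in vocab] for doc in docs]


def cooccurrences(matrix):
    """Word-by-word co-occurrence counts, A^T A: cell (i, j) is the number of
    documents containing both words; the diagonal is each word's document count."""
    k = len(matrix[0])
    return [[sum(row[i] * row[j] for row in matrix) for j in range(k)] for i in range(k)]
```

For the two texts "health care reform" and "health reform now", the words health and reform co-occur in 2 documents, while care and now never co-occur.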

Additional information
spain15_escobar.pdf

Wishes and grumbles

StataCorp
StataCorp staff will be happy to receive wishes for developments in Stata and almost as happy to receive grumbles about the software.

Scientific organizers

Modesto Escobar, Universidad de Salamanca

Alexander Zlotnik, Technical University of Madrid and University Hospital Ramón y Cajal

Logistics organizers

Timberlake Consulting S.L., the official distributor of Stata in Spain.