2012 Spanish Stata Users Group meeting

12 September 2012

Universitat de Barcelona (UB)
Sala de Juntes - edificio 696
Facultat d’Economia i Empresa
Avda. Diagonal, 696.
Barcelona 08034
Spain

Proceedings

Mixed models: A simulation approach

Isabel Cañette

StataCorp

Simulating data is a powerful tool for understanding statistical models and for spotting identification problems. I will use simulation techniques to explain the building blocks of linear mixed models, and I will also show how to estimate their parameters with the xtmixed command. Using these basic blocks, I will explain how more complex models can be constructed. Finally, I will explain some helpful (but not obvious) applications of xtmixed.
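To make the simulation idea concrete, here is a minimal Python sketch. It is not Stata and not the presenter's code. It simulates a balanced random-intercept model, y_ij = mu + u_j + e_ij, and then recovers the variance components with the classical one-way ANOVA (method-of-moments) estimator. That estimator is a swapped-in stand-in: xtmixed itself fits such models by maximum likelihood or restricted maximum likelihood. The function names and parameter values are hypothetical.

```python
import random
import statistics


def simulate_random_intercept(n_groups, n_per_group, mu, sd_u, sd_e, seed=12345):
    """Simulate y_ij = mu + u_j + e_ij with u_j ~ N(0, sd_u^2) and e_ij ~ N(0, sd_e^2)."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        u = rng.gauss(0.0, sd_u)  # one random intercept shared by every member of the group
        groups.append([mu + u + rng.gauss(0.0, sd_e) for _ in range(n_per_group)])
    return groups


def anova_variance_components(groups):
    """One-way ANOVA estimates of (mu, var_u, var_e) for a balanced design."""
    k = len(groups)
    n = len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    grand = statistics.fmean(means)
    # The within-group mean square estimates the residual variance var_e.
    ssw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g)
    msw = ssw / (k * (n - 1))
    # The between-group mean square estimates var_e + n * var_u.
    ssb = n * sum((m - grand) ** 2 for m in means)
    msb = ssb / (k - 1)
    var_u = max((msb - msw) / n, 0.0)  # truncate a negative estimate at zero
    return grand, var_u, msw


groups = simulate_random_intercept(n_groups=500, n_per_group=20, mu=10.0, sd_u=2.0, sd_e=1.0)
mu_hat, var_u_hat, var_e_hat = anova_variance_components(groups)
print(f"mu = {mu_hat:.2f}, var_u = {var_u_hat:.2f}, var_e = {var_e_hat:.2f}")
```

The estimates should land close to the simulated values (mu = 10, var_u = 4, var_e = 1). With n_per_group=1, however, the within-group mean square has zero degrees of freedom (here, a division by zero). This is a simple instance of an identification problem: with one observation per group, the two variance components cannot be separated.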

Additional materials:
Canette.pdf

Diagnostic tests for count-data models

Miguel Manjón

Universitat Rovira i Virgili

In this presentation, I discuss the implementation of the chi-square diagnostic test of Andrews (1988a,b) in count-data models as a Stata postestimation command. The new command estat chisqdt reports the test statistic and its p-value. In particular, estat chisqdt can be used right after the poisson, nbreg, zip, and zinb commands.

References:
Andrews, D. W. K. 1988a. Chi-square diagnostic tests for econometric models: Introduction and applications. Journal of Econometrics 37: 135–156.

Andrews, D. W. K. 1988b. Chi-square diagnostic tests for econometric models: Theory. Econometrica 56: 1419–1453.

Additional materials:
Majon_Martinez.pdf

Rearranging Stata’s output for the analysis of epidemiological tables

Aurelio Tobias

Institute of Environmental Assessment and Water Research (IDAEA), Spanish Council for Scientific Research (CSIC)

A major capability of Stata is the analysis of epidemiological tables with the epitab commands. These report measures of frequency (proportion or odds), association (risk difference, relative risk, or odds ratio), and impact on public health (attributable risks). Furthermore, when any of the epitab commands is used with the by() option, one can run a stratified analysis that reports stratum-specific measures of association with a test of homogeneity, as well as the crude and the adjusted estimates. These figures allow epidemiologists to test for effect modification and to control for confounding. However, after many years of teaching epidemiological data analysis with Stata, I have noticed that students are still confused by how Stata reports stratified analyses. In this presentation, I suggest an alternative arrangement of Stata’s output for the epitab commands used with the by() option. This arrangement would allow students to easily understand the main concepts of effect modification and confounding.

Additional materials:
Tobias.pdf

Evaluation of a health promotion intervention to improve maternal health in rural Nepal

Elisa Sicuri with S. Sharma, J. Belizan, E. van Teijlingen, Padam Simkhada, and Jane Stephens

CRESIB-Hospital Clínic, Universitat de Barcelona

Background: Most maternal deaths occur in developing countries, and of those, most take place at home. In 2012, Nepal had a maternal mortality ratio of 170 maternal deaths per 100,000 live births. A lack of understanding of local beliefs and practices can hinder the development of appropriate interventions. Green Tara Nepal (GTN), a Nepalese nongovernmental organization, designed a health promotion intervention to improve maternal and neonatal health. The GTN program works with women of reproductive age (between the ages of 15 and 49 with children younger than 2 years old) and with the people (mothers-in-law and husbands) who influence their ability to access health services. The GTN intervention evaluated in this study aimed to improve the uptake of maternal care practices, specifically antenatal and delivery care, in rural Nepal through health promotion in the community. Health-seeking behavior during and after pregnancy is expected to improve in the intervention area (Pharping, south of Kathmandu) relative to the control area (Sankhu, north of Kathmandu).

Methods: This is a controlled before-and-after, cross-sectional, nonrandomized study. Eight hundred thirty-three women of childbearing age in four village development communities were interviewed in two surveys, one in 2008 (baseline) and one in 2010 (midterm evaluation). A third survey (final evaluation) is currently under way. Two of the four villages served as control communities. Descriptive analysis measured several demographic, cultural, and socioeconomic characteristics, such as caste and assets owned. Preliminary analysis measured the impact of the intervention on three outcomes: antenatal clinic (ANC) attendance at least once during the whole pregnancy, ANC attendance during the first trimester, and the total number of ANC visits. Difference-in-differences estimation was used to assess the effects of the intervention on these outcomes while controlling for a constructed wealth index and other personal characteristics such as parity, age, and level of education.

Results: Baseline characteristics were not statistically different between the intervention and control groups. Logistic regression results showed that the odds of attending ANC at least once during the whole pregnancy were six times higher in the intervention group than in the control group. The impact of the intervention on ANC attendance during the first trimester was not significant. Poisson regression results showed that women receiving the intervention attended 1.13 times as many ANC visits as women in the control group.

Conclusion: Although the impact during the first trimester was not significant, preliminary results show that the intervention is effective in increasing ANC attendance at least once during the whole pregnancy and in increasing the number of visits. Further analysis will explore the impact of the intervention on other outcomes and will estimate its cost-effectiveness as the intervention cost per potential disability-adjusted life year averted.

Additional materials:
Sharma_Sicure.ppt

A simple regression model for the policy effect identification using alternative diff-in-diff assumptions

Ricardo Mora

Universidad Carlos III

Diff-in-diff estimators are widely used in empirical research in economics. The core assumption for identifying the treatment effect is that, in the absence of treatment, the average change in outcome for the treated equals the average change in outcome for the nontreated. In this presentation, I argue that an important step in the modeling strategy is rarely discussed: in a typical application with more than two periods, whether the outcome variable should be measured in levels, in changes, or in a higher-order difference of the original variable. The number of differences taken on the original variable before the diff-in-diff assumption is applied leads to different identification conditions for the policy effect. I propose a simple regression model that allows the policy effect to be estimated under alternative diff-in-diff assumptions. Additionally, I show how this model can be used to test the robustness of the estimated policy effect to those alternative assumptions. I illustrate the usefulness of the approach by revisiting the results of several recent papers that apply the diff-in-diff technique.
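The following arithmetic sketch shows why the choice matters once there are three periods. It uses Python and hypothetical group means, and it is not the presenter's regression model. The same data give one policy effect when the diff-in-diff assumption is imposed on levels and another when it is imposed on first differences.

```python
def did_levels(treated, control, t):
    """Diff-in-diff on levels: compare the change from t-1 to t across groups."""
    return (treated[t] - treated[t - 1]) - (control[t] - control[t - 1])


def did_first_differences(treated, control, t):
    """Diff-in-diff on first differences: compare the change in growth across groups."""

    def change_in_growth(path):
        return (path[t] - path[t - 1]) - (path[t - 1] - path[t - 2])

    return change_in_growth(treated) - change_in_growth(control)


# Hypothetical mean outcomes for periods 0 and 1 (pre-policy) and 2 (post-policy).
# The treated group already grows by 1 per period before the policy; the control group is flat.
treated = [0.0, 1.0, 7.0]
control = [0.0, 0.0, 0.0]

print(did_levels(treated, control, t=2))             # 6.0
print(did_first_differences(treated, control, t=2))  # 5.0
```

The levels estimate attributes the treated group's preexisting growth to the policy, and the first-difference estimate nets that growth out. Which of the two numbers is the policy effect depends on which diff-in-diff assumption holds.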

Additional materials:
Mora_Reggio.pdf

The use of multiple-imputation methods to predict electoral outcomes

Modesto Escobar

Universidad de Salamanca

Antonio Jaime

Universidad de Málaga

One of the recent additions to Stata is the mi suite, which allows us to perform multiple imputation of missing data in a wide variety of fields, including electoral research. The main goal of this presentation is to present and compare different methods of electoral prediction based on survey data. I will estimate electoral results (votes and seats) by combining imputation methods and poststratification techniques. I will focus on recent Spanish electoral history to calibrate the models. Data come from the pre-electoral surveys carried out by the CIS (Centro de Investigaciones Sociológicas) for every election since 1979. The sample size of each survey allows estimates to be produced at both the national and the district level. From these surveys, I produce a database containing many thousands of predictions, which result from combining 10 elections, 52 districts, 5 parties per election on average, and more than 50 prediction methods. These methods include selection of likely voters, poststratification at different levels, and various imputation techniques available in the Stata mi suite. The predictions will be compared with actual election outcomes to assess how accurately each method predicts electoral results.

Additional materials:
Escobar_Jaime.pdf

Cointegrating VAR models and probability forecasting in Stata

Gustavo Sánchez

StataCorp

In this presentation, I discuss two applications of the vec commands. First, I use the cointegrating VAR approach discussed in Garratt et al. (2006) to fit a vector error-correction model. Instead of applying the traditional Johansen statistical restrictions to identify the coefficients of the cointegrating vectors, I use Stata to show an alternative specification of those restrictions, based on the theoretical framework for the long-run cointegrating relationships. Second, I apply probability forecasting to simulate probability distributions for the forecast periods. This approach produces probabilities for future single and joint events instead of only point forecasts and confidence intervals. For example, we could estimate the joint probability of double-digit inflation combined with a decrease in GDP.
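Here is a minimal Monte Carlo sketch of the joint-event idea in Python. It assumes, hypothetically, that the one-period-ahead forecasts of inflation and GDP growth are jointly normal with given means, standard deviations, and correlation. In the approach described above, the simulated draws would come from the fitted vector error-correction model instead. All numbers and names here are placeholders.

```python
import math
import random


def joint_event_probability(mean_infl, mean_growth, sd_infl, sd_growth, rho,
                            n_draws=100_000, seed=1):
    """Estimate P(inflation >= 10 and growth < 0) by simulating a bivariate normal forecast."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        z1 = rng.gauss(0.0, 1.0)
        # Build a second standard normal draw with correlation rho to z1.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        inflation = mean_infl + sd_infl * z1
        growth = mean_growth + sd_growth * z2
        if inflation >= 10.0 and growth < 0.0:  # double-digit inflation and a fall in GDP
            hits += 1
    return hits / n_draws


# Hypothetical forecast distribution: inflation centered at 8%, GDP growth at 1%.
print(joint_event_probability(mean_infl=8.0, mean_growth=1.0, sd_infl=2.0,
                              sd_growth=1.5, rho=-0.4))
```

Separate point forecasts and confidence intervals describe only the marginal distributions. They cannot answer this question on their own, because the joint probability also depends on the correlation between the two forecast errors.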

Reference:
Garratt, A., K. Lee, M. H. Pesaran, and Y. Shin. 2006. Global and National Macroeconometric Modelling: A Long-Run Structural Approach. Oxford: Oxford University Press.

Additional materials:
Sanchez.pdf

Student graduation: To what extent does university expenditure matter?

Javier García Estévez

Universitat de Barcelona

Human capital is one of the most important channels through which universities positively affect economic development. In addition, graduation rates remain one of the most frequently applied measures of institutional performance. In this presentation, I analyze the relationship between university characteristics and graduation rates. I assemble a dataset covering the entire Spanish public university system over the last decade. Observing the same university over several years helps to address the problem of unobserved heterogeneity. The main finding is that university features such as expenditure, the student–teacher ratio, and financial aid to students are important in accounting for graduation rates.

Additional materials:
Estevez_Duch.pdf

Spatial econometrics with Stata

Vicente Royuela

AQR-IREA Research Group, Universitat de Barcelona

I briefly present the sppack command, recently made available for Stata 12. This command provides several ways to build spatial contact matrices. It also computes spatial lags of variables and fits spatial autoregressive models and spatial error models, by either maximum likelihood or generalized method of moments.
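The following small Python sketch illustrates what a spatial contact matrix and a spatial lag are. It is not the sppack implementation. A hypothetical contiguity matrix for four regions on a line is row-standardized, and the spatial lag of a variable x is then computed as Wx, so that each region receives the average of its neighbors' values.

```python
def row_standardize(W):
    """Divide each row of a contact matrix by its row sum (rows of zeros stay zero)."""
    out = []
    for row in W:
        total = sum(row)
        out.append([w / total if total else 0.0 for w in row])
    return out


def spatial_lag(W, x):
    """Compute the spatial lag Wx: a weighted sum of each unit's neighbors' values."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


# Hypothetical contiguity: region 0 borders 1, 1 borders 2, and 2 borders 3.
contact = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
x = [10.0, 20.0, 30.0, 40.0]

W = row_standardize(contact)
print(spatial_lag(W, x))  # [20.0, 20.0, 30.0, 30.0]
```

A spatial autoregressive model includes such a lag of the dependent variable, Wy, among the regressors, whereas a spatial error model applies W to the disturbances.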

Additional materials:
Royuela.pdf

Empirical evidence on horizontal competition in tax enforcement

Luca Salvadori, José María Durán-Cabré, and Alejandro Esteller-Moré

Universitat de Barcelona and Institut d’Economia de Barcelona, IEB

Tax auditing parameters have been largely overlooked by the literature as policy-making instruments of any relevance; however, enforcement strategies are critical elements of the tax burden. In this presentation, we show that in a federal framework, tax auditing policies can serve as additional tools for regional interaction. We examine the presence of this interaction by adopting a spatial econometric approach. We employ a time-space recursive model that accounts for sluggish adjustment in auditing policies, and we obtain results that are congruent with standard theory, corroborating the presence of horizontal competition between regions in their tax auditing policies. We also find that once regional governments acquire legal power, the opaque competition in enforcement policies disappears, apparently switching to a more transparent competition in statutory tax parameters.

Additional materials:
Salvadori_Duran.pdf

Estimating spatial panels with Stata

Gordon Hughes

Edinburgh University

Econometricians have begun to devote more attention to spatial interactions when carrying out applied econometric studies. In part, this is motivated by an explicit focus on spatial interactions in policy formulation or market behavior, but it may also reflect concern about the role of omitted variables that are or may be spatially correlated.

The classic models of spatial autocorrelation or spatial error rely upon a predefined matrix of spatial weights W, which may be derived from an explicit model of spatial interactions but which, alternatively, could be viewed as a flexible approximation to an unknown set of spatial links similar to the use of a translog cost function. With spatial panel data, it is possible, in principle, to regard W as potentially estimable, though the number of time periods would have to be large relative to the number of spatial panel units unless severe restrictions are placed upon the structure of the spatial interactions. While the estimation of W may be infeasible for most real data, there is a strong, formal similarity between spatial panel models and nonspatial panel models in which the variance–covariance matrix of panel errors is not diagonal. One important variant of this type of model is the random-coefficient model, in which slope coefficients differ across panel units so that interest focuses on the mean slope coefficient across panel units. In certain applications—for example, cross-country (macro-)economic data—the assumption that reaction coefficients are identical across panel units is not intuitively plausible. Instead of just sweeping differences in coefficients into a general error term, the random-coefficient model allows the analyst to focus on the common component of responses to changes in the independent variables. At the same time, the model also allows the analyst to retain the information about the error structure associated with coefficients that are random across panel units but constant over time for each panel unit.
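The focus on the mean slope can be illustrated with a Pesaran–Smith style mean-group estimator, one of the estimators offered by the user-written xtmg command. The idea is to fit each panel unit separately and then average the unit-specific slopes. The Python sketch below uses simulated, hypothetical data. It is neither the Swamy estimator used by xtrc nor the presenter's spatial procedures, and it ignores any correlation of errors across units.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x (with an intercept)."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def mean_group_slope(panels):
    """Fit a separate regression for each panel unit and average the slopes."""
    slopes = [ols_slope(x, y) for x, y in panels]
    return statistics.fmean(slopes), slopes


def simulate_panels(n_units, n_periods, mean_beta, sd_beta, seed=7):
    """Hypothetical data: each unit has its own intercept and slope, fixed over time."""
    rng = random.Random(seed)
    panels = []
    for _ in range(n_units):
        alpha_i = rng.gauss(0.0, 1.0)
        beta_i = rng.gauss(mean_beta, sd_beta)  # random across units, constant over time
        x = [rng.gauss(0.0, 1.0) for _ in range(n_periods)]
        y = [alpha_i + beta_i * xt + rng.gauss(0.0, 0.5) for xt in x]
        panels.append((x, y))
    return panels


panels = simulate_panels(n_units=200, n_periods=30, mean_beta=1.5, sd_beta=0.4)
mg_slope, unit_slopes = mean_group_slope(panels)
print(f"mean-group slope = {mg_slope:.3f}")
```

The spread of unit_slopes around mg_slope is the information about coefficient heterogeneity that a single pooled slope would sweep into the error term.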

At present, Stata’s spatial procedures include a range of user-written routines designed to deal with cross-sectional spatial data. The recent release of a set of programs (including spmat, spivreg, and spreg) written by Drukker, Prucha, and Raciborski provides Stata’s users with the opportunity to fit a wide range of standard spatial econometric models for cross-sectional data. Extending such procedures to deal with panel data is nontrivial, in part because there are important issues about how panels with incomplete data should be treated. The casewise exclusion of missing data is automatic for cross-sectional data, but omitting a whole panel unit because some of the data in the panel are missing will typically lead to a very large reduction in the size of the working dataset. For example, it is very rare for international datasets on macroeconomic or other data to be complete, so casewise exclusion of missing data will generate datasets that contain many fewer countries or time periods than might otherwise be usable.

The theoretical literature on econometric models for the analysis of spatial panels has flourished in the last decade with notable contributions from LeSage and Pace, Elhorst, and Pfaffermayr, among others. In some cases, authors have made available specific code for the implementation of the techniques that they have developed. However, the programming language of choice for such methods has been MATLAB, which is expensive and has a fairly steep learning curve for nonusers. Many of the procedures assume that there are no missing data. In addition, the procedures may not be able to handle large datasets, because the model specifications can easily become unmanageable if either N (the number of spatial units) or T (the number of time periods) becomes large.

In this presentation, I will cover a set of user-written maximum likelihood procedures for fitting models with a variety of spatial structures, including the spatial error model, the spatial Durbin model, the spatial autocorrelation model, and certain combinations of these models (the terminology is attributable to LeSage and Pace [2009]). A suite of MATLAB programs to fit these models for both random and fixed effects has been compiled by Elhorst (2010) and provides the basis for the implementation in Stata/Mata. Methods of dealing with missing data, including the implementation of an approach proposed by Pfaffermayr (2009), will be discussed.

A second aspect of spatial panel models that will be covered in the presentation concerns the links between such models and random-coefficient models that can be fit using procedures such as xtrc or the user-written procedure xtmg. The classic formulation of random-coefficient models assumes that the variance–covariance matrix of panel errors is diagonal but heteroskedastic. This is an implausible assumption for most cross-country datasets, so it is important to consider how it may be relaxed, either by allowing for explicit spatial interactions or by using a consistent estimator of the cross-country variance–covariance matrix.

The user-written procedures introduced in the presentation will be illustrated by analyses of (a) state data on electricity consumption in the U.S., and (b) country data on demand for infrastructure in the developing and developed world.

References:
Elhorst, J. P. 2010. Spatial panel data models. In Handbook of Applied Spatial Analysis, ed. M. M. Fischer and A. Getis, 377–407. Berlin: Springer.

LeSage, J., and R. Pace. 2009. A sampling approach to estimate the log determinant used in spatial likelihood problems. Journal of Geographical Systems 11: 209–225.

Pfaffermayr, M. 2009. Maximum likelihood estimation of a general unbalanced spatial random effects model: A Monte Carlo study. Spatial Economic Analysis 4: 467–483.

Additional materials:
Hughes.pdf

Scientific organizers

Llorenç Quinto, Hospital Clínico UB
Sergi Sanz, Hospital Clínico UB
Raúl Ramos, AQR-IREA, Universitat de Barcelona
Vicente Royuela, AQR-IREA, Universitat de Barcelona

Logistics organizers

Timberlake Consulting S.L., the official distributor of Stata in Spain.