Abstracts
Mixed models: A simulation approach
Isabel Cañette
StataCorp
Simulating data is a powerful tool for understanding statistical models and
for spotting identification problems. I will use simulation techniques to
explain the building blocks for linear mixed models, and I will also show how
to estimate the parameters by using the xtmixed command. Using these
basic blocks, I will explain how more complex models can be constructed.
Finally, I will explain some helpful (but not obvious) applications of
xtmixed.
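The simulation idea can be sketched outside Stata as well. The Python snippet below is a minimal illustration, not the presenter's code: it simulates a random-intercept model y_ij = beta0 + u_i + e_ij and recovers the variance components with a simple method-of-moments (one-way ANOVA) estimator for balanced groups, rather than the likelihood-based estimator that xtmixed uses. The function names and parameter values are assumptions chosen for the example.

```python
import random
import statistics


def simulate_random_intercept(n_groups, n_per_group, beta0, sd_u, sd_e, seed=0):
    """Simulate y_ij = beta0 + u_i + e_ij with normal u_i and e_ij."""
    rng = random.Random(seed)
    groups = []
    for _ in range(n_groups):
        u = rng.gauss(0.0, sd_u)  # group-level random intercept
        groups.append(
            [beta0 + u + rng.gauss(0.0, sd_e) for _ in range(n_per_group)]
        )
    return groups


def anova_variance_components(groups):
    """Method-of-moments estimates for balanced groups (not ML or REML)."""
    n = len(groups[0])
    means = [statistics.fmean(g) for g in groups]
    # The average within-group variance estimates the residual variance.
    var_e = statistics.fmean(statistics.variance(g) for g in groups)
    # Var(group mean) = var_u + var_e / n, so subtract the second term.
    var_u = max(statistics.variance(means) - var_e / n, 0.0)
    return statistics.fmean(means), var_u, var_e


# Hypothetical true values: beta0 = 2, var_u = 1.5^2 = 2.25, var_e = 1.
groups = simulate_random_intercept(500, 10, beta0=2.0, sd_u=1.5, sd_e=1.0, seed=42)
beta0_hat, var_u_hat, var_e_hat = anova_variance_components(groups)
print(f"beta0={beta0_hat:.2f} var_u={var_u_hat:.2f} var_e={var_e_hat:.2f}")
```

Because the true parameters are known, the estimates can be compared directly with the values used to generate the data, which is the point of the simulation approach.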
Additional materials:
Canette.pdf
Diagnostic tests for count-data models
Miguel Manjón
Universitat Rovira i Virgili
In this presentation, I discuss the implementation of the chi-square
diagnostic test of Andrews (1988a,b) in count-data models as a Stata
postestimation command. The new command estat chisqdt reports the test
statistic and its p-value. In particular, estat chisqdt can be
used right after the poisson, nbreg, zip, and
zinb commands.
References:
Andrews, D. W. K. 1988a. Chi-square diagnostic tests for econometric models:
Introduction and applications. Journal of Econometrics 37:
135–156.
Andrews, D. W. K. 1988b. Chi-square diagnostic tests for econometric models:
Theory. Econometrica 56: 1419–1453.
Additional materials:
Majon_Martinez.pdf
Rearranging Stata’s output for the analysis of epidemiological tables
Aurelio Tobias
Institute of Environmental Assessment and Water Research (IDAEA), Spanish Council for Scientific Research (CSIC)
A major capability in Stata is the analysis of epidemiological tables by
using any of the epitab commands. These report measures of frequency
(proportion or odds), association (risk difference, relative risk, or odds
ratio) and impact on public health (attributable risks). Furthermore, when
using any of the epitab commands with the by() option, one can
run a stratified analysis that reports stratum-specific measures of
association with a test of homogeneity, as well as the crude and the
adjusted estimates. These reported figures allow epidemiologists to test
for effect modification and to control for confounding. However, after
many years of teaching epidemiological data analysis with Stata, I have noticed
that students are still confused about how Stata reports stratified
analyses. In this presentation, I suggest an alternative arrangement of
Stata’s output for the epitab commands used with the
by() option. This arrangement would allow students to easily understand
the main concepts of effect modification and confounding.
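To make the distinction between crude, stratum-specific, and adjusted measures concrete, here is a minimal Python sketch. It is not the arrangement proposed in the presentation and does not reproduce epitab output. It computes odds ratios for two hypothetical strata, the crude odds ratio from the collapsed table, and the Mantel-Haenszel adjusted odds ratio. All counts are invented for illustration.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: unexposed cases, c: exposed noncases, d: unexposed noncases."""
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical counts (a, b, c, d) for two strata of a third variable.
strata = [(20, 10, 80, 80), (10, 40, 10, 80)]

stratum_ors = [odds_ratio(*t) for t in strata]
# Collapse the strata cell by cell to get the crude table.
crude = odds_ratio(*[sum(cells) for cells in zip(*strata)])
adjusted = mantel_haenszel_or(strata)

for i, value in enumerate(stratum_ors, start=1):
    print(f"Stratum {i} OR: {value:.2f}")
print(f"Crude OR:     {crude:.2f}")
print(f"Adjusted OR:  {adjusted:.2f}")
```

In this constructed example, the stratum-specific odds ratios agree with each other (no effect modification) but differ from the crude odds ratio, which is the pattern usually read as confounding.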
Additional materials:
Tobias.pdf
Evaluation of a health promotion intervention to improve maternal health in rural Nepal
Elisa Sicuri
with S. Sharma, J. Belizan, E. van Teijlingen, Padam Simkhada, and Jane Stephens
CRESIB-Hospital Clínic, Universitat de Barcelona
Background: Most maternal deaths occur in developing countries, and of
those, most take place at home. In 2012, Nepal had a maternal mortality ratio
of 170 deaths per 100,000 live births. A lack of understanding of local
beliefs and practices can hinder the development of appropriate
interventions. Green Tara Nepal (GTN), a Nepalese nongovernmental
organization, designed a health promotion intervention to improve maternal
and neonatal health. The GTN program works with women of fertile age (between
the ages of 15 and 49 with children younger than 2 years old) and with the
people (mothers-in-law and husbands) who influence their ability to access
health services. The GTN intervention evaluated in this study aimed to
improve the uptake of maternal care practices, specifically antenatal and
delivery care, in rural Nepal through health promotion in the community. The
expectation is that health-seeking behavior during and after pregnancy should
improve in the intervention area (Pharping, south of Kathmandu) relative to
the control area (Sankhu, north of Kathmandu).
Methods: This is a controlled before-and-after, cross-sectional,
nonrandomized study. Eight hundred thirty-three women of childbearing age
were interviewed in four village development communities across two
surveys, one in 2008 (baseline) and one in 2010 (midterm evaluation). A third
survey is currently taking place (final evaluation). Two of the four villages
were used as control communities. Descriptive analysis measured several
demographic, cultural, and socioeconomic characteristics, such as caste and
assets owned. Preliminary analysis measured the impact of the intervention
on the following outcomes: antenatal clinic (ANC) attendance at least once
during the whole pregnancy and during the first trimester and the total
number of ANC visits. Difference-in-difference estimation was used to assess
the effects of intervention on the outcome variables while controlling for a
constructed wealth index and other personal characteristics such as parity,
age, and level of education.
Results: Baseline characteristics were not statistically different
between intervention and control groups. Logistic regression results showed
that the probability of attending an ANC at least once during the whole
pregnancy was six times higher in the intervention group than in the control
group. The impact of the intervention on ANC attendance during the first
trimester was not significant. Poisson regression results showed that women
receiving the intervention attended 1.13 times as many ANC visits as women in
the control group.
Conclusion: Although the impact on attendance during the first
trimester was not significant, preliminary results show that the intervention
increased both the likelihood of attending an ANC at least once during
pregnancy and the number of visits. Further analysis will explore the impact
of the intervention on other outcomes and will estimate its
effectiveness in terms of intervention costs per potential
disability-adjusted life years averted.
Additional materials:
Sharma_Sicure.ppt
A simple regression model for the policy effect identification using alternative diff-in-diff assumptions
Ricardo Mora
Universidad Carlos III
Diff-in-diff estimators are widely used in empirical research in economics.
The core assumption to identify the treatment effect is that the average
change in outcome for the treated in the absence of treatment equals the
average change in outcome for the nontreated. In this presentation, I argue
that an important step in the modeling strategy is rarely
discussed: whether the outcome variable should be measured in levels,
changes, or a higher-order difference of the original variable in a typical
application with more than two periods. The number of differences taken on the
original variable of interest before applying the diff-in-diff assumption
determines the identification conditions of the policy effect. I
propose a simple regression model that allows for the estimation of the
policy effect under alternative diff-in-diff assumptions. Additionally, I
show how this model can be used to test the robustness of the policy effect
estimation under alternative assumptions. I illustrate the usefulness of the
approach by revisiting the results of several recent papers in which the
diff-in-diff technique has been applied.
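The role of differencing can be illustrated with a small Python sketch. This is not the regression model proposed in the presentation. It only shows, on hypothetical group means with two pretreatment periods and one posttreatment period, how the estimated effect changes when the diff-in-diff assumption is applied to levels versus first differences. The function names and the numbers are assumptions.

```python
def difference(series):
    """First difference of a sequence of period means."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def did_estimate(treated, control, order=1):
    """Diff-in-diff on the last two periods after differencing order-1 times.

    order=1 applies the assumption to levels (common trends in levels);
    order=2 applies it to first differences (common trends in changes).
    The last period is assumed to be the only posttreatment period.
    """
    for _ in range(order - 1):
        treated, control = difference(treated), difference(control)
    return (treated[-1] - treated[-2]) - (control[-1] - control[-2])


# Hypothetical average outcomes in periods 1, 2 (pre) and 3 (post).
treated_means = [10, 12, 15]
control_means = [8, 9, 10]

effect_levels = did_estimate(treated_means, control_means, order=1)
effect_changes = did_estimate(treated_means, control_means, order=2)
print(effect_levels, effect_changes)
```

Here the treated group was already growing faster before treatment, so the two assumptions attribute different amounts of the posttreatment change to the policy.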
Additional materials:
Mora_Reggio.pdf
The use of multiple-imputation methods to predict electoral outcomes
Modesto Escobar
Universidad de Salamanca
Antonio Jaime
Universidad de Málaga
One of the recent additions to Stata is the mi suite, which allows us
to perform multiple imputation of missing data with application to a wide
variety of fields, including electoral research. The main goal of this
presentation is to compare different methods of electoral prediction based on
survey data. I will estimate electoral results (votes and seats), combining
imputation methods and poststratification techniques. I will focus on the
recent Spanish electoral history to calibrate the models. Data come from the
pre-electoral surveys carried out by the CIS in every election since 1979.
The sample size in each survey allows the production of estimates both at the
national and at the district level. After combining information from these
surveys, I produce a database containing many thousands of predictions, which
are the result of combining 10 elections, 52 districts, 5 parties for each
election on average, and more than 50 prediction methods. These methods
include selection of likely voters, poststratification at different levels,
and various imputation techniques contained in the Stata mi suite.
Results obtained from the previous predictions will be compared with real
outcomes of elections to assess the capabilities of each method to produce
accurate electoral predictions.
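As background on how estimates from several imputed datasets are usually pooled, the following Python sketch applies the standard combination rules (Rubin's rules) to a hypothetical vote-share estimate. It is an illustration under invented numbers, not the authors' prediction procedure, and the function name is an assumption.

```python
import statistics


def pool_rubin(estimates, variances):
    """Combine m completed-data estimates with Rubin's rules.

    Returns the pooled point estimate and its total variance,
    T = W + (1 + 1/m) * B, where W is the average within-imputation
    variance and B is the between-imputation variance.
    """
    m = len(estimates)
    pooled = statistics.fmean(estimates)
    within = statistics.fmean(variances)
    between = statistics.variance(estimates)
    return pooled, within + (1 + 1 / m) * between


# Hypothetical vote shares for one party from m = 5 imputed datasets.
shares = [0.30, 0.32, 0.31, 0.29, 0.33]
share_variances = [0.0004] * 5

pooled_share, total_variance = pool_rubin(shares, share_variances)
print(f"share={pooled_share:.3f} se={total_variance ** 0.5:.4f}")
```

The between-imputation term is what makes the pooled standard error reflect the uncertainty due to the missing responses.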
Additional materials:
Escobar_Jaime.pdf
Cointegrating VAR models and probability forecasting in Stata
Gustavo Sánchez
StataCorp
In this presentation, I discuss two applications of the vec commands.
First, I use the cointegrating VAR approach discussed in Garratt et al.
(2006) to fit a vector error-correction model. In contrast with the
application of the traditional Johansen statistical restrictions for the
identification of the coefficients of the cointegrating vectors, I use Stata
to show an alternative specification of those restrictions based on the
theoretical framework for the long-run cointegrating relationships. Second,
I apply probability forecasting to simulate probability distributions for the
forecasted periods. This approach produces probabilities for future single
and joint events instead of only producing point forecasts and confidence
intervals. For example, we could estimate the joint probability of two-digit
inflation combined with a decrease in the GDP.
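The logic of probability forecasting can be sketched with a small Monte Carlo exercise in Python. For brevity the example uses a bivariate VAR(1) with made-up coefficients instead of a fitted vector error-correction model, draws independent normal shocks, and ignores parameter uncertainty. All names and numbers are assumptions for illustration only.

```python
import random

# Hypothetical VAR(1): y_t = c + A y_{t-1} + e_t, y = (inflation, GDP growth).
Y0 = (9.0, 0.5)
INTERCEPT = (1.0, 0.5)
COEF = ((0.9, 0.0), (-0.05, 0.5))
SHOCK_SD = (1.0, 1.0)
HORIZON = 4


def simulate_path(rng):
    """Return (inflation, growth) HORIZON periods ahead for one shock draw."""
    y = list(Y0)
    for _ in range(HORIZON):
        # The new vector is built from the previous period's values only.
        y = [
            INTERCEPT[i] + COEF[i][0] * y[0] + COEF[i][1] * y[1]
            + rng.gauss(0.0, SHOCK_SD[i])
            for i in range(2)
        ]
    return y


def event_probability(event, n_draws=10000, seed=0):
    """Share of simulated paths in which the event occurs."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        inflation, growth = simulate_path(rng)
        hits += bool(event(inflation, growth))
    return hits / n_draws


p_inflation = event_probability(lambda inflation, growth: inflation >= 10)
p_joint = event_probability(lambda inflation, growth: inflation >= 10 and growth < 0)
print(f"P(two-digit inflation) = {p_inflation:.3f}")
print(f"P(two-digit inflation and GDP decrease) = {p_joint:.3f}")
```

Because both calls use the same seed, the joint event is evaluated on the same simulated paths as the single event, so its estimated probability can never exceed that of the single event.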
Reference:
Garratt, A., K. Lee, M. H. Pesaran, and Y. Shin. 2006. Global and National
Macroeconometric Modelling: A Long-Run Structural Approach. Oxford:
Oxford University Press.
Additional materials:
Sanchez.pdf
Student graduation: To what extent does university expenditure matter?
Javier García Estévez
Universitat de Barcelona
Human capital is one of the most important channels through which
universities positively affect economic development. In addition, graduation
rates remain one of the most frequently applied measures of institutional
performance. In this presentation, I analyze the relationship between
university characteristics and graduation rates. I assemble a dataset for the
entire public university system in Spain over the last decade. Observing the
same university over several years helps to address the problem of unobserved
heterogeneity. The main finding is that
university features such as expenditure, student–teacher ratio, and
financial aid to students are important in accounting for graduation rates.
Additional materials:
Estevez_Duch.pdf
Spatial econometrics with Stata
Vicente Royuela
AQR-IREA Research Group, Universitat de Barcelona
I briefly present the sppack command, recently introduced in
Stata 12. I show how to build spatial contact matrices in
several alternative ways. Additionally, the command allows for computing spatial
lags of the variables and for estimating spatial autoregressive models and
spatial error models, both through maximum likelihood and generalized method
of moments.
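As a minimal illustration of what a contact matrix and a spatial lag are, the Python sketch below builds a row-normalized contiguity matrix for four hypothetical regions arranged in a line and computes the spatial lag of a variable. It does not use or imitate the sppack syntax, and the function names and data are assumptions.

```python
def row_normalize(contiguity):
    """Scale each row to sum to 1; rows without neighbors stay at zero."""
    weights = []
    for row in contiguity:
        total = sum(row)
        weights.append([value / total if total else 0.0 for value in row])
    return weights


def spatial_lag(weights, values):
    """Weighted average of neighbors' values for each region (W times x)."""
    return [sum(w * v for w, v in zip(row, values)) for row in weights]


# Four hypothetical regions in a line: 1-2, 2-3, and 3-4 share a border.
contiguity = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
income = [10, 20, 30, 40]

W = row_normalize(contiguity)
lagged_income = spatial_lag(W, income)
print(lagged_income)
```

With row normalization, each region's spatial lag is simply the average of the variable over its neighbors.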
Additional materials:
Royuela.pdf
Empirical evidence on horizontal competition in tax enforcement
Luca Salvadori, José María Durán-Cabré, and Alejandro Esteller-Moré
Universitat de Barcelona and Institut d’Economia de Barcelona, IEB
Tax auditing parameters have been largely overlooked by the literature as
policy-making instruments of any relevance; however, enforcement strategies
are critical elements of the tax burden. In this presentation, we show that
in a federal framework, tax auditing policies can serve as additional tools for
regional interaction. We examine the presence of this interaction by adopting a
spatial econometric approach. We employ a time-space recursive model that
accounts for sluggish adjustment in auditing policies, and we obtain results that
are congruent with standard theory, corroborating the presence of horizontal
competition between regions in their tax auditing policies. We also find that
once regional governments acquire legal power, the opaque competition in
enforcement policies disappears, apparently switching to a more transparent
competition in statutory tax parameters.
Additional materials:
Salvadori_Duran.pdf
Estimating spatial panels with Stata
Gordon Hughes
Edinburgh University
Econometricians have begun to devote more attention to spatial interactions
when carrying out applied econometric studies. In part, this is motivated by
an explicit focus on spatial interactions in policy formulation or market
behavior, but it may also reflect concern about the role of omitted
variables that are or may be spatially correlated.
The classic models of
spatial autocorrelation or spatial error rely upon a predefined matrix of
spatial weights W, which may be derived from an explicit model of
spatial interactions but which, alternatively, could be viewed as a flexible
approximation to an unknown set of spatial links similar to the use of a
translog cost function. With spatial panel data, it is possible, in
principle, to regard W as potentially estimable, though the number of
time periods would have to be large relative to the number of spatial panel
units unless severe restrictions are placed upon the structure of the spatial
interactions. While the estimation of W may be infeasible for most
real data, there is a strong, formal similarity between spatial panel models
and nonspatial panel models in which the variance–covariance matrix of
panel errors is not diagonal. One important variant of this type of model is
the random-coefficient model, in which slope coefficients differ across panel
units so that interest focuses on the mean slope coefficient across panel
units. In certain applications—for example, cross-country
(macro-)economic data—the assumption that reaction coefficients are
identical across panel units is not intuitively plausible. Instead of just
sweeping differences in coefficients into a general error term, the
random-coefficient model allows the analyst to focus on the common component
of responses to changes in the independent variables. At the same time, the
model also allows the analyst to retain the information about the error
structure associated with coefficients that are random across panel units but
constant over time for each panel unit.
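The focus on the mean slope across panel units can be illustrated with a short Python sketch in the spirit of a mean-group estimator: fit a separate regression for each panel unit and average the slopes. This is a simplification for illustration, not the weighted random-coefficient estimator discussed in the presentation, and the data and function names are invented.

```python
import statistics


def ols_slope(x, y):
    """Slope from a simple regression of y on x with an intercept."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


def mean_group_slope(panels):
    """Average of unit-specific slopes and its standard error."""
    slopes = [ols_slope(x, y) for x, y in panels]
    mean_slope = statistics.fmean(slopes)
    std_error = statistics.stdev(slopes) / len(slopes) ** 0.5
    return mean_slope, std_error, slopes


# Three hypothetical panel units whose true slopes are 1, 2, and 3.
periods = [0, 1, 2, 3]
panels = [
    (periods, [5 + 1 * t for t in periods]),
    (periods, [2 + 2 * t for t in periods]),
    (periods, [7 + 3 * t for t in periods]),
]

mean_slope, std_error, unit_slopes = mean_group_slope(panels)
print(f"mean slope = {mean_slope:.2f} (se {std_error:.2f})")
```

The spread of the unit-specific slopes around their mean is the heterogeneity that a pooled model with a common coefficient would push into the error term.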
At present, Stata’s spatial procedures include a range of user-written
routines designed to deal with cross-sectional spatial data. The
recent release of a set of programs (including spmat, spivreg,
and spreg) written by Drukker, Prucha, and Raciborski provides
Stata’s users with the opportunity to fit a wide range of standard
spatial econometric models for cross-sectional data. Extending such
procedures to deal with panel data is nontrivial, in part because there are
important issues about how panels with incomplete data should be treated. The
casewise exclusion of missing data is automatic for cross-sectional data, but
omitting a whole panel unit because some of the data in the panel are missing
will typically lead to a very large reduction in the size of the working
dataset. For example, it is very rare for international datasets on
macroeconomic or other data to be complete, so casewise exclusion of missing
data will generate datasets that contain many fewer countries or time periods
than might otherwise be usable.
The theoretical literature on econometric models for the analysis of spatial
panels has flourished in the last decade with notable contributions from
LeSage and Pace, Elhorst, and Pfaffermayr, among others. In some cases,
authors have made available specific code for the implementation of the
techniques that they have developed. However, the programming language of
choice for such methods has been MATLAB, which is expensive and has a fairly
steep learning curve for nonusers. Many of the procedures assume that there
are no missing data. In addition, the procedures may not be able to handle
large datasets, because the model specifications can easily become
unmanageable if either N (the number of spatial units) or T
(the number of time periods) becomes large.
In this presentation, I will cover a set of user-written maximum likelihood
procedures for fitting models with a variety of spatial structures, including
the spatial error model, the spatial Durbin model, the spatial
autocorrelation model, and certain combinations of these models (the
terminology is attributable to LeSage and Pace [2009]). A suite of MATLAB
programs to fit these models for both random and fixed effects has been
compiled by Elhorst (2010) and provides the basis for the implementation in
Stata/Mata. Methods of dealing with missing data, including the
implementation of an approach proposed by Pfaffermayr (2009), will be
discussed.
A second aspect of spatial panel models that will be covered in the
presentation concerns the links between such models and random-coefficient
models that can be fit using procedures such as xtrc or the
user-written procedure xtmg. The classic formulation of
random-coefficient models assumes that the variance–covariance matrix of
panel errors is diagonal but heteroskedastic. This is an implausible
assumption for most cross-country datasets, so it is important to consider
how it may be relaxed, either by allowing for explicit spatial interactions
or by using a consistent estimator of the cross-country
variance–covariance matrix.
The user-written procedures introduced in the presentation will be
illustrated by analyses of (a) state data on electricity consumption in the
U.S., and (b) country data on demand for infrastructure in the developing and
developed world.
References:
Elhorst, J. P. 2010. Spatial panel data models. In Handbook of Applied
Spatial Analysis, ed. M. M. Fischer and A. Getis, 377–407. Berlin:
Springer.
LeSage, J., and R. Pace. 2009. A sampling approach to estimate the log
determinant used in spatial likelihood problems. Journal of Geographical
Systems 11: 209–225.
Pfaffermayr, M. 2009. Maximum likelihood estimation of a general unbalanced
spatial random effects model: A Monte Carlo study. Spatial Economic
Analysis 4: 467–483.
Additional materials:
Hughes.pdf