Abstracts
Monday, July 24, 2006
Weak instruments: An overview and new techniques
Austin Nichols
Urban Institute
Abstract
I review existing literature on weak instruments (possibly with multiple
endogenous variables) and the research in progress by Jim Stock and
others. I demonstrate using tests for weak instruments and give a
new graphical technique for presenting coefficient estimates that allows
for hypothesis testing (using Anderson–Rubin-style test statistics)
in the presence of weak instruments.
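For orientation, first-stage diagnostics of the kind reviewed here can be obtained with the user-written ivreg2 command (Baum, Schaffer, and Stillman); the sketch below is purely illustrative, with made-up variable names.

    * y: outcome; x1: included exogenous regressor;
    * w: endogenous regressor; z1, z2: excluded instruments
    ivreg2 y x1 (w = z1 z2), first
    * the first option reports the first-stage regressions, including the
    * F statistic on the excluded instruments used to gauge instrument strength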
Additional information
wiv.pdf
How to do xtabond2
David Roodman
Center for Global Development
Abstract
xtabond2 may hold the record among user-written Stata modules for
the most confused users (and perhaps the most confusing, too). In this
presentation, I motivate and describe the Arellano–Bond and
Blundell–Bond linear generalized method of moments (GMM) dynamic
panel estimators, drawing lessons from a steady stream of correspondence
with users. I also provide an overview of how to implement them with
xtabond2. I first introduce linear GMM as an extension of ordinary
least squares. Then I describe how limited time span, the potential for
fixed effects, endogeneity, and the dangers of dynamic panel bias all shape
these estimators—in particular, in their use of differences, lags as
instruments, and GMM. I explain how xtabond2 commands should be
constructed, with particular attention to the various options and
suboptions for controlling instrument matrix construction. I discuss the
need to limit the number of instruments and options for doing so.
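As a point of reference (not taken from the talk), a minimal system-GMM call might look like the sketch below; the panel variables n, w, and k are placeholders, and the exact option spellings should be checked against the help file.

    tsset id year
    * lagged dependent variable instrumented GMM-style; strictly exogenous
    * regressors entered IV-style; two-step estimates with robust standard errors
    xtabond2 n L.n w k, gmm(L.n, lag(2 .)) iv(w k) twostep robust small
    * the collapse suboption of gmm() is one way to limit the instrument count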
Additional information
How2Do_xtabond2.ppt
Time-series filtering techniques in Stata
Kit Baum
Boston College and RePEc
Abstract
I will describe several time-series filtering techniques, including the
Hodrick–Prescott, Baxter–King, and bandpass filters and
variants, as well as present new Mata-coded versions of these routines that
are considerably more efficient than previous ado-code routines. I will
also discuss applications to several economic and financial time series.
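For context, the author's existing SSC routine for the Hodrick–Prescott filter is called along these lines (the series name gdp and the quarterly smoothing parameter are illustrative assumptions); the new Mata-based versions follow similar syntax.

    tsset quarter
    * Hodrick-Prescott filter with the conventional quarterly smoothing
    * parameter; the trend and cyclical components are stored in new
    * variables prefixed by the stub() name
    hprescott gdp, stub(hp) smooth(1600)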
Additional information
TSFiltering_beamer.pdf
Towards self-contained data: Attaching validation routines to variables
William Rising
Bellarmine University
Abstract
One of Stata’s great strengths is its data management abilities. When
either building or sharing datasets, some of the most time-consuming
activities are validating the data and writing documentation for the data.
Much of this tedium could be avoided if datasets were self-contained,
i.e., if they could validate themselves. I will show how to achieve this
goal within Stata.
I will demonstrate a package of commands for attaching validation
rules to the variables themselves, via characteristics, along with commands
for running error checks and marking suspicious observations in the
dataset. The validation system is flexible enough that simple checks
continue to work even if variable names change or if the data are reshaped,
and it is rich enough that validation may depend on other variables in the
dataset. Since the validation is at the variable level, the
self-validation also works if variables are recombined with data from other
datasets. With these tools, Stata’s datasets will become truly
self-contained.
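The mechanism relies on Stata characteristics; a bare-bones sketch of that mechanism (not the package's own commands, whose names are not listed here) looks like this:

    * attach a validation rule to the variable itself as a characteristic
    char age[validrule] "inrange(age, 0, 120)"
    * later, retrieve the rule from the variable and flag violations
    local rule : char age[validrule]
    count if !(`rule') & !missing(age)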
Managing edit checks and database cleaning with Stata
Jacqueline L. Buros
Perfuse Laboratories and Data Coordinating Center
Abstract
We have developed a set of ado-files for use in data management,
specifically designed to manage user-written edit checks and to complement
the process of data cleaning. Collectively, these tools enable us to
identify, distribute, and track edit checks in several large multicenter
clinical trials by using Stata software.
Our approach is successful because the coding is simple and the entire
process is visible and familiar to most users. It does not depend on any
particular database structure. The framework approximates an
object-oriented environment, with the objects being (a) the database, open
at the time a command is called; (b) an edit check, consisting of a Stata
do-file, a query message, and a list of variables to be identified for
review; and (c) the edit-check history, implemented as a Stata dataset.
These objects can be manipulated directly or by using a command in Stata.
Actions managed by command include creating or modifying an edit check,
generating a query-clean dataset, preparing and tracking a set of
edit-check documents, and summarizing the edit-check history. Here I
present a brief overview of our process and describe using
the commands in the context of clinical research.
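A generic edit-check do-file of the kind the framework manages might look like the following sketch; the file name, variables, and plausible range are illustrative, not the authors' code.

    * edit check: systolic blood pressure outside a plausible range
    use trialdata, clear
    generate byte flag_sbp = !inrange(sbp, 60, 260) & !missing(sbp)
    * list flagged records for site review
    list patid visit sbp if flag_sbp, sepby(patid)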
A diff command for use with data files
Philip Schumm
University of Chicago
Abstract
One of the most important tools in a programmer’s tool chest is the
diff command. This command permits you to determine immediately
whether two code files are identical and, when they are not, to generate a
patch that summarizes the differences and can be used to transform the
first file into the second. In this presentation, I will introduce an
analogous tool written for use with data files. Unlike code files, in
which each line is identified by its physical location within the file,
records in a data file are typically identified by one or more indices,
each composed of one or more distinct variables. Our tool compares two
files on the basis of one or more such indices; provides a compact, readable
summary of the differences; and can generate a patch (in the form of a
do-file) to update the first file on the basis of the second. This tool is
useful during data analysis whenever two or more versions of a data file
are encountered and may also be used by a data coordinating center to
manage repeated data submissions from multiple sites. The program was
developed using Mata, and I will discuss some of the programming
techniques.
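A crude key-based comparison is already possible with built-in commands, as in the sketch below (file names are illustrative); the tool presented here goes further by summarizing the differences compactly and emitting a patch do-file.

    * sort both versions on the key variable, then compare variable by variable
    use version1, clear
    sort id
    save version1_sorted, replace
    use version2, clear
    sort id
    cf _all using version1_sorted, verbose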
Additional information
Schumm.pdf
Tools for estimation of grouped conditional logit models
Paulo Guimarães
Medical University of South Carolina
Abstract
In many applications of conditional logit models, the choice set and the
characteristics of that set are identical for groups of decision makers.
One can then obtain a more computationally efficient estimation of the
model by grouping the data and using a new user-written command,
multin, designed for fitting grouped conditional logit models. It
produces the same output as clogit but
requires a more compact data layout, which is particularly relevant when
the model comprises many observations and/or choices. In this situation,
one can substantially reduce the size of the dataset and the time required
for estimation. I also present a tool implemented in Mata that transforms
the data as required by clogit to the new format required by
multin. Finally, I discuss the problem of overdispersion in the
grouped conditional logit model and present some alternatives to deal with
this problem. One of these alternatives is Dirichlet-multinomial (DM)
regression. I present a new command for fitting the DM regression model,
dirmul. The dirmul command can also be used to estimate the
better known beta-binomial regression models.
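For reference, the ungrouped estimation that multin reproduces uses the built-in clogit command on one record per decision maker and alternative; the variable names below are illustrative, and multin's own syntax is documented in its help file.

    * choice = 1 for the selected alternative, 0 otherwise;
    * caseid identifies the decision maker
    clogit choice x1 x2, group(caseid)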
Additional information
NASUG_Guimaraes.pdf
A simulation-based sensitivity analysis for matching estimators
Tommaso Nannicini
Department of Economics, Universidad Carlos III de Madrid
Abstract
I present a Stata program (sensatt) that implements the
sensitivity analysis for propensity-score matching estimators proposed by
Ichino, Mealli, and Nannicini (2005). The proposed analysis builds on
Rosenbaum and Rubin (1983) and Rosenbaum (1987) and simulates a potential
confounder to assess the robustness of the estimated treatment
effect with respect to specific deviations from the conditional
independence assumption. The program sensatt uses the
user-written Stata commands for propensity-score matching estimation (att*)
developed by Becker and Ichino (2002). An example of the implementation of
the proposed sensitivity analysis is given using data from the National
Supported Work demonstration, widely known in the program evaluation
literature.
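For orientation, the Becker–Ichino matching routines that sensatt builds on are called along these lines; the variable names follow the usual job-training example and are illustrative only.

    * ATT by nearest-neighbor propensity-score matching, with
    * common-support restriction and bootstrapped standard errors
    attnd re78 treated age educ black married re74 re75, comsup boot reps(100)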
Additional information
pres_Stata_1.pdf
sensatt.ado
sensatt.hlp
sensatt_wp.pdf
Using the Bayesian information criterion (BIC) to judge models and statistical significance
Paul Millar
University of Calgary
Abstract
After a short review of the development of the Bayesian information
criterion (Jeffreys, Schwarz), I will discuss both its extension by
Raftery for statistical significance (implemented as bic) and a
further, simpler routine (bicdrop1) as preventive methods to
avoid making incorrect inference decisions and as "model mining"
procedures.
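As a reminder of the quantity involved, BIC = -2 ln L + k ln N (Schwarz), so smaller values are preferred and differences in BIC between competing models drive Raftery's guidelines. In Stata it is available after any likelihood-based estimation, as in this sketch with illustrative variable names:

    quietly regress y x1 x2 x3
    * reports the log likelihood, AIC, and BIC for the fitted model
    estat ic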
Additional information
Millar_BostonBIC.ppt
Graphs for all seasons
Nicholas J. Cox
Durham University
Abstract
Seasonal effects are dominant in many environmental time series and
important or at least notable in many economic or biomedical time series,
to name only a few application areas represented in the Stata user
community. In several fields, using anything other than basic line graphs of
responses versus time to display series showing seasonality seems rare.
The presentation focuses on a variety of minor and major tricks for
graphically examining seasonality, some of which have long histories in
climatology or related sciences but appear little known outside. I will
discuss some original code, but the greater emphasis is on users needing to
know Stata functions and commands well if they are to exploit the full
potential of its graphics.
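One simple trick of this kind is to break a monthly series into one panel per year so the seasonal shape stands out; a minimal sketch, assuming a monthly series temp with tsset time variable t, is:

    * extract calendar year and month from the monthly time variable
    generate year  = year(dofm(t))
    generate month = month(dofm(t))
    * small multiples: one panel per year, month on the horizontal axis
    twoway line temp month, by(year) xlabel(1(1)12)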
Additional information
cox.zip
Confirmatory factor analysis in Stata
Stas Kolenikov
University of Missouri, Columbia
Abstract
I will present a set of routines to conduct a one-factor confirmatory
factor analysis in Stata. I will highlight using Mata in programming. I
will demonstrate corrections for nonnormality, common in the structural
equation modeling literature. I will also give indications for further
development into multifactor models and, eventually, structural equation
models.
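In standard notation (not taken from the slides), the one-factor model fitted by these routines is

    x_{ij} = \mu_j + \lambda_j \xi_i + \delta_{ij}, \qquad
    \mathrm{Var}(\xi_i) = \phi, \quad \mathrm{Var}(\delta_{ij}) = \theta_j,

so the implied covariance matrix of the indicators is \Sigma = \phi \lambda \lambda' + \Theta, with \Theta diagonal.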
Additional information
NASUG-Kolenikov-cfa1.pdf
Tuesday, July 25, 2006
Matching methods for estimating treatment effects using Stata
Guido W. Imbens
Harvard University
Abstract
I will give a brief overview of modern statistical methods for estimating
treatment effects that have recently become popular in social and biomedical
sciences. These methods are based on the potential outcome framework
developed by Donald Rubin. The specific methods discussed include regression
methods, matching, and methods involving the propensity score. I will
discuss the assumptions underlying these methods and the methods for
assessing their plausibility. I will then discuss using the Stata command
nnmatch to estimate average treatment effects. I will illustrate this
approach by using data from a job training program.
A general survey of these methods can be found in the following:
Imbens, G. 2004. Nonparametric estimation of average treatment effects
under exogeneity: A review. Review of Economics and Statistics 86: 4–30.
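For reference, a typical nnmatch call for the average treatment effect on the treated looks like the sketch below; the variable names echo the job-training example but are illustrative here.

    * ATT with four matches per treated unit, matching on the listed covariates
    nnmatch re78 treat age educ black married re74 re75, tc(att) m(4)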
Additional information
Link to full text (MIT Press)
Imbens_stata_06july.pdf
lalonde_nonexper_06july25.smcl
Econometric analysis of time-series data using Stata
David M. Drukker
StataCorp
Abstract
After introducing time-series data management in Stata, the talk discusses
estimation, inference, and interpretation of ARMA models, ARCH/GARCH models,
VAR models, and SVAR models in Stata. The talk briefly introduces each
model discussed.
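The built-in commands covered correspond to roughly one line each, as in this sketch; the variable names and lag choices are illustrative.

    tsset quarter
    arima y, ar(1) ma(1)                 // ARMA(1,1)
    arch y, arch(1) garch(1)             // GARCH(1,1)
    var y1 y2, lags(1/2)                 // reduced-form VAR
    * short-run SVAR: missing entries in A and B are free parameters
    matrix A = (1, 0 \ ., 1)
    matrix B = (., 0 \ 0, .)
    svar y1 y2, lags(1/2) aeq(A) beq(B)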
Group comparisons and other issues in interpreting models for categorical outcomes using Stata
J. Scott Long
Indiana University
Abstract
This presentation examines methods for interpreting regression models for
categorical outcomes using predicted values. The talk begins with a simple
example using basic commands in Stata. It builds on this example to show
how more advanced programming features in Stata along with commands in Long
and Freese's SPost package can be used in more complex applications that
involve plotting predictions. These tools are then applied to the problem
of comparing groups in models for categorical outcomes, focusing on the
binary regression model. Identification issues make commonly used tests
inappropriate since these tests confound the magnitude of the regression
coefficients and the variance of the error. An alternative approach is
proposed, based on comparing predictions across groups. This
approach is illustrated by extending the tools presented in the first part
of the talk.
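A minimal example of the prediction-based approach uses a binary logit together with SPost's prvalue; the labor-force participation variables are the usual Long and Freese illustration, assumed here rather than taken from the talk.

    logit lfp k5 k618 age wc hc lwg inc
    * predicted probability for wife attended college vs. not,
    * holding the remaining covariates at their means
    prvalue, x(wc=1) rest(mean)
    prvalue, x(wc=0) rest(mean)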
Estimation and interpretation of measures of inequality, poverty, and
social welfare using Stata
Stephen P. Jenkins
University of Essex
Abstract
This presentation reviews methods for summarizing and comparing income
distributions, together with the related literature about variance
estimation for a range of summary measures. Although the focus is on income
and the perspective is that of an economist, the methods have been widely
applied to other variables, including health-related ones, and by
researchers from many disciplines. Topics covered include the measurement
of inequality, poverty, and social welfare, and distributional comparisons
based on dominance methods as well as summary indices. Illustrations are
provided using a suite of public-domain Stata programs written by the
author and collaborators (e.g., glcurve, ineqdeco,
povdeco, sumdist, svyatk, svygei,
svylorenz), together with built-in commands.
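For orientation, two of the listed commands are called along these lines; the income variable, subgroup variable, and poverty line are illustrative assumptions.

    * inequality indices, overall and decomposed by population subgroup
    ineqdeco income, bygroup(region)
    * FGT poverty indices with the poverty line set at 60% of the median
    quietly summarize income, detail
    povdeco income, pline(`=0.6*r(p50)')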