
Last updated: 3 August 2006

2006 North American Stata Users Group meeting

24–25 July 2006

Chas River Basin

Longwood Galleria Conference Center
342 Longwood Avenue
Boston, Massachusetts

Monday, July 24, 2006


Weak instruments: An overview and new techniques

Austin Nichols
Urban Institute

I review existing literature on weak instruments (possibly with multiple endogenous variables) and the research in progress by Jim Stock and others. I demonstrate using tests for weak instruments and give a new graphical technique for presenting coefficient estimates that allows for hypothesis testing (using Anderson–Rubin-style test statistics) in the presence of weak instruments.


How to do xtabond2

David Roodman
Center for Global Development

xtabond2 may hold the record among user-written Stata modules for the most users (and perhaps the most confused ones too). In this presentation, I motivate and describe the Arellano–Bond and Blundell–Bond linear generalized method of moments (GMM) dynamic panel estimators, drawing lessons from a steady stream of correspondence with users. I also provide an overview of how to implement them with xtabond2. I first introduce linear GMM as an extension of ordinary least squares. Then I describe how limited time span, the potential for fixed effects, endogeneity, and the dangers of dynamic panel bias all shape these estimators, particularly in their use of differences, lags as instruments, and GMM. I explain how xtabond2 commands should be constructed, with particular attention to the various options and suboptions for controlling instrument matrix construction. I discuss the need to limit the number of instruments and options for doing so.


Time-series filtering techniques in Stata

Kit Baum
Boston College and RePEc

I will describe several time-series filtering techniques, including the Hodrick–Prescott, Baxter–King, and bandpass filters and variants, as well as present new Mata-coded versions of these routines that are considerably more efficient than previous ado-code routines. I will also discuss applications to several economic and financial time series.
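
As a rough illustration of the computation behind the Hodrick–Prescott filter named above, the following Python sketch solves the standard penalized least-squares problem directly. It is not the Mata code from the talk. The function name hp_filter, the default smoothing parameter of 1600 (a common choice for quarterly data), and the dense solver are assumptions made for illustration.

```python
def hp_filter(y, lam=1600.0):
    """Return (trend, cycle) for the Hodrick-Prescott filter.

    The trend minimizes sum((y - trend)^2) + lam * sum(second differences
    of trend squared), which leads to the linear system
    (I + lam * D'D) trend = y, with D the second-difference operator.
    """
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 observations")

    # Build A = I + lam * D'D as a dense matrix (adequate for short series).
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 1.0
    coef = (1.0, -2.0, 1.0)
    for r in range(n - 2):
        idx = (r, r + 1, r + 2)
        for i, ci in zip(idx, coef):
            for j, cj in zip(idx, coef):
                a[i][j] += lam * ci * cj

    # Solve A * trend = y by Gaussian elimination with partial pivoting.
    b = [float(v) for v in y]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            if f != 0.0:
                for c in range(col, n):
                    a[r][c] -= f * a[col][c]
                b[r] -= f * b[col]

    # Back-substitution.
    trend = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = b[r] - sum(a[r][c] * trend[c] for c in range(r + 1, n))
        trend[r] = s / a[r][r]

    cycle = [yi - ti for yi, ti in zip(y, trend)]
    return trend, cycle
```

The dense solve costs on the order of n cubed operations. Because the system matrix is pentadiagonal, a banded solver would be the natural choice for long series, which is one reason an efficient compiled implementation can matter in practice.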


Towards self-contained data: Attaching validation routines to variables

William Rising
Bellarmine University

One of Stata’s great strengths is its data management abilities. When either building or sharing datasets, some of the most time-consuming activities are validating the data and writing documentation for the data. Much of this effort could be avoided if datasets were self-contained, i.e., if they could validate themselves. I will show how to achieve this goal within Stata.

I will demonstrate a package of commands for attaching validation rules to the variables themselves, via characteristics, along with commands for running error checks and marking suspicious observations in the dataset. The validation system is flexible enough that simple checks continue to work even if variable names change or if the data are reshaped, and it is rich enough that validation may depend on other variables in the dataset. Since the validation is at the variable level, the self-validation also works if variables are recombined with data from other datasets. With these tools, Stata’s datasets will become truly self-contained.

Managing edit checks and database cleaning with Stata

Jacqueline L. Buros
Perfuse Laboratories and Data Coordinating Center

We have developed a set of ado-files for use in data management, specifically designed to manage user-written edit checks and to complement the process of data cleaning. Collectively, these tools enable us to identify, distribute, and track edit checks in several large multicenter clinical trials by using Stata software.

Our approach is successful because the coding is simple and the entire process is visible and familiar to most users. It does not depend on any particular database structure. The framework approximates an object-oriented environment, with the objects being (a) the database, open at the time a command is called; (b) an edit check, consisting of a Stata do-file, a query message, and a list of variables to be identified for review; and (c) the edit-check history, implemented as a Stata dataset.

These objects can be manipulated directly or by using a command in Stata. Actions managed by command include creating or modifying an edit check, generating a query-clean dataset, preparing and tracking a set of edit-check documents, and summarizing the edit-check history. Here I present a brief overview of our process and describe using the commands in the context of clinical research.

A diff command for use with data files

Philip Schumm
University of Chicago

One of the most important tools in a programmer’s tool chest is the diff command. This command permits you to determine immediately whether two code files are identical and, when they are not, to generate a patch that summarizes the differences and can be used to transform the first file into the second. In this presentation, I will introduce an analogous tool written for use with data files. Unlike code files, in which each line is identified by its physical location within the file, records in a data file are typically identified by one or more indices, each composed of one or more distinct variables. Our tool compares two files on the basis of one or more such indices; provides a compact, readable summary of the differences; and can generate a patch (in the form of a do-file) to update the first file on the basis of the second. This tool is useful during data analysis whenever two or more versions of a data file are encountered and may also be used by a data coordinating center to manage repeated data submissions from multiple sites. The program was developed using Mata, and I will discuss some of the programming techniques.
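
The idea of comparing records by index rather than by physical position can be sketched in a few lines of Python. This is only an illustration of the concept, not the tool presented in the talk: the function names diff_records and apply_patch are hypothetical, records are plain dictionaries, both files are assumed to share the same fields, and the patch is a Python dictionary rather than a do-file.

```python
def _key_of(row, key):
    """Return the tuple of index values that identifies a record."""
    return tuple(row[f] for f in key)


def diff_records(old, new, key):
    """Compare two lists of dict records matched on the fields in `key`."""
    def index(rows):
        out = {}
        for row in rows:
            k = _key_of(row, key)
            if k in out:
                raise ValueError(f"duplicate key {k}")
            out[k] = row
        return out

    a, b = index(old), index(new)
    added = [b[k] for k in b if k not in a]
    removed = [a[k] for k in a if k not in b]
    changed = []
    for k in a:
        if k in b:
            fields = sorted(set(a[k]) | set(b[k]))
            delta = {f: (a[k].get(f), b[k].get(f))
                     for f in fields if a[k].get(f) != b[k].get(f)}
            if delta:
                changed.append((k, delta))
    return {"added": added, "removed": removed, "changed": changed}


def apply_patch(old, patch, key):
    """Update a copy of `old` so that it matches the second file."""
    rows = {_key_of(r, key): dict(r) for r in old}
    for r in patch["removed"]:
        del rows[_key_of(r, key)]
    for k, delta in patch["changed"]:
        for field, (_, new_value) in delta.items():
            rows[k][field] = new_value
    for r in patch["added"]:
        rows[_key_of(r, key)] = dict(r)
    return list(rows.values())


old = [
    {"id": 1, "visit": 1, "sbp": 120},
    {"id": 1, "visit": 2, "sbp": 130},
    {"id": 2, "visit": 1, "sbp": 110},
]
new = [
    {"id": 1, "visit": 1, "sbp": 120},
    {"id": 1, "visit": 2, "sbp": 135},
    {"id": 3, "visit": 1, "sbp": 125},
]
patch = diff_records(old, new, key=("id", "visit"))
patched = apply_patch(old, patch, key=("id", "visit"))
```

Here the pair (id, visit) plays the role of the index, so a record that moves to a different row is still matched correctly. The variable names in the example data are made up.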


Tools for estimation of grouped conditional logit models

Paulo Guimarães
Medical University of South Carolina

In many applications of conditional logit models, the choice set and the characteristics of that set are identical for groups of decision makers. One can then estimate the model more efficiently by grouping the data and using a new user-written command, multin. The command multin is designed for fitting grouped conditional logit models. It produces the same output as clogit but requires a more compact data layout, which is particularly relevant when the model comprises many observations and/or choices. In this situation, one can substantially reduce the size of the dataset and the time required for estimation. I also present a tool implemented in Mata that transforms the data as required by clogit to the new format required by multin. Finally, I discuss the problem of overdispersion in the grouped conditional logit model and present some alternatives for dealing with it. One of these alternatives is Dirichlet-multinomial (DM) regression. I present a new command for fitting the DM regression model, dirmul. The dirmul command can also be used to estimate the better-known beta-binomial regression models.


A simulation-based sensitivity analysis for matching estimators

Tommaso Nannicini
Department of Economics, Universidad Carlos III de Madrid

I present a Stata program (sensatt) that implements the sensitivity analysis for propensity-score matching estimators proposed by Ichino, Mealli, and Nannicini (2005). The proposed analysis builds on Rosenbaum and Rubin (1983) and Rosenbaum (1987) and simulates a potential confounder to assess the robustness of the estimated treatment effect with respect to specific deviations from the conditional independence assumption. The program sensatt uses the user-written Stata commands for propensity-score matching estimation (att*) developed by Becker and Ichino (2002). An example of the implementation of the proposed sensitivity analysis is given using data from the National Supported Work demonstration, widely known in the program evaluation literature.


Using the Bayesian information criterion (BIC) to judge models and statistical significance

Paul Millar
University of Calgary

After a short review of the development of the Bayesian information criterion (Jeffreys, Schwarz), I will discuss both its extension by Raftery for statistical significance (implemented as bic) and a further, simpler routine (bicdrop1) as preventive methods to avoid making incorrect inference decisions and as "model mining" procedures.
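
For readers unfamiliar with the criterion, the following Python sketch computes the BIC in its usual form, -2 ln L + k ln n, and grades the difference between two models. It does not reproduce the bic or bicdrop1 routines. The function names are hypothetical, the example log likelihoods are invented, and the cutoffs of 2, 6, and 10 are the rule-of-thumb grades commonly attributed to Raftery, included here as an assumption.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 ln L + k ln n (smaller is better)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def evidence_grade(bic_difference):
    """Label the absolute BIC difference between two models.

    Cutoffs (2, 6, 10) are rule-of-thumb values, assumed for illustration.
    """
    d = abs(bic_difference)
    if d < 2:
        return "weak"
    if d < 6:
        return "positive"
    if d < 10:
        return "strong"
    return "very strong"


# Invented example: a 4-parameter model against a 6-parameter model, n = 500.
bic_small = bic(log_likelihood=-520.3, n_params=4, n_obs=500)
bic_large = bic(log_likelihood=-515.1, n_params=6, n_obs=500)
preferred = "small" if bic_small < bic_large else "large"
grade = evidence_grade(bic_large - bic_small)
```

In this made-up comparison the larger model has the higher log likelihood, but the ln n penalty on its two extra parameters outweighs the gain, so the smaller model is preferred.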


Graphs for all seasons

Nicholas J. Cox
Durham University

Seasonal effects are dominant in many environmental time series and important or at least notable in many economic or biomedical time series, to name only a few application areas represented in the Stata user community. In several fields, using anything other than basic line graphs of responses versus time to display series showing seasonality seems rare. The presentation focuses on a variety of minor and major tricks for graphically examining seasonality, some of which have long histories in climatology or related sciences but appear little known outside. I will discuss some original code, but the greater emphasis is on users needing to know Stata functions and commands well if they are to exploit the full potential of its graphics.


Confirmatory factor analysis in Stata

Stas Kolenikov
University of Missouri, Columbia

I will present a set of routines to conduct a one-factor confirmatory factor analysis in Stata. I will highlight using Mata in programming. I will demonstrate corrections for nonnormality, common in the structural equation modeling literature. I will also give indications for further development into multifactor models and, eventually, structural equation models.


Tuesday, July 25, 2006

Matching methods for estimating treatment effects using Stata

Guido W. Imbens
Harvard University

I will give a brief overview of modern statistical methods for estimating treatment effects that have recently become popular in social and biomedical sciences. These methods are based on the potential outcome framework developed by Donald Rubin. The specific methods discussed include regression methods, matching, and methods involving the propensity score. I will discuss the assumptions underlying these methods and the methods for assessing their plausibility. I will then discuss using the Stata command nnmatch to estimate average treatment effects. I will illustrate this approach by using data from a job training program.

A general survey of these methods can be found in the following:

Imbens, G. 2004. Nonparametric estimation of average treatment effects under exogeneity: A review. Review of Economics and Statistics 86: 4–30.

Econometric analysis of time-series data using Stata

David M. Drukker

After introducing time-series data management in Stata, the talk discusses estimation, inference, and interpretation of ARMA models, ARCH/GARCH models, VAR models, and SVAR models in Stata. The talk briefly introduces each model discussed.

Group comparisons and other issues in interpreting models for categorical outcomes using Stata

J. Scott Long
Indiana University

This presentation examines methods for interpreting regression models for categorical outcomes using predicted values. The talk begins with a simple example using basic commands in Stata. It builds on this example to show how more advanced programming features in Stata along with commands in Long and Freese's SPost package can be used in more complex applications that involve plotting predictions. These tools are then applied to the problem of comparing groups in models for categorical outcomes, focusing on the binary regression model. Identification issues make commonly used tests inappropriate since these tests confound the magnitude of the regression coefficients and the variance of the error. An alternative approach is proposed based on the comparisons of the predictions across groups. This approach is illustrated by extending the tools presented in the first part of the talk.

Estimation and interpretation of measures of inequality, poverty, and social welfare using Stata

Stephen P. Jenkins
University of Essex

This presentation reviews methods for summarizing and comparing income distributions, together with the related literature about variance estimation for a range of summary measures. Although the focus is on income and the perspective is that of an economist, the methods have been widely applied to other variables, including health-related ones, and by researchers from many disciplines. Topics covered include the measurement of inequality, poverty, and social welfare, and distributional comparisons based on the dominance methods as well as summary indices. Illustrations are provided using a suite of public-domain Stata programs written by the author and collaborators (e.g., glcurve, ineqdeco, povdeco, sumdist, svyatk, svygei, svylorenz), together with built-in commands.
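
To make the summary indices concrete, here is a minimal Python sketch of three textbook measures: the Gini coefficient, the generalized entropy family, and the poverty headcount ratio. It is not the code of the Stata programs listed above. The function names are hypothetical, and the sketch ignores survey weights and variance estimation, both of which the talk covers.

```python
import math


def gini(incomes):
    """Gini coefficient from the rank-weighted sum of sorted incomes."""
    x = sorted(incomes)
    n = len(x)
    total = sum(x)
    if n == 0 or total <= 0:
        raise ValueError("need a nonempty sample with positive total income")
    weighted = sum(rank * xi for rank, xi in enumerate(x, start=1))
    return 2.0 * weighted / (n * total) - (n + 1.0) / n


def generalized_entropy(incomes, a=1.0):
    """Generalized entropy index GE(a); incomes must be strictly positive.

    a = 0 gives the mean log deviation, a = 1 the Theil index, and
    a = 2 half the squared coefficient of variation.
    """
    n = len(incomes)
    mean = sum(incomes) / n
    ratios = [xi / mean for xi in incomes]
    if a == 0:
        return -sum(math.log(r) for r in ratios) / n
    if a == 1:
        return sum(r * math.log(r) for r in ratios) / n
    return (sum(r ** a for r in ratios) / n - 1.0) / (a * (a - 1.0))


def headcount_ratio(incomes, poverty_line):
    """Share of the sample with income strictly below the poverty line."""
    return sum(1 for xi in incomes if xi < poverty_line) / len(incomes)
```

With equal incomes every inequality index here is zero. Concentrating all income in one of n people pushes the Gini coefficient to (n - 1)/n.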

Scientific organizers

Elizabeth Allred, Harvard School of Public Health

Kit Baum, Boston College and RePEc

Rich Goldstein, Consultant

Logistics organizers

Chris Farrar, StataCorp

Gretchen Farrar, StataCorp