
Last updated: 23 August 2007

2007 North American Stata Users Group meeting

13–14 August 2007

Boston Gardens

Longwood Galleria Conference Center
342 Longwood Avenue
Boston, Massachusetts


Quantiles, L-moments and modes: Bringing order to descriptive statistics

Nicholas J. Cox
Durham University
Describing batches of data in terms of their order statistics or quantiles has long roots but remains underrated in graphically based exploration, data reduction, and data reporting. Hosking (1990) proposed L-moments, based on quantiles, as a unifying framework for summarizing distribution properties, but despite several advantages they still appear to be little known outside their main application areas of hydrology and climatology. Similarly, the mode can be traced to the prehistory of statistics, but it is often neglected or disparaged despite its value as a simple descriptor and even as a robust estimator of location. This presentation reviews and exemplifies these approaches with detailed reference to Stata implementations. Several graphical displays are discussed, some of them novel. Specific attention is given to the use of Mata for programming core calculations directly and rapidly.
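The talk's implementations are in Stata and Mata; purely as a rough illustration of the quantity involved, the first four sample L-moments can be computed from probability-weighted moments of the order statistics, following Hosking (1990). This Python sketch is not the author's code.

```python
import numpy as np

def sample_l_moments(x):
    """First four sample L-moments via unbiased probability-weighted
    moments b_0..b_3 of the order statistics (Hosking 1990)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    i = np.arange(1, n + 1)
    b0 = x.mean()
    b1 = np.sum((i - 1) * x) / (n * (n - 1))
    b2 = np.sum((i - 1) * (i - 2) * x) / (n * (n - 1) * (n - 2))
    b3 = np.sum((i - 1) * (i - 2) * (i - 3) * x) / (n * (n - 1) * (n - 2) * (n - 3))
    l1 = b0                                  # location (the mean)
    l2 = 2 * b1 - b0                         # scale
    l3 = 6 * b2 - 6 * b1 + b0                # L-skewness is t3 = l3 / l2
    l4 = 20 * b3 - 30 * b2 + 12 * b1 - b0    # L-kurtosis is t4 = l4 / l2
    return l1, l2, l3, l4
```

For a symmetric sample such as [1, 2, 3, 4], l3 and l4 are zero and l1 is the ordinary mean.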

Additional information

Extensions to var and svar estimation

Michael Hanson
Yingzhe Zhao
Wesleyan University
We develop packages to support computation of historical decompositions in (S)VAR models in Stata and to extend the estimation of impulse–response functions. Specifically, we compute cumulative structural impulse responses, which are useful for SVAR models that rely on long-run restrictions. While such models typically are estimated in differences, the responses of the levels of the endogenous variables to the identified structural innovations (that is, the cumulative structural impulse responses) are most often of theoretical interest. We also add an option to relax the default assumption of symmetry when computing bootstrapped error bands for the impulse–response functions. A further package computes the historical decompositions of the variables in an SVAR as a function of the estimated structural shocks. Used in conjunction with the previous package, it allows one to compute historical decompositions for the levels of variables from a long-run SVAR model estimated in first differences. An application to the determination of the equilibrium Chinese real exchange rate will be shown.
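The authors' packages are written for Stata; the two underlying computations can be sketched in a few lines. Cumulating the impulse responses over the horizon turns responses of differenced variables into responses of levels, and a historical decomposition sums each structural shock's contribution through the structural moving-average coefficients. The array shapes and function names below are illustrative assumptions, not the packages' interface.

```python
import numpy as np

def cumulative_irf(irf):
    """Cumulate impulse responses over the horizon, so responses of
    differenced variables become responses of their levels.
    irf has shape (horizons, n_vars, n_shocks)."""
    return np.cumsum(irf, axis=0)

def historical_decomposition(irf, shocks):
    """Contribution of each structural shock to each variable at the
    final date, given structural MA coefficients Theta_s (irf) and the
    identified shocks (shape (T, n_shocks)): sum_s Theta_s * eps_{T-s}."""
    T, n_shocks = shocks.shape
    H, n_vars, _ = irf.shape
    contrib = np.zeros((n_vars, n_shocks))
    for s in range(min(H, T)):
        # Theta_s applied shock by shock to the shock s periods back
        contrib += irf[s] * shocks[T - 1 - s]
    return contrib
```

Summing `contrib` across shocks recovers (up to the truncation horizon and deterministic terms) the value of each variable.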

Additional information

Meta-analytical integration of diagnostic accuracy studies in Stata

Ben Dwamena
University of Michigan Health System, Ann Arbor
This presentation will demonstrate how to perform diagnostic meta-analysis using midas, a user-written command. midas is a comprehensive program of statistical and graphical routines for undertaking meta-analysis of diagnostic test performance in Stata. Primary data synthesis is performed within the bivariate generalized linear mixed modeling framework. Model specification, estimation, and prediction are carried out with gllamm (Rabe-Hesketh et al., spherical adaptive quadrature). Using the estimated coefficients and variance–covariance matrices, midas calculates the summary operating sensitivity and specificity (with confidence and prediction ellipses) in SROC space. Summary likelihood and odds ratios with relevant heterogeneity statistics are provided. midas facilitates extensive statistical and graphical data synthesis and exploratory analyses of unobserved heterogeneity, covariate effects, publication bias, and subgroup analyses. Bayes’ nomograms, likelihood-ratio matrices, and conditional probability plots may be obtained and used to guide clinical decision making.

Agony and ecstasy: Teaching a computationally intensive introductory statistics course using Stata

Nicholas Jon Horton
Smith College
In the last decade, a sea change has occurred in the organization of introductory statistics courses. The mantra of “more data, less lecture” is widely repeated while active learning opportunities receive increasing focus. At Smith College, a small liberal arts college, several introductory statistics courses are offered, with various mathematical prerequisites. Stata is used as the computing environment for many of these courses. In all courses, students engage in the analysis of real-world example datasets, often taught in the form of mini-case studies (using a set of lab materials developed at UCLA). For the more mathematically savvy students, introductory statistics concepts are introduced through simulation and other activities. While Stata serves as an easy-to-use environment for statistical analysis, there are areas where additional functionality would improve its use as a testbed for statistical investigation. In this presentation, I will review the use of Stata for both of these purposes and detail areas of strengths and potential improvements.

Additional information

Powerful new tools for time series analysis

Christopher Baum
Boston College, DIW Berlin, and RePEc
Elliott and Jansson developed a powerful test for unit roots, published in Journal of Econometrics (2003), extending the Elliott–Rothenberg–Stock test (dfgls) by adding stationary covariates. I will discuss and demonstrate a Stata implementation of the test. Elliott and Müller's Review of Economic Studies paper (2006) illustrates how tests for parameter constancy and tests for an unknown break process can be unified to produce a single efficient test for stability of the regression function. I will also discuss and demonstrate a Stata implementation of this test.

Additional information

Record linkage in Stata

Michael Blasnik
M. Blasnik & Associates
Record linkage involves attempting to match records from two different data files that do not share a unique and reliable key field. It can be a tedious and challenging task when working with multiple administrative databases where one wants to match subjects by using names, addresses, and other identifiers that may have spelling and formatting variations. Formal record linkage methods often use a combination of approximate string comparators and probabilistic matching algorithms to identify the best matches and assess their reliability. Some standalone software is available for this task. This presentation will introduce reclink, a rudimentary probabilistic record matching program for Stata. reclink uses a modified bigram string comparator and allows user-specified match and nonmatch weights. The algorithm also provides for blocking (both “or” and “and”) to help improve speed for this otherwise slow procedure.
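reclink itself uses a modified bigram comparator; the plain version of that idea is a Dice-style score, twice the number of shared character bigrams over the total bigrams in the two strings. The sketch below is this unmodified variant in Python, for illustration only, and does not reproduce reclink's weighting.

```python
def bigrams(s):
    """Set of adjacent character pairs in a normalized string."""
    s = s.lower().strip()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def bigram_similarity(a, b):
    """Dice-style bigram score in [0, 1]: twice the shared bigrams
    over the total number of bigrams in the two strings."""
    ba, bb = bigrams(a), bigrams(b)
    if not ba and not bb:
        return 1.0          # two strings too short to compare
    if not ba or not bb:
        return 0.0
    return 2 * len(ba & bb) / (len(ba) + len(bb))
```

Spelling variants score high without matching exactly: "smith" and "smyth" share the bigrams "sm" and "th", giving a score of 0.5, whereas unrelated names score near zero.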

Ado-lists: A new concept for Stata

Ben Jann
ETH Zürich
A new command called adolist is presented. adolist is a tool to create, install, and uninstall lists of user ado-packages (“adolists”). For example, adolist can create a list of all user packages installed on a system and then install the same packages on another system. Moreover, ado-list can be used to put together thematic lists of packages such as, say, a list on income inequality analysis or time-series add-ons, or the list of “41 user ados everyone should know”. Such lists can then be shared with others, who can easily install and uninstall the listed packages using the adolist command.

Additional information

Constructing Krinsky and Robb confidence intervals for mean and median WTP using Stata

P. Wilner Jeanty
Ohio State University
The ultimate goal of most nonmarket valuation studies is to obtain welfare measures, i.e., mean and/or median willingness to pay (WTP) and confidence intervals. While the delta (nlcom) and bootstrap (bs) methods can be used for constructing such confidence intervals in Stata, they are not recommended because WTP measures are nonlinear functions of random parameters (Creel and Loomis 1991). The best and most widely used approach, which is not available in Stata, consists of simulating the confidence intervals by using the Krinsky and Robb procedure (Haab and McConnell 2002). Hole (2007) has recently introduced a useful command, wtp, that implements the Krinsky and Robb procedure in Stata but does not feature mean and median WTP estimates and their confidence intervals. I present a Stata command, wtpcikr, that computes mean and median WTP, confidence intervals using the Krinsky and Robb procedure, achieved significance level (ASL) for testing the null hypothesis that WTP equals zero, and a relative efficiency measure (Loomis and Ekstrand 1998). The command supports both linear and exponential contingent valuation models estimated with or without covariates using the Stata commands probit, logit, biprobit, and xtprobit. I will illustrate the use of wtpcikr by replicating empirical results in Haab and McConnell (2002).
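The Krinsky and Robb procedure itself is easy to state: draw coefficient vectors from the estimated asymptotic distribution N(beta-hat, V), evaluate the (nonlinear) WTP function at each draw, and take percentile bounds. This Python sketch is only an illustration of that logic, not the wtpcikr command; the example coefficients are invented, and the WTP function shown is the constant-only linear case (mean WTP = -constant / bid coefficient).

```python
import numpy as np

def krinsky_robb_ci(beta_hat, vcov, wtp_fn, n_draws=5000, level=0.95, seed=12345):
    """Simulate the sampling distribution of a WTP function of the
    coefficients by drawing from N(beta_hat, vcov), then take
    percentile bounds (Krinsky and Robb)."""
    rng = np.random.default_rng(seed)
    draws = rng.multivariate_normal(beta_hat, vcov, size=n_draws)
    wtp = np.array([wtp_fn(b) for b in draws])
    lo, hi = np.percentile(wtp, [100 * (1 - level) / 2, 100 * (1 + level) / 2])
    return wtp.mean(), lo, hi

# Illustrative numbers only: [constant, bid coefficient] and a
# diagonal covariance matrix; point estimate of WTP is 1.2/0.05 = 24
beta = np.array([1.2, -0.05])
V = np.diag([0.04, 0.00004])
mean_wtp, lo, hi = krinsky_robb_ci(beta, V, lambda b: -b[0] / b[1])
```

Because WTP is a nonlinear function of the coefficients, the simulated interval is generally asymmetric around the point estimate, which is exactly why the delta method is not recommended here.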

Additional information

Resampling inference through quasi–Monte Carlo

Stanislav Kolenikov
University of Missouri, Columbia
This presentation will review quasi–Monte Carlo methods (Halton sequences) and their applications in resampling inference. The two major applications are bootstrap procedures, where quasi–Monte Carlo methods allow one to achieve stability close to that of the balanced bootstrap, and complex survey variance estimation, where they allow one to create approximately balanced resampling designs, thus providing a compromise between balanced resampling designs and the regular bootstrap.
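As background for readers unfamiliar with the building block, a one-dimensional Halton sequence in a given base is the van der Corput radical-inverse sequence; pairing coprime bases gives multidimensional low-discrepancy points. A minimal Python sketch (not the talk's code):

```python
def halton(n, base):
    """First n points of the van der Corput sequence in the given base:
    write i in the base, reflect its digits about the radix point.
    Using coprime bases per dimension yields a Halton sequence."""
    seq = []
    for i in range(1, n + 1):
        f, x = 1.0, 0.0
        k = i
        while k > 0:
            f /= base
            x += f * (k % base)
            k //= base
        seq.append(x)
    return seq
```

In base 2 the sequence begins 1/2, 1/4, 3/4, 1/8, ..., filling the unit interval far more evenly than pseudorandom draws, which is the property exploited for approximately balanced resampling.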

Causal inference with observational data: Regression discontinuity and related methods in Stata

Austin Nichols
Urban Institute
This overview of implementing quasiexperimental methods of estimating causal impacts (panel methods, matching estimators, instrumental variables, and regression discontinuity) emphasizes practical considerations and Stata-specific approaches, with examples using real data and comparisons across methods. Particular attention is paid to the regression discontinuity method, which seems to be less well known in the larger community of Stata users but is the best regarded of the quasiexperimental methods in those circumstances where it is appropriate.
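The core idea of a sharp regression discontinuity estimate can be sketched compactly: fit a line on each side of the cutoff within a bandwidth and take the jump in the fitted values at the cutoff. The Python below is a bare-bones illustration under uniform (rather than kernel) weighting and a fixed bandwidth; it is not the estimator the talk presents.

```python
import numpy as np

def sharp_rd(y, x, cutoff, bandwidth):
    """Sharp RD effect: fit a line to each side of the cutoff within
    the bandwidth and take the jump in fitted values at the cutoff."""
    z = x - cutoff
    def fit_at_cutoff(mask):
        X = np.column_stack([np.ones(mask.sum()), z[mask]])
        coef, *_ = np.linalg.lstsq(X, y[mask], rcond=None)
        return coef[0]            # fitted value at z = 0
    left = (z < 0) & (z >= -bandwidth)
    right = (z >= 0) & (z <= bandwidth)
    return fit_at_cutoff(right) - fit_at_cutoff(left)
```

In practice the bandwidth choice and the sensitivity of the estimate to it are where most of the work lies, which is part of what makes a careful Stata implementation worthwhile.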

Additional information

Recent developments in multilevel modeling, including models for binary and count responses

Roberto G. Gutierrez
Mixed-effects models contain both fixed and random effects. The fixed effects are analogous to standard regression coefficients and are estimated directly. The random effects are not directly estimated but instead are summarized according to their estimated variances and covariances, known as variance components. Random effects take the form of either random intercepts or random coefficients, and the grouping structure of the data may consist of multiple levels of nested groups. In Stata, one can fit mixed models with continuous (Gaussian) responses by using xtmixed and in Stata 10, fit mixed models with binary and count responses by using xtmelogit and xtmepoisson, respectively. All three commands have a common multiequation syntax and output, and postestimation tasks such as the prediction of random effects and likelihood-ratio comparisons of nested models also take a common form. This presentation will cover many models that one can fit using these three commands. Among these are simple random intercept models, random-coefficient models, growth curve models, and crossed-effects models.

Additional information

From estimation output to document tables: A long way made short

Ben Jann
ETH Zürich
Postestimation processing and formatting of statistical results for input into document tables are tasks that most of us have to do. However, processing results by hand can be tedious and is prone to error. There are therefore many benefits to automating these tasks while at the same time retaining user flexibility in terms of output format and accessibility. This talk is concerned with such automation processes, focusing primarily on tabulating results from estimation commands. In the first part of the talk, I briefly review existing approaches and user-written programs and then provide an extensive tutorial on the estout package. Compiling estimation tables for display on screen and for inclusion into, e.g., LaTeX, Word, or Excel documents, is illustrated using a range of examples, from relatively basic applications to complex ones. In the second part of the talk, I draw on material from J. Scott Long’s presentation last year and introduce some new utilities to tabulate results from Long and Freese’s SPost commands for categorical outcomes models.

Additional information

Power analysis and sample-size determination in survival models with the new stpower command

Yulia Marchenko
Power analysis and sample-size determination are important components of a study design. In survival analysis, the power is directly related to the number of events observed in the study. The required sample size is therefore determined by the observed number of events. Survival data are commonly analyzed using the log-rank test or the Cox proportional hazards model. Stata 10’s new stpower command provides sample-size and power calculations for survival studies that use the log-rank test, the Cox proportional hazards model, and the parametric test comparing exponential hazard rates. It reports the number of events that must be observed in the study and accommodates unequal subject allocation between groups, nonuniform subject entry, and exponential losses to follow-up. This talk will demonstrate power, sample-size, and effect-size computations for different methods used to analyze survival data and for designs with recruitment periods and random censoring (administrative and loss to follow-up). It will also discuss building customized tables and producing graphs of power curves.
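The link between power and the number of events for the log-rank test is commonly expressed through Schoenfeld's formula, E = (z_{1-α/2} + z_{power})² / (p₁ p₂ (ln HR)²), where p₁ and p₂ are the allocation shares. The Python below is a sketch of that single formula, not of stpower, which covers considerably more (nonuniform entry, losses to follow-up, and other methods).

```python
from math import ceil, log
from statistics import NormalDist

def required_events(hr, alpha=0.05, power=0.8, p1=0.5):
    """Schoenfeld's formula for the number of events required by a
    two-sided log-rank test to detect hazard ratio hr, with p1 the
    allocation share of group 1."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    z_b = NormalDist().inv_cdf(power)
    p2 = 1 - p1
    return ceil((z_a + z_b) ** 2 / (p1 * p2 * log(hr) ** 2))
```

For example, detecting a hazard ratio of 0.5 with 80% power at the 5% level under equal allocation requires 66 events; the required sample size then follows from the expected proportion of subjects who experience an event during the study.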

Additional information

Scientific organizers

Kit Baum, Boston College
[email protected]

Marcello Pagano, Harvard School of Public Health
[email protected]

Logistics organizers

Chris Farrar, StataCorp

Gretchen Farrar, StataCorp