## Stata Conference Boston 2010: Abstracts

### Regression for nonnegative skewed dependent variables

Austin Nichols
Urban Institute
In this presentation, I compare several options for estimation and prediction in regressions using nonnegative skewed dependent variables. Often, Poisson regression outperforms competitors, even when its assumptions are violated and the correct model is one that justifies a competitor.

boston10_nichols.pdf

### Margins and the Tao of interaction

Phil Ender
UCLA Statistical Consulting Group
In this presentation, I show how to use the new margins command, introduced in Stata 11, to explore interactions in regression and analysis of variance. I cover three types of interactions: 1) categorical by categorical, 2) categorical by continuous, and 3) continuous by continuous. I also cover issues concerning graphing of interactions, along with hypothesis testing that is appropriate for interactions.

boston10_ender.pdf

### To the vector belong the spoils: Circular statistics in Stata

Nicholas J. Cox
Durham University
Circular statistics are needed when one or more variables have outcome space in a circle, which is, for example, true for data measured with reference to a compass, a clock, or a calendar. Applications abound in the earth and environmental sciences and other disciplines, such as music, not to mention the economic and medical fields that are well represented among Stata users. A talk on circular statistics was given in Boston in 2001. In this update, I survey the field with special reference to recently revised or newly written programs for graphics, modeling, testing, and summary.
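
As a small illustration of why directional data need their own summaries, the sketch below computes a vector (circular) mean direction and the mean resultant length. It is a generic Python sketch and not one of the programs discussed in the talk; the function name `circular_mean_degrees` is hypothetical.

```python
import math


def circular_mean_degrees(angles_deg):
    """Return (mean direction in degrees, mean resultant length).

    Hypothetical helper for illustration: each angle is treated as a unit
    vector, the vectors are summed, and the direction of the sum is the mean.
    """
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    # atan2 recovers the direction of the summed vector; wrap into [0, 360).
    mean_direction = math.degrees(math.atan2(s, c)) % 360.0
    # The resultant length lies in [0, 1]; values near 1 mean tight clustering.
    resultant_length = math.hypot(s, c) / len(angles_deg)
    return mean_direction, resultant_length


# Two compass bearings on either side of north.
direction, length = circular_mean_degrees([350.0, 10.0])
```

For bearings of 350 and 10 degrees, the arithmetic mean is 180 (due south), whereas the vector mean points north, up to floating-point rounding.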

boston10_cox.zip

### System for formatting tables

John Gallup
Portland State University
The addition to Stata of a system for formatting tables enables extensive formatting of statistical tables created within Stata, ultimately allowing users to create native Word or TeX tables. In this presentation, I talk about this system, which is intended for use by programmers. Users may specify font sizes, font types, text justification, cell height and width, cell boundary lines of different styles, titles, labels, and footnotes, among other attributes. New data may be merged or appended to existing tables to create more-complex tables. This system can provide full formatting for statistical tables, similar to the way that Stata provided granular formatting for graphics starting in Stata 8. This system is implemented in Mata for speed and compact memory use (Mata string matrices made for efficient coding). I have reimplemented the outreg ado program using this system, and I have written a program to create formatted cross-tabulation tables like those created by tabulate. I also plan to write a program to create formatted summary statistics tables.

boston10_gallup.pdf

### Hunting for genes with longitudinal phenotype data using Stata

Chuck Huber
Texas A&M Health Science Center School of Rural Public Health
Project Heartbeat! was a longitudinal study of metabolic and morphological changes in adolescents aged 8–18 years. It was conducted in the 1990s. A study is currently being conducted to consider the relationship between a collection of phenotypes (including BMI, blood pressure, and blood lipids) and a panel of 1,500 candidate single nucleotide polymorphisms (SNPs). Traditional genetics software, such as PLINK and HelixTree, lacks the ability to model longitudinal phenotype data. In this talk, I describe how to use Stata for a longitudinal genetic association study that includes these tasks: early-stage data checking (allele frequencies and Hardy–Weinberg equilibrium), modeling of individual SNPs, use of false discovery rates to control for the large number of comparisons, exporting and importing the data through PHASE for haplotype reconstruction, selection of tag SNPs in Stata, and analysis of haplotypes. I also discuss strategies for scaling up to an Illumina 100k SNP chip using Stata. All SNP names and gene names will be de-identified because this is a work in progress.
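
The early-stage checks mentioned above (allele frequencies and Hardy–Weinberg equilibrium) can be sketched for one biallelic SNP as follows. This is a generic Python illustration, not the Stata code used in the study; the function name `hwe_chi_square` and the genotype counts are hypothetical.

```python
def hwe_chi_square(n_aa_major, n_het, n_aa_minor):
    """Return (major-allele frequency, Pearson chi-square statistic for HWE).

    Hypothetical helper for illustration. Arguments are the observed counts
    of the two homozygous genotypes and the heterozygous genotype.
    """
    n = n_aa_major + n_het + n_aa_minor
    # Each person carries two alleles, so there are 2n alleles in total.
    p = (2 * n_aa_major + n_het) / (2 * n)
    q = 1.0 - p
    observed = [n_aa_major, n_het, n_aa_minor]
    # Genotype counts expected under Hardy-Weinberg equilibrium.
    expected = [n * p * p, 2 * n * p * q, n * q * q]
    chi2 = 0.0
    for o, e in zip(observed, expected):
        if e > 0:  # skip empty cells when one allele is absent
            chi2 += (o - e) ** 2 / e
    return p, chi2


# Made-up counts, for illustration only.
freq, stat = hwe_chi_square(25, 50, 25)
```

The statistic is usually compared with a chi-squared distribution with one degree of freedom. With 1,500 SNPs tested, the resulting p-values would themselves need a multiple-comparison adjustment such as the false discovery rate approach named in the abstract.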

boston10_huber.ppt

### Bayesian bivariate diagnostic meta-analysis via R-INLA

University of Michigan and VA Ann Arbor Healthcare Systems
Bivariate generalized mixed modeling is currently recommended for joint meta-analysis of diagnostic test sensitivity and specificity. Estimation is commonly performed using frequentist likelihood-based techniques assuming bivariate, normally distributed, correlated logit transformations of sensitivity and specificity. These estimation techniques are fraught with nonconvergence and invalid confidence intervals and correlation parameters, especially with sparse data. Bayesian approaches, though likely to surmount these and other problems, have not previously been applied. Recently, integrated nested Laplace approximation (INLA) has been developed as a computationally fast, deterministic alternative to Markov chain Monte Carlo (MCMC)-based Bayesian modeling, and an R interface to the C-based INLA program has been applied to diagnostic meta-analysis. In this presentation, I show how to easily interface R-INLA estimation with data preprocessing and postprocessing within Stata. A user-written ado-file allows user-friendly application of INLA by Stata users.

### Storing, analyzing, and presenting Stata output

Julian Reif
University of Chicago
In this presentation, I discuss how to store, analyze, and present Stata output. I explain how to use my commands regsave and svret to save Stata output to a Stata-formatted dataset. Results can then easily be manipulated using standard Stata commands. I next demonstrate how to export large sets of results to Microsoft Excel, where they can easily be viewed in a pivot table. Finally, I show how to use my command texsave to export results to a LaTeX table that can be incorporated into a professional paper or presentation. I provide examples that show how to automate these procedures so that researchers can easily rerun analyses without having to manually reassemble their output each time.

boston10_reif.pdf
boston10_reif.zip

### An efficient data envelopment analysis with a large dataset in Stata

Choonjoo Lee
Korea National Defense University
In this presentation, I present an approach to improving the computational efficiency of data envelopment analysis (DEA) with a large dataset in Stata. I presented my dea program at the Stata Conference DC 09. I have reviewed various comments and requests by Stata users and have updated the code significantly in terms of computation time and model variants. In this presentation, I illustrate an approach to reducing the computation time and to improving the accuracy of DEA results using a five-input, one-output dataset with 365 decision-making units (DMUs).

boston10_lee.ppt
boston10_lee.zip

### Competing-risks regression in Stata 11

Roberto G. Gutierrez
StataCorp
Competing-risks survival regression provides a useful alternative to Cox regression in the presence of one or more competing risks. For example, say that you are studying the time from initial treatment for cancer to recurrence of cancer in relation to the type of treatment administered and demographic factors. Death is a competing event: The person under treatment may die, impeding the occurrence of the event of interest, recurrence of cancer. Unlike censoring, which merely obstructs you from viewing the event, a competing event prevents the event of interest from occurring altogether. Depending on the scope of your statistical inference, your analysis may need to be adjusted for competing risks.

Stata’s new stcrreg command implements competing-risks regression based on Fine and Gray’s proportional subhazards model. In this talk, I focus on that new command and compare the method of Fine and Gray to a method based on directly modeling cause-specific hazards. Regardless of method, the focus is on estimating the cumulative incidence function (CIF) for the event of interest in the presence of competing events.
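
To make the cumulative incidence function concrete, the sketch below computes a nonparametric CIF estimate without covariates: at each event time, the overall survival probability just before that time is multiplied by the share of subjects at risk who fail from the cause of interest. This is a generic Python illustration and is neither stcrreg nor the Fine and Gray regression model; the function name `cumulative_incidence` and the status coding (0 for censored, positive integers for causes) are assumptions made for the example.

```python
def cumulative_incidence(times, status, cause=1):
    """Return [(event time, CIF for `cause`)] at each time with any failure.

    Hypothetical helper for illustration. status[i] is 0 for a censored
    observation and a positive integer naming the cause of failure otherwise.
    """
    data = sorted(zip(times, status))
    n_at_risk = len(data)
    surv = 1.0  # probability of being free of ANY event just before time t
    cif = 0.0
    out = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = d_any = d_cause = 0
        # Collect everyone whose time equals t (handles ties).
        while i < len(data) and data[i][0] == t:
            n_here += 1
            if data[i][1] != 0:
                d_any += 1
                if data[i][1] == cause:
                    d_cause += 1
            i += 1
        if d_any > 0:
            # Increment uses survival BEFORE t, then survival is updated.
            cif += surv * d_cause / n_at_risk
            surv *= 1.0 - d_any / n_at_risk
            out.append((t, cif))
        n_at_risk -= n_here
    return out


# Made-up data: 1 = recurrence, 2 = death before recurrence, 0 = censored.
recurrence_cif = cumulative_incidence([1, 2, 3, 4], [1, 2, 0, 1], cause=1)
```

Because a death removes a subject from the risk set without ever counting as a recurrence, the CIFs for all causes sum to one minus the overall survival probability. Treating competing events as ordinary censoring would not preserve this property.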

boston10_gutierrez.pdf

### Structural equation models with latent variables

Stas Kolenikov
University of Missouri
In this talk, I introduce the main ideas of structural equation models (SEMs) with latent variables and Stata tools that can be used for such models. The two approaches most often used in applied work are numeric integration of the latent variables and covariance structure modeling. The first approach is implemented in Stata via gllamm, which was developed by Sophia Rabe-Hesketh. The second approach is currently implemented in confa for confirmatory factor analysis models. Also, introduction of the generalized method of moments (GMM) estimation and testing framework in Stata 11 made it possible to estimate SEMs by using moderately complex parameter and matrix manipulations. I provide working examples with some popular datasets (Holzinger–Swineford factor analysis model and Bollen’s industrialization and political democracy model).

boston10_kolenikov.pdf
boston10_kolenikov.zip

### Multiple imputation using Stata’s mi command

Yulia Marchenko
StataCorp
Stata’s mi command can be used to perform multiple-imputation analysis, including imputation, data management, and estimation. mi impute provides a number of univariate and multivariate imputation methods, including multivariate normal (MVN) data augmentation. mi estimate combines the estimation and pooling steps of the multiple-imputation procedure into one easy step. mi also provides an extensive ability to manage multiply imputed data. I give a brief overview of all of mi’s capabilities, with emphasis on mi impute and mi estimate, and I also demonstrate examples of some of mi’s unique data-management features.
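
The pooling step can be illustrated with Rubin's combining rules for a single coefficient. The sketch below is a generic Python illustration of those rules, not a description of how mi estimate is implemented; the function name `pool_rubin` and the input numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one coefficient across M imputed datasets (needs M >= 2).

    Hypothetical helper for illustration. `estimates` holds the coefficient
    from each completed dataset; `variances` holds its squared standard error.
    Returns (pooled estimate, total variance).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(variances)         # average within-imputation variance
    between = variance(estimates)    # sample variance across imputations
    total = within + (1.0 + 1.0 / m) * between
    return q_bar, total


# Made-up results from three imputed datasets.
pooled, total_var = pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
```

The between-imputation term is what carries the extra uncertainty due to missing data. Analyzing a single imputed dataset would omit it and understate the standard error.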

boston10_marchenko.pdf

### CEM: Coarsened exact matching in Stata

Matthew Blackwell
Harvard University
I introduce a Stata implementation of coarsened exact matching, a new method for improving the estimation of causal effects by reducing imbalance in covariates between treated and control groups. Coarsened exact matching is faster, is easier to use and understand, requires fewer assumptions, is more easily automated, and possesses more attractive statistical properties for many applications than do existing matching methods. In coarsened exact matching, users temporarily coarsen their data, exact match on these coarsened data, and then run their analysis on the uncoarsened, matched data. Coarsened exact matching bounds the degree of model dependence and causal effect estimation error by ex ante user choice, is monotonic imbalance bounding (so that reducing the maximum imbalance on one variable has no effect on others), does not require a separate procedure to restrict data to common support, meets the congruence principle, is approximately invariant to measurement error, balances all nonlinearities and interactions in sample (that is, not merely in expectation), and works with multiply imputed datasets. Other matching methods inherit many of the coarsened exact matching method’s properties when applied to further match data that are preprocessed by coarsened exact matching.
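
The coarsen, exact match, then analyze sequence described above can be sketched as follows. This is a simplified Python illustration, not the Stata implementation presented in the talk: it coarsens by fixed bin widths chosen by the user, keeps only strata that contain both treated and control units, and omits the stratum weights that a full implementation would also produce. The function name `cem_match`, the field name `treated`, and the data are hypothetical.

```python
from collections import defaultdict


def cem_match(rows, bin_widths):
    """Return sorted indices of rows kept after coarsened exact matching.

    Hypothetical helper for illustration. `rows` is a list of dicts with a
    0/1 field "treated"; `bin_widths` maps each covariate name to the width
    of the bins used to coarsen it.
    """
    strata = defaultdict(list)
    for idx, row in enumerate(rows):
        # The coarsened covariate values form the stratum signature.
        key = tuple(int(row[name] // width) for name, width in bin_widths.items())
        strata[key].append(idx)
    kept = []
    for members in strata.values():
        arms = {rows[i]["treated"] for i in members}
        if arms == {0, 1}:  # keep strata with both treated and control units
            kept.extend(members)
    return sorted(kept)


# Made-up data, for illustration only.
people = [
    {"treated": 1, "age": 23, "income": 31000},
    {"treated": 0, "age": 27, "income": 38000},
    {"treated": 1, "age": 45, "income": 90000},
    {"treated": 0, "age": 61, "income": 32000},
]
matched = cem_match(people, {"age": 10, "income": 10000})
```

The subsequent analysis would then use the original, uncoarsened covariate values of the retained rows. Wider bins keep more observations at the cost of allowing more residual imbalance, which is the ex ante user choice the abstract refers to.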

boston10_blackwell.pdf

### Evaluating one-way and two-way cluster–robust covariance matrix estimates

Christopher F. Baum
Boston College
In this presentation, I update Nichols and Schaffer’s 2007 UK Stata Users Group talk on clustered standard errors. Although cluster–robust standard errors are now recognized as essential in a panel-data context, official Stata only supports clusters that are nested within panels. This requirement rules out the possibility of defining clusters in the time dimension and modeling contemporaneous dependence of panel units’ error processes. I build upon recent analytical developments that define two-way (and conceptually, n-way) clustering and upon the 2010 implementation of two-way clustering in the widely used ivreg2 and xtivreg2 packages. I present examples of the utility of one-way and two-way clustering using Monte Carlo techniques, I present a comparison with alternative approaches to modeling error dependence, and I consider tests for clustering of errors.

boston10_baum.pdf

### Bootstrap LM test for the Box–Cox tobit model

David Vincent
Hewlett-Packard
Consistency of the maximum likelihood estimators for the parameters in the standard tobit model relies heavily on the assumption of a normally distributed error term. The Box–Cox transformation presents an obvious attempt to preserve normality when the data make it questionable. In this presentation, I set out an outer-product-of-gradients (OPG) version of a Lagrange multiplier (LM) test for the null hypothesis of the standard tobit model against the alternative of a more general nonlinear specification, as determined by the parameter of the Box–Cox transformation. Monte Carlo estimates of the rejection probabilities using first-order asymptotic and parametric bootstrap critical values are obtained for sample sizes that are comparable to those used in practice. The results show that the LM test using bootstrap critical values has practically no size distortion, whereas when using asymptotic critical values, the empirical rejection probabilities are significantly larger than the nominal levels. A simple program that carries out this test using bootstrap critical values has also been written and can be run after the official Stata tobit estimation command.

boston10_vincent.pdf

### Teaching a statistical program in emergency medicine research rotations: Command-driven or click-driven?

Lincoln Medical and Mental Health Center
Stata is a command-driven program. It is a general-purpose statistical software package that is used by people of different backgrounds and professional disciplines. Most Stata users, however, are nonphysicians. Because Stata is used by people in all fields, most training programs offered are geared toward programmers and nonphysicians. Although Stata has simple commands, they may be difficult for nonprogrammers to use. Generally, physicians are more familiar with clicking than with writing commands. To teach emergency medicine (EM) residents, I developed a teaching approach using pull-down menus. I observed that for EM residents, it was easy to learn and use pull-down menus. While teaching, I emphasized how to enter and import data. During the EM research rotation, residents were introduced to the Stata software in addition to research methods. I also developed a manual explaining the basic operations of Stata. Providing an introduction to Stata prior to data entry improved the accuracy of data recording and facilitated data analysis. It also provided EM residents with the experience to navigate Stata following the completion of the research rotation. Although the basic functions of Stata can be learned via this method, I feel that it is necessary to develop a training program that addresses the needs of physicians.