Last updated: 7 August 2015

2015 Stata Conference Columbus

30–31 July 2015

Hyatt Regency Columbus
350 North High Street
Columbus, Ohio
(614) 463-1234

Proceedings


midasinla: midas goes Bayesian via R-INLA

Ben Adarkwa Dwamena
University of Michigan Medical School
Integrated nested Laplace approximation (INLA) has been developed as a computationally fast, deterministic alternative to Markov chain Monte Carlo (MCMC)-based Bayesian modeling. An R interface to the C-based INLA (R-INLA) program is available with extensive and diverse applications, including diagnostic test accuracy meta-analysis. In this presentation, I discuss the INLA methodology briefly and, in more detail, an illustrated application of the user-written ado-file midasinla, a deterministic Bayesian version of midas (a comprehensive and medically popular module for diagnostic test accuracy meta-analysis). This Stata routine provides R-INLA estimation of the bivariate random-effects model for diagnostic accuracy meta-analysis with data pre- and post-processing within Stata. A dataset of studies evaluating axillary staging performance of positron emission tomography in breast cancer patients is provided for illustration of the omnibus capabilities of midasinla.
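
As a sketch of what a call might look like, assuming midasinla mirrors the four-variable 2x2 cell-count syntax of midas; the dataset name is illustrative only:

    . use pet_axillary_staging, clear    // hypothetical dataset of per-study 2x2 counts
    . midasinla tp fp fn tn              // assumed to mirror midas's tp fp fn tn syntax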

Additional information
columbus15_dwamena.pdf

Estimating treatment effects for ordered outcomes using maximum simulated likelihood

Christian Gregory
Economic Research Service, USDA
In this presentation, I introduce four new modules: treatoprobit, switchoprobit, treatoprobitsim, and switchoprobitsim. Each of these routines estimates a model in which a binary endogenous variable affects an ordered outcome. treatoprobit and switchoprobit estimate treatment and outcome under the assumption that the error terms in the selection and outcome process are distributed as bivariate normal. treatoprobitsim and switchoprobitsim allow researchers to relax this assumption by estimating models in which a latent factor with a potentially nonnormal distribution accounts for the correlation between treatment and outcome. treatoprobit and treatoprobitsim operate under the assumption of a single outcome regime for treated and untreated groups; switchoprobit and switchoprobitsim work under (and test) the assumption that outcome processes for treated and untreated ought to be handled as distinct. The presentation will introduce the modules, show Monte Carlo evidence regarding their performance, and offer an example of their use. This presentation is based on an article that is currently under review at the Stata Journal.
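
A hypothetical call, patterned on the syntax of Stata's endogenous-treatment commands; the modules' actual syntax and option names may differ, and all variable names are illustrative:

    . treatoprobit y x1 x2, treat(d = z1 z2)        // ordered outcome y, binary treatment d
    . switchoprobitsim y x1 x2, treat(d = z1 z2)    // separate regimes, simulated likelihood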

Additional information
columbus15_gregory.pdf

Linear dynamic panel-data estimation using maximum likelihood and structural equation modeling

Richard Williams
Department of Sociology, University of Notre Dame
Paul Allison
Department of Sociology, University of Pennsylvania
Enrique Moral Benito
Banco de España, Madrid
Panel data make it possible both to control for unobserved confounders and to include lagged endogenous regressors. Trying to do both at the same time, however, leads to serious estimation difficulties. In the econometric literature, these problems have been solved by using lagged instrumental variables together with the generalized method of moments (GMM). In Stata, commands such as xtabond and xtdpdsys have been used for these models. Here we show that the same problems can be addressed via maximum likelihood estimation implemented with Stata's structural equation modeling (sem) command. We show that the ML (sem) method is substantially more efficient than the GMM method when the normality assumption is met and suffers less from finite-sample biases. We introduce a command named xtdpdml with syntax similar to other Stata commands for linear dynamic panel-data estimation. xtdpdml simplifies the SEM model-specification process, makes it possible to test and relax many of the constraints that are typically embodied in dynamic panel models, and takes advantage of Stata's ability to use full information maximum likelihood (FIML) for dealing with missing data.
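
A minimal sketch of the command on xtset panel data; the inv() option for time-invariant regressors follows the authors' description, but details may differ in the released version:

    . xtset id year
    . xtdpdml y x1 x2, inv(z1)    // y on its lag and on x1 x2; z1 is time-invariant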

Additional information
columbus15_rwilliams.pdf

15 years a consultant

Phil Ender
UCLA Statistical Consulting Group (Ret)
I present the origins and evolution of the UCLA Statistical Consulting Group. The presentation will cover the history of the UCLA Statistical Consulting Group as well as one approach to the practice of statistical consulting in an academic environment. UCLA Statistical Consulting provides services to faculty, graduate students, and campus researchers. Additionally, the group maintains a website popular not only with Stata users but also with users of other statistical packages.

Additional information
columbus15_ender.pdf

Robust inference in regression-discontinuity designs

Matias Cattaneo
University of Michigan
Sebastian Calonico
University of Miami
Rocio Titiunik
University of Michigan
In this presentation, I will review main methodological results from the regression-discontinuity (RD) design literature and illustrate them using the Stata rdrobust package provided by the authors. More information about the Stata package and background methodological and theoretical papers may be obtained here: https://sites.google.com/site/rdpackages/rdrobust. If time permits, I will also discuss two ongoing research projects on RD methods and their corresponding Stata implementations. The first project focuses on RD inference under a local randomization assumption, while the second project discusses a new manipulation test for RD designs.
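
For example, with outcome y, running variable x, and a cutoff at 0, the three core commands in the package are:

    . rdrobust y x, c(0)      // local-polynomial RD estimate with robust bias-corrected CIs
    . rdplot y x, c(0)        // data-driven RD plot of binned means and polynomial fits
    . rdbwselect y x, c(0)    // data-driven bandwidth selection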

Additional information
columbus15_cattaneo.pdf

Estimation in panel data with individual effects and AR(p) remainder disturbances

Long Liu
Department of Economics, The University of Texas at San Antonio
In this presentation, I introduce a new user-written Stata command, xtregarp. This command considers the problem of estimation in a panel-data model with both individual effects and AR(p) remainder disturbances. It utilizes a simple exact transformation for the AR(p) time-series process derived by Baltagi and Li (1994) and obtains the generalized least-squares estimator for this panel model as a least-squares regression. This command allows the individual effects to be either random effects or fixed effects. The performance of this estimator is illustrated using an empirical example.
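
A hypothetical call on xtset panel data; the option names here are illustrative rather than the command's documented syntax:

    . xtset id year
    . xtregarp y x1 x2, ar(2) re    // GLS with random effects and AR(2) remainder errors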

Additional information
columbus15_liu.pdf

Item response theory models in Stata

Rebecca Pope
Health Econometrician, StataCorp
Stata 14 provides several new commands for fitting item response theory (IRT) models. IRT has a long history in test development and psychometrics and is now being adopted more broadly in fields such as health services research. In this presentation, I will provide an overview of IRT, demonstrate fitting models with binary and categorical items, and discuss postestimation tools such as plotting characteristic curves and information functions.
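
For example, the new commands fit standard models for binary and ordered categorical items and plot the associated curves:

    . irt 2pl q1-q9    // two-parameter logistic model for binary items q1-q9
    . irtgraph icc     // item characteristic curves
    . irt grm v1-v5    // graded response model for ordered categorical items
    . irtgraph tif     // test information function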

Additional information
columbus15_pope.pdf

Meta-analysis on the effects of interviewer supportiveness on the accuracy of children's reports

Christine Wells
Statistical Consulting Group, UCLA
Karen Saywitz, PhD
UCLA
Rakel Larson, MA
University of California, Riverside
Sue Hobbs, PhD
University of California, Davis
Increasingly, children are called upon to participate in decisions that affect their welfare, from providing testimony in court to providing input to public policies. However, many questions remain regarding how to elicit accurate, reliable information from children. A meta-analysis was conducted to investigate the effect of a supportive interviewer on the accuracy of information provided by children (ages 4 to 12). The interviewers asked both neutral and misleading questions in both supportive and nonsupportive conditions. Our results suggest that interviewer supportiveness, when provided in a nonsuggestive manner, bolsters the reliability of children's reports, and that supportiveness lowers children's errors on misleading questions. Despite the importance of this topic, only eight randomized controlled studies were identified for inclusion in the meta-analysis. These studies come from the psychology literature and were published over an 18-year span. These two facts introduced some interesting challenges in preparing the data for the meta-analysis. The analyses included the meta-analysis itself, an investigation into possible nonindependence, a search for outliers, and cumulative meta-analyses. The current guidelines for publishing a meta-analysis in the psychological literature, specifically the MARS guidelines, will be discussed, as well as the user-written commands and their options used to perform these analyses.
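
As a sketch of the kind of user-written commands involved, assuming precomputed effect sizes es with standard errors se (variable names illustrative; the user-written metacum command provides the cumulative analyses):

    . metan es se, random             // random-effects pooled estimate across studies
    . metan es se, random by(qtype)   // stratified by neutral vs. misleading questions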

Additional information
columbus15_wells.pdf

tetrad: A program for confirmatory tetrad analysis

Shawn Bauldry
University of Alabama at Birmingham
Kenneth Bollen
University of North Carolina at Chapel Hill
Confirmatory tetrad analysis (CTA) is a method of testing and comparing the fit of structural equation models (SEMs) based on tetrads (differences in products of pairs of covariances of observed variables). CTA has a few benefits over alternative methods of testing SEM model fit, including (1) some underidentified SEMs are still testable using their vanishing tetrads, (2) some SEMs are nested in their vanishing tetrads and can be directly compared while they are not nested using alternative estimators, and (3) researchers can perform tests on parts of the model as well as the whole model. We have developed a Stata command that conducts CTA based on the approach outlined in Bollen (1990) and Bollen and Ting (1993). The approach involves four steps: (1) identify vanishing tetrads (tetrads that equal 0) for a given model, (2) compute the asymptotic covariance matrix for the vanishing tetrads, (3) identify nonredundant vanishing tetrads, and (4) compute the tetrad test statistic. The Stata command takes as input the set of observed variables and an implied covariance matrix from a hypothesized model (or two implied covariance matrices in the case of a nested test) that can be obtained following the sem command and then returns the tetrad test statistic.
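
A sketch of the intended workflow; the sem and estat steps are official syntax, but the tetrad call and its option name here are hypothetical:

    . sem (F -> y1 y2 y3 y4)         // fit the hypothesized one-factor model
    . estat framework, fitted        // display the model-implied covariance matrix
    . tetrad y1 y2 y3 y4, icm(S)     // hypothetical: pass the implied covariance matrix S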

Additional information
columbus15_bauldry.pdf

Postestimation parameter recentering and rescaling

Douglas Hemken
Social Science Computing Cooperative, University of Wisconsin–Madison
Recoding data prior to model estimation is a frequent part of analysis. For linear models, this can be thought of as a change of basis that is common to the data and the model. Where the change of basis in the data is linear, the change in the model is also linear. We can calculate the transformed parameters (and the transformed parameter variance–covariance matrix) without actually recoding our data. The same mathematics that is used to design factorial experiments or design contrasts that include interactions can be extended to include recentering and rescaling continuous variables in models with interaction terms. This gives us a general solution to such problems as calculating standardized coefficients, or converting models expressed in American units of measure to international units, regardless of whether the models include interaction terms or whether we have access to the original data. This is implemented here as a Stata program, stdParm, that produces centered or standardized parameters and precision matrices, postestimation.
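
The linearity claim can be verified by hand in a model with an interaction: recentering weight by a constant c shifts the mpg coefficient by c times the interaction coefficient. A quick check with official commands:

    . sysuse auto, clear
    . regress price c.weight##c.mpg
    . summarize weight, meanonly
    * after centering weight at its mean, the mpg coefficient becomes
    . display _b[mpg] + r(mean)*_b[c.weight#c.mpg]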

Additional information
columbus15_hemken.pdf

Statistical process control charts

Barbara Williams
Virginia Mason Medical Center
Statistical process control (SPC) charts are used to assess outcomes measured over time, usually with the purpose of detecting improvement or maintaining a high level of performance. Traditionally used in industrial engineering for quality control, these methods are now frequently employed in healthcare and are the standard method of analysis for quality-improvement work. In this presentation, I describe methods that improve on current Stata syntax for generating useful, reader-friendly SPC charts. I build on the existing Stata cchart (count), pchart (proportion), rchart (range), and xchart (average) commands to produce SPC charts with a clear, easy-to-read visual display. This presentation will explore default and edited pchart and xchart examples using health services research data, including the syntax for creating these graphs. Graphic elements include customized axis labels, text, colors, lines, notes, fonts, and titles. Under this approach, Stata can replace current SPC chart generators, including macros for Excel and stand-alone programs.
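
For example, using the variable layout from the [R] qc examples (rejects per sample, a sample identifier, and the sample size), a customized p-chart might be built with standard graph options:

    . pchart rejects day ssize, title("Daily proportion rejected") ylabel(, angle(horizontal))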

Additional information
columbus15_bwilliams.pptx

Data workflows with Stata and Python

Stephen Childs
Education Policy Research Initiative, University of Ottawa
Dejan Pavlic
Education Policy Research Initiative, University of Ottawa
Python is a general-purpose programming language with a large library of packages that extend into domains Stata does not touch. In this presentation, I will identify the key Python packages that allow it to work with Stata, primarily pandas. pandas is a relatively new but extremely powerful package for data preparation and analysis that works well with Stata, including support for categorical variables. I will discuss some new tools that have been developed to make it easier to connect Stata to Python. I will also discuss using Stata with the IPython Notebook, a tool that allows researchers to combine code and text in an easy-to-access document. During their work with the Education Policy Research Initiative, the authors have successfully transitioned much complex data preparation from Stata to Python while still supporting Stata's powerful analytical tools. This presentation is ideal for those interested in incorporating some Python into their workflow or planning a larger transition.
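
On the Stata side, the round trip described can be sketched as follows; prepare_data.py is a hypothetical script that would read and write the datasets with pandas' read_stata() and to_stata() functions:

    . save analysis_input.dta, replace    // hand the raw data to Python
    . !python prepare_data.py             // hypothetical pandas-based preparation script
    . use analysis_output.dta, clear      // resume analysis in Stata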

Additional information
columbus15_childs.pdf

Distribution-free estimation of heteroskedastic binary response models in Stata

Jason Blevins
Department of Economics, The Ohio State University
Shakeeb Khan
Duke University
In this presentation, I demonstrate how to implement two recent semiparametric estimators for binary response models in Stata. These estimators do not require parametric assumptions on the distribution of the error term, unlike the logit and probit models, and they allow for general forms of heteroskedasticity. I begin with a short introduction to binary response models and the various known identifying assumptions, including the weak conditional median independence assumption that the two estimators of interest are based on. Then, I focus on two recently proposed semiparametric estimators: a sieve nonlinear least-squares estimator and a local nonlinear least-squares estimator. I demonstrate how both estimators can be easily implemented in Stata via simple modifications to the standard probit objective function, and I give several applied examples and Monte Carlo results. Finally, I introduce the dfbr package by Blevins and Khan (2013, Stata Journal, st0310) for distribution-free estimation of binary response models. Although the estimators can be implemented by hand using standard Stata commands, this package provides a standard Stata interface for the user, automates constructing the modified probit objective functions, and calculates bootstrap standard errors.
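
The "by hand" implementation referred to above can be written with official commands as a nonlinear least-squares probit, the objective that the two estimators then modify; the final line is a basic call to the package, with options omitted (see st0310 for the full syntax):

    . nl (y = normal({xb: x1 x2} + {b0}))    // NLLS probit objective, fit by hand
    . dfbr y x1 x2                           // distribution-free estimation via the package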

Additional information
columbus15_blevins.pdf

A comparison of modeling scales in flexible parametric models

Noori Akhtar-Danesh
McMaster University
Cox regression and parametric survival models are quite common in the analysis of survival data. Recently, flexible parametric models (FPMs) have been introduced that are extensions of parametric models such as the Weibull (hazard-scale) model, the loglogistic (odds-scale) model, and the lognormal (probit-scale) model. In this presentation, I aim to statistically compare these modeling scales. I used the user-written Stata command stpm2 to compare flexible parametric models based on these three different scales. I used two subsets of the U.S. National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) dataset for this illustration: one on ovarian cancer diagnosed between 1991 and 2010 and one on colorectal cancer diagnosed in men between 2001 and 2010. The ovarian and colorectal datasets included data from 13,810 and 42,002 patients, respectively. Patients were classified into different age groups. I present results using graphs to compare survival curves, trends in one-year and five-year survival rates, and mortality rates. In general, there were no substantial differences between the three modeling scales, although the probit scale showed a better fit based on the Akaike information criterion (AIC) for both datasets.
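
The three scales correspond directly to stpm2's scale() option; fitting each and comparing AIC with estat ic reproduces the comparison described (variable names illustrative):

    . stset survtime, failure(died)
    . stpm2 age_gt60, df(4) scale(hazard)    // Weibull-type, log cumulative-hazard scale
    . estat ic
    . stpm2 age_gt60, df(4) scale(odds)      // loglogistic-type scale
    . estat ic
    . stpm2 age_gt60, df(4) scale(normal)    // lognormal, probit scale
    . estat ic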

Additional information
columbus15_akhtar_danesh.pdf

Estimating Markov-switching regression models in Stata

Ashish Rajbhandari
Senior Econometrician, StataCorp
Many datasets are not well characterized by linear autoregressive moving-average (ARMA) models. In this presentation, I will describe the new mswitch command, which fits Markov-switching regression models that characterize many such datasets well. Markov-switching regression models allow the time series to switch between unobserved states according to a Markov process. mswitch can estimate the parameters of the Markov-switching dynamic regression (MSDR) model and the Markov-switching autoregressive (MSAR) model. This talk outlines the models, discusses the relative advantages of MSDR and MSAR models, and discusses examples of how to interpret mswitch output and its postestimation features.
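
For example, using the federal funds rate series from the [TS] mswitch manual examples:

    . webuse usmacro, clear
    . mswitch dr fedfunds          // two-state Markov-switching dynamic regression
    . mswitch ar fedfunds, ar(1)   // two-state Markov-switching AR(1)
    . predict pr*, pr              // predicted state probabilities
    . estat transition             // estimated transition probabilities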

Additional information
columbus15_rajbhandari.pdf

Between and beyond: Irregular series, interpolation, variograms, and smoothing

Nicholas Cox
Department of Geography, Durham University
Time series (and similar one-dimensional series) are more often irregularly spaced than many methods texts or courses admit. Even with a plan of regular measurements, gaps can arise for many human or inhuman reasons, while some series are naturally irregular. Interpolation between known values is a centuries-old need but one neglected by official Stata, which offers only linear interpolation and cubic spline interpolation (in Mata). I review additional user-written commands for interpolation, including those for cubic, nearest-neighbor, and piecewise cubic Hermite methods available from SSC. Beyond interpolation of irregular series lie the questions of characterizing the structure of such series and smoothing in various ways. One useful tool standard in spatial statistics is the variogram, which relates dissimilarity, as squared differences between values, to their separation in time or distance in space. Diggle and others have shown uses for variograms in time-series and longitudinal data analysis. I discuss user-written Stata commands for variogram calculation, plotting, and use in relation to exploratory data analysis on the one hand and smoothing on the other.
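
The official and SSC interpolation commands mentioned share a common syntax; for a series y observed at irregular times t:

    . ipolate y t, gen(y_lin)      // official linear interpolation
    . ssc install cipolate
    . cipolate y t, gen(y_cub)     // user-written cubic interpolation
    . ssc install pchipolate
    . pchipolate y t, gen(y_pch)   // user-written piecewise cubic Hermite interpolation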

Additional information
columbus15_cox.ppt

Public program sensitivity: Using ROC curves to characterize classification efficiency of state Medicaid systems

Lisa Frazier
John Glenn College of Public Affairs, The Ohio State University
Despite being the largest single source of health care coverage in the U.S., Medicaid fails to capture all eligible citizens. This is a well-known problem among means-tested programs like Medicaid; discussions of take-up and churning attend to this failure. Cases of fraud in programmatic enrollments represent another classification failure in these systems. Reports on rates of fraud, take-up, and churn rarely acknowledge that such outcomes are ultimately features of the same tradeoff function: the sorting of citizens into benefit groups on the basis of membership in some a priori category. This research elucidates the implicit tradeoffs being made in the Medicaid citizen-sorting mechanism by using administrative data to construct ROC curves for each state Medicaid system before and after the passage of the Affordable Care Act.

Additional information
columbus15_frazier.pptx

Small-sample inference for linear mixed-effects models

Xiao Yang
Senior Statistician and Software Developer, StataCorp
Researchers are often interested in making inferences about fixed effects in a linear mixed-effects model. For a large sample, the null sampling distributions of the test statistics can be approximated by a normal distribution for a one-hypothesis test and a chi-squared distribution for a multiple-hypotheses test. For a small sample, these large-sample approximations may not be appropriate, and t and F distributions may provide better approximations. In this presentation, I will describe five denominator-degrees-of-freedom (DDF) methods available with mixed in Stata 14, including the Satterthwaite and Kenward–Roger methods, and I will demonstrate examples of when and how to use these methods.
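
For example, using the pig-growth data from the [ME] mixed manual (both methods require REML estimation):

    . webuse pig, clear
    . mixed weight week || id:, reml dfmethod(kroger)           // Kenward-Roger DDF
    . mixed weight week || id:, reml dfmethod(satterthwaite)    // Satterthwaite DDF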

Additional information
columbus15_yang.pdf

Development of a project-based statistics course for applied biostatistics using Stata

Frank Snyder
Purdue University
Project-based learning is an instructional approach that is designed to build students' skills and offer real-world activities, such as defining a research question and using nationally representative data to find an answer (Dierker et al. 2012). The purpose of this presentation is to describe an innovative, project-based statistics course for applied biostatistics using Stata. The semester-long course is designed as a graduate-level introductory biostatistics course; however, it could easily be adapted for use in an undergraduate public health program. The course combines two textbooks (Acock 2014; Bush 2012) and traditional lecture and assessment with computer lab activities and a research project. The project-based course structure offers students the opportunity to directly apply course content to their unique research question, with the intent to increase students' motivation and interest in statistics. Each student's culminating experience is a 15-minute presentation or poster that explains his or her research and results to classmates or an alternative audience. Course evaluation data demonstrate that students rate the course as excellent, and students strongly agree the course encourages learning. A course syllabus, lab activities, Stata do-files, and a description of the research project and final presentation will be available upon request.

Additional information
columbus15_snyder.pptx

Brewing color schemes in Stata: Making it easier for end users to customize Stata graphs

William Buchanan
Mississippi Department of Education
Although Stata graphs can be created to satisfy customized needs, it can be time-consuming to specify all the unique options required to create clean, customized graphs. Graph schemes provide a method to help alleviate this difficulty, but customizations to graph schemes are typically fixed for a single scheme. In this presentation, I will discuss a new Stata program, brewscheme, that allows end users to generate customized graph schemes using color palettes available from www.colorbrewer2.org. The program allows users to specify a single color palette for all graph types, unique color palettes for individual graph types, or a combination (for example, specifying a color palette and the number of colors for scatterplots while setting a default palette for the other graph types). Additionally, the schemes generated by the program set clean graph defaults (for example, all-white backgrounds and foregrounds, no grid lines, etc.), orient axis labels horizontally, and remove boxes around legends. The companion program brewmeta allows users to quickly access metadata about specific palettes (for example, colorblindness, LCD display, print, and photocopier friendliness).
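
A hypothetical call (the option names here are illustrative, not the program's documented syntax); set scheme is the official command that then applies the result:

    . brewscheme, schemename(mybrew) allstyle(dark2) allcolors(7)    // illustrative options
    . set scheme mybrew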

Additional information
columbus15_buchanan.pdf

Colombian industrial structure behavior and its regions between 1974 and 2005

Luis Fernando Lopez Pineda
Chamber of Commerce of Cartagena
This presentation analyzes the behavior of the Colombian industrial structure and its regions between 1974 and 2005 to determine whether the liberal reform at the end of the 20th century caused industrial stagnation and a lack of diversification. The evidence shows that the slowdown in industrial growth and the stagnation of productive transformation were caused by the increased competition facing national industry after the adoption of the trade-opening model. The process was not uniform across the regions covered in the study. The more industrialized regions, specifically Antioquia, Atlantico, Valle, and Bogota, suffered deindustrialization, while less industrialized regions, such as Bolivar and Cundinamarca, became industrial regions.

Additional information
columbus15_lopez_pineda.pdf

Scientific organizers

Timothy R. Sahr (coordinator), Ohio Colleges of Medicine Government Resource Center, Applied Research

Stanley Lemeshow (chair of review team), Ohio State University, Biostatistics

Marcus Berzofsky, RTI International, Survey Research

Christopher Browning, Ohio State University, Sociology

Anand Desai, Ohio State University, Public Policy

Christopher Holloman, Ohio State University, Statistics

Bo Lu, Ohio State University, Biostatistics

Eric Seiber, Ohio State University, Health Economics

Logistics organizers

Nathan Bishop, StataCorp

Chris Farrar, StataCorp

Gretchen Farrar, StataCorp