Columbus 2018
July 19–20

Join Stata users and experts at the 2018 Stata Conference in Columbus on July 19–20.

Experience what happens when new and long-time Stata users from across all disciplines gather to discuss real-world applications of Stata. The Stata Conference provides an unparalleled opportunity for you to collaborate with Stata developers and connect with the inventive and creative user community. Don’t miss this great networking and learning opportunity.

## Program talks include

• Nonlinear mixed-effects regression
Abstract: In many applications, such as biological and agricultural growth processes and pharmacokinetics, the time course of a continuous response for a subject over time may be characterized by a nonlinear function.
Parameters in these subject-specific nonlinear functions often have natural physical interpretations, and observations within the same subject are correlated. Subjects may be nested within higher-level groups, giving rise to nonlinear multilevel models, also known as nonlinear mixed-effects or hierarchical models. The new Stata 15 command menl allows you to fit nonlinear mixed-effects models, in which fixed and random effects may enter the model nonlinearly at different levels of hierarchy. In this talk, I will show you how to fit nonlinear mixed-effects models that contain random intercepts and slopes at different grouping levels with different covariance structures for both the random effects and the within-subject errors. I will also discuss parameter interpretation and highlight postestimation capabilities.
Houssein Assaad, Senior Statistician and Software Developer
StataCorp
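As a taste of the syntax, here is a minimal sketch of a logistic growth model with a plant-level random asymptote; the dataset and variable names (circumf, age, plant) are hypothetical stand-ins, not from the talk:

```stata
* Hypothetical data: circumference measured over age, plants as subjects.
* Logistic growth curve; {U1[plant]} adds a random asymptote per plant.
menl circumf = ({b1} + {U1[plant]})/(1 + exp(-(age - {b2})/{b3}))
```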
• ERMs, simple tools for complicated data
Abstract: While the term "extended regression model" (ERM) may be new, the method is not. ERMs are regression models with continuous outcomes (including censored and tobit outcomes), binary outcomes, and ordered outcomes that
are fit via maximum likelihood and that also account for endogenous covariates, sample selection, and nonrandom treatment assignment. These models can be used when you are worried about bias due to unmeasured confounding, trials with informative dropout, outcomes that are missing not at random, selection on unobservables, and more. ERMs provide a unifying framework for handling these complications individually or in combination. Charles Lindsey will briefly review the types of complications that ERMs can address. He will work through examples that demonstrate several of these complications and show some inferences we can make despite those complications.
Charles Lindsey, Senior Statistician and Software Developer
StataCorp
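For orientation, a minimal sketch of the ERM syntax with an endogenous covariate; the wage model and variable names below are hypothetical, not the examples from the talk:

```stata
* Hypothetical wage model: education is an endogenous covariate,
* modeled with parental education as an excluded instrument.
* The endogenous() option adds education to the main equation.
eregress wage c.age i.union, endogenous(education = c.age c.parent_edu)
```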
Abstract: Teaching Markov Chain Monte Carlo Bayesian methods to undergraduates can be challenging because they, for the most part, are not familiar with advanced methodologies such as multilevel models, IRT, or other
analytical methods that are commonly found in Bayesian analyses. However, almost every undergraduate is familiar with the t-test. This presentation will use Stata's bayesmh command to perform a two-sample independent t-test. We will discuss the advantages of using a Bayesian approach to perform t-test-type analyses and compare the results with the traditional frequentist t-test.
Phil Ender, UCLA (Ret)
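A minimal sketch of the idea, assuming a hypothetical outcome y and binary indicator group; the priors below are illustrative weak priors, not necessarily those used in the talk:

```stata
* Hypothetical two-group comparison: normal likelihood, weak priors.
* The {y:1.group} coefficient plays the role of the mean difference
* tested by a frequentist two-sample t-test.
bayesmh y i.group, likelihood(normal({var}))  ///
    prior({y:},   normal(0, 10000))           ///
    prior({var},  igamma(0.01, 0.01))         ///
    rseed(17)
```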
• dtalink: Faster probabilistic record linking and deduplication methods in Stata for large data files
Abstract: Stata users often need to link records from two or more data files, or find duplicates within data files. Probabilistic linking methods are often used when the file or files do not have reliable or unique identifiers.
Keith Kranker, Mathematica Policy Research
• PPMLHDFE: Fast, flexible Poisson estimation with high-dimensional fixed effects
Abstract: This is a Stata package for estimation of Poisson models with high-dimensional fixed effects. It is a joint effort by Sergio Correia (the author of reghdfe), Paulo Guimaraes (the author of poi2hdfe), and
myself (the author of ppml_panel_sg). This new command has several very desirable features that we expect will make it very popular. Like ppml_panel_sg, it is ideally suited for Poisson PML estimation of structural gravity models, a workhorse empirical model in economics used to identify spatial frictions. However, like reghdfe, it can be used with any set of fixed effects. Furthermore, like poi2hdfe, it runs on an IRLS loop using the reghdfe architecture to perform each least squares step. This alone makes it very fast. But we have also implemented several additional speed-up tricks for IRLS HDFE estimation that allow for significant further speed gains. In addition, we are working toward a novel way of verifying beforehand that Poisson estimates exist that are robust to the inclusion of high-dimensional fixed effects.
Thomas Zylkin, University of Richmond
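A minimal sketch of a call in the gravity setting described above; the dataset and variable names are hypothetical, not from the talk:

```stata
* Hypothetical gravity data: bilateral trade flows with exporter-year,
* importer-year, and country-pair fixed effects absorbed.
ppmlhdfe trade ln_dist contiguous fta, absorb(exp_year imp_year pair)
```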
• The empirical analysis of core housing need in Canada: Evidence from the survey of household spending
Abstract: In 2011, approximately 12% of Canadian households were in Core Housing Need (CHN), meaning that these households live in housing that requires major repair (adequacy), does not have enough bedrooms for the size of
the household (suitability), costs 30% or more of before-tax income (affordability), or any combination of these three. Moreover, these households would have to spend 30% or more of their income to access local housing that meets the three standards. In 2017, Canada Mortgage and Housing Corporation (CMHC) announced a vision for the National Housing Strategy (NHS), which, among other things, aims to reduce the number of households in CHN. This study exploits rich microdata files from Statistics Canada, the Survey of Household Spending (SHS), and fits models for the three standards using a non-recursive generalized structural equation model (GSEM) to explain the socioeconomic and demographic drivers of CHN. This study also informs policymakers developing various policy levers by predicting the impact of housing initiatives on changes in the likelihood of being in CHN.
Duangsuda Sopchokchai, Canada Mortgage and Housing Corporation
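To illustrate the modeling approach in general terms, here is a simplified gsem sketch of three binary CHN standards modeled jointly with logit links. This is not the authors' exact specification, and the covariate names are hypothetical stand-ins for SHS variables:

```stata
* Simplified sketch: three binary housing standards, logit links.
* Variable names (income, hhsize, rent, tenure) are hypothetical.
gsem (adequacy      <- income hhsize i.tenure, logit)  ///
     (suitability   <- income hhsize,          logit)  ///
     (affordability <- income rent hhsize,     logit)
```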
• Doing less with Stata Markdown
Abstract: Stata’s new dyndoc and its sister commands provide a rich set of tools for reimagining document writing. An example of this is a document translator, stmd, that converts dynamic documents written with plain
Markdown tags to Stata’s dyndoc format. This allows the user to write documents in the simple, uncluttered Markdown style used with other programming languages and on websites and still use many of dyndoc’s features such as executing code and embedding graphics links.
Doug Hemken, Social Science Computing Cooperative
• Vector-based kernel weighting: A simple estimator for improving precision and bias of average treatment effects in multiple treatment settings
Abstract: Treatment effect estimation must account for endogeneity, in which factors affect treatment assignment and outcomes simultaneously. By ignoring endogeneity, we risk concluding that a helpful treatment is not
beneficial or that a treatment is safe when it is actually harmful. Propensity score (PS) matching or weighting adjusts for observed endogeneity, but matching becomes impracticable with multiple treatments, and weighting methods are sensitive to PS model misspecification in applied analyses. We used Monte Carlo simulations (1,000 replications) to examine sensitivity of multi-valued treatment inferences to PS weighting or matching strategies. We consider four variants of PS adjustment: inverse probability of treatment weights (IPTW), kernel weights, vector matching, and a new hybrid, vector-based kernel weighting (VBKW). VBKW matches observations with similar PS vectors, assigning greater kernel weights to observations with similar probabilities within a given bandwidth. We varied the degree of PS model misspecification, sample size, number of treatment groups, and sample distribution across treatment groups. Across simulations, VBKW performed as well as or better than the other methods in terms of bias and efficiency. VBKW may be less sensitive to PS model misspecification than other methods used to account for endogeneity in multi-valued treatment analyses.
Jessica Lum, Department of Veterans Affairs
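VBKW itself is the authors' new estimator; for orientation, here is a sketch of the standard IPTW comparator for a three-level treatment, with entirely hypothetical variable names:

```stata
* Hypothetical data: treatment treat coded 1/2/3, covariates x1 x2, outcome y.
mlogit treat x1 x2                     // multinomial propensity score model
predict double p1 p2 p3, pr            // predicted probability of each level
gen double ps   = cond(treat==1, p1, cond(treat==2, p2, p3))
gen double iptw = 1/ps                 // inverse probability of treatment weight
regress y i.treat [pweight = iptw]     // weighted outcome model
```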
• New data cleaning command: assertlist – improves speed and accuracy of collaborative correction
Abstract: Stata’s handy assert command can certify that a dataset meets a set of user expectations, but when one assertion is violated, it throws an error and does not proceed to check the rest. Identifying problems with
every variable in a large dataset can involve a messy set of ad hoc error traps and LIST commands to learn what unexpected values occur in what dataset rows. Furthermore, code to REPLACE errant values sometimes involves IF syntax with a list of terms connected by Boolean ANDs that identify the row targeted for the fix; when typed by hand, these rows are quite susceptible to typographical errors. This talk describes a new command, assertlist, that can test an entire set of assertions in one run without ad hoc code to drill down or move on. Exceptions are listed either to the screen or a spreadsheet. In situations where problematic values will later be corrected or replaced, assertlist generates spreadsheet columns that wait to receive hand-entered corrected values and other columns that immediately put corrected values into Stata REPLACE commands for easy pasting into downstream .do files. In our experience, assertlist streamlines well-documented data cleaning and guards against errors in correction code.
Dale Rhoda, Biostat Global Consulting
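The limitation of the built-in assert command is easy to demonstrate; assertlist's own option syntax is best taken from its help file, so only assert is sketched here, with a hypothetical variable:

```stata
* assert halts at the first violated condition, so later checks never run;
* assertlist, by contrast, reports all exceptions in one pass.
assert !missing(age)
assert inrange(age, 0, 120)
```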
• Organ pipe plots for clustered datasets – visualize disparities in cluster-level coverage
Abstract: Leo Tolstoy is famous for his novels and less well known for his ideas on survey data analysis. Concerning estimated proportions, he is said to have written: "Covered strata are all alike; every poorly covered stratum is poorly covered in its own way." I describe a new command to make what we call organ pipe plots to visualize heterogeneity in binary outcomes in clustered data. The plots were conceived for vaccination coverage surveys, but they are helpful in a wide variety of contexts. Imagine a survey where only 50% of sampled children are found to be vaccinated. Different programmatic responses would be appropriate if the vaccinated include all the children in half the clusters versus half the children in all the clusters. These plots have been used to identify neighborhoods that were surreptitiously and intentionally skipped over during vaccination campaigns. The talk will demonstrate the command and discuss similarities with Pareto plots from quality control and a visual connection to the intracluster correlation coefficient (ICC). Note that the ICC shares a connection to anarcho-pacifistic ideas in Tolstoy's later novels: many students mention them…but few can describe them clearly.
Mary Prier, Biostat Global Consulting
• Even simpler standard errors for two-stage optimization estimators: Mata implementation via the DERIV command
Abstract: Terza (2016a) offers a heretofore unexploited simplification (henceforth referred to as SIMPLE) of the conventional formulation for the standard errors of two-stage optimization estimators (2SOE). In that paper,
SIMPLE was illustrated in the context of two-stage residual inclusion (2SRI) estimation (Terza et al., 2008). Stata/Mata implementations of SIMPLE for 2SRI estimators are detailed in Terza (2017a and b). Terza (2016b) develops a variant of SIMPLE for calculating the standard errors of two-stage marginal effects estimators (2SME). Generally applicable Stata/Mata implementation of SIMPLE for 2SME is detailed in Terza (2017c) and compared with results from the Stata MARGINS command (for the subset of cases in which the MARGINS command is available). Although SIMPLE substantially reduces the analytic and coding burden imposed by the conventional formulation, it still requires the derivation and coding of key partial derivatives that may prove daunting for some model specifications. In this presentation, I detail how such analytic demands and coding requirements are virtually eliminated via the use of the Mata DERIV command. I will discuss illustrations in the 2SRI and 2SME contexts.
Terza, J., A. Basu, and P. Rathouz (2008). Two-stage residual inclusion estimation. Addressing endogeneity in health econometric modeling. Journal of Health Economics 27: 531-543.
Terza, J.V. (2016a). Simpler standard errors for two-stage optimization estimators. Stata Journal 16: 368-385.
Terza, J.V. (2016b). Inference using sample means of parametric nonlinear data transformations. Health Services Research 51: 1109-1113.
Terza, J.V. (2017a). Two-stage residual inclusion estimation: A practitioners guide to Stata implementation. Stata Journal 17: 916-938.
Terza, J.V. (2017b). Two-stage residual inclusion estimation in health services research and health economics. Health Services Research, forthcoming, DOI: 10.1111/1475-6773.12714.
Terza, J.V. (2017c). Causal effect estimation and inference using Stata. Stata Journal 17: 939-961.
Joseph Terza, Department of Economics
Indiana University Purdue University Indianapolis
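For readers unfamiliar with Mata's numerical differentiation suite, here is a minimal deriv() sketch on a toy function. This is not Terza's estimator, only an illustration of the machinery the talk builds on:

```stata
* Numerically differentiate f(p) = p^2 + 3p at p = 2 (analytically, 7).
mata:
void myf(real rowvector p, v) { v = p[1]^2 + 3*p[1] }
D = deriv_init()
deriv_init_evaluator(D, &myf())
deriv_init_params(D, (2))
deriv(D, 1)        // first derivative, approximately 7
end
```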
• Automating exploratory data analysis tasks with eda
Abstract: Several tools currently exist in the Stata ecosystem for document preparation, authoring, and creation, each with their own unique strengths. Similarly, there are many tools available to map data to
visual dimensions for exploratory and expositive purposes. While these tools are powerful on their own, they do not attempt to solve the most significant resource constraint we all face: time. The eda command is designed to address this constraint by automating the creation of all the univariate and bivariate data visualizations and summary statistics tables in a dataset. Users can specify categorical and continuous variables manually, provide their own rules based on the number of unique values, or allow eda to use its own defaults, and eda will apply the necessary logic to graph and describe the available data. The command is designed to produce the maximum amount of output by default, so a single line of code can produce a document providing substantial insight into your data.
Billy Buchanan, Fayette County Public Schools
• Output and automatic reporting using putdocx/putpdf
Abstract: Are you tired of copying and pasting tables, titles, figures, paragraphs, and footnotes from Excel into Word or PDF files? Here is good news: Stata 15 released a new feature that creates analysis tables, figures, footnotes, and paragraphs directly in Word or PDF files. The new commands, putdocx and putpdf, serve as a one-stop tool for turning your Stata code into a Word or PDF file. This presentation will show you how to generate analysis tables, figures, and discussion or summary paragraphs directly in Word or PDF format. Instead of manually updating the numbers in your tables, figures, summary paragraphs, or footnotes when periodic updates are required, all you need to do is refresh the dataset and rerun your existing putdocx/putpdf .do file to see the updated results directly in the Word/PDF file. This can be done in one click. Specifically, the following formatting and analysis results will be shown, output directly to a Word or PDF file: 1. paragraphs with statistics in them; 2. figures; 3. tables (descriptive summary, regression, logistic regression, survival analysis, etc.); 4. automation of exporting; and 5. combination of several .docx files into one summary report.
Dong Hua, Corrona, LLC
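A self-contained sketch of the workflow, using the auto dataset shipped with Stata; the report content is illustrative, not from the talk:

```stata
* Build a small Word report: heading, a statistic-bearing paragraph,
* and a regression table, all written directly to a .docx file.
sysuse auto, clear
putdocx begin
putdocx paragraph, style(Heading1)
putdocx text ("Fuel economy summary")
summarize mpg
local n    = r(N)
local mean : display %4.1f r(mean)
putdocx paragraph
putdocx text ("Mean mpg across `n' cars: `mean'.")
regress mpg weight i.foreign
putdocx table results = etable     // regression table straight into the doc
putdocx save mpg_report.docx, replace
```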
• How are population pressures and carbon dioxide emissions linked in Africa?
Abstract: This study uses Stata to investigate the relationship between population pressures and carbon dioxide (CO2) emissions in African countries across income groupings. Evidence on the population-pressures–environment nexus from panel-data analyses remains inconclusive and subject to further investigation. We investigate this issue using descriptive and empirical analyses of panel data on population pressures and rich anthropogenic drivers from 1960–2012. Descriptive analysis suggests that combined CO2 emission concentrations rose from 1960–2010 by approximately 565%, 1,286%, and 505% in upper-income countries in Africa (UICA), lower-middle-income countries in Africa (LMICA), and low-income countries in Africa (LICA), respectively, while total population grew from 1960–2012 by approximately 221%, 301%, and 315%. Furthermore, we provide evidence on the moderating roles of final consumption expenditure (annual growth), the manufacturing sector, and the services sector in the linkages between population and CO2 emissions. The estimation commands used are robust to contemporaneous correlation, panel heteroskedasticity, and serial correlation. The empirical findings suggest that a 1% rise in population growth increases CO2 emissions by about 0.38%, 1.08%, and 0.31% in LICA, LMICA, and UICA, respectively, holding all other predictors constant. Overall, the results suggest that particular attention should be devoted to population size and population growth in ameliorating CO2 emissions, especially in countries with large populations, such as Nigeria, Egypt, and Ethiopia.
Abdulrasaki Saka, Federal Polytechnic Offa, Nigeria
• Assessing the calibration of dichotomous outcome models with the calibration belt
Abstract: The calibration belt is a graphical approach designed to evaluate the goodness of fit of binary outcome models such as logistic regression models. The calibration belt examines the relationship between estimated
probabilities and observed outcome rates. Significant deviations from the perfect calibration can be spotted on the graph. The graphical approach is paired to a statistical test, synthesizing the calibration assessment in a standard hypothesis testing framework. We present the calibrationbelt command, which implements the calibration belt and its associated test in Stata.
Giovanni Nattino, The Ohio State University
• And more.

## Scientific committee

Stan Lemeshow (chair)
Ohio State University
Public Health
Timothy R. Sahr (coordinator)
Ohio Colleges of Medicine
Government Resource Center
Kelly Balistreri
Bowling Green State University
Chris Browning
Ohio State University
Sociology
Anand Desai
National Science Foundation
Bo Lu
Ohio State University
Biostatistics
Eric Seiber
Ohio State University
Public Health
Mary Applegate
Ohio Department of Medicaid
Anirudh Ruhil
Ohio University

## Registration

Seats are limited. Choose one of the options below. Lunch and refreshments are included in the registration fee.

|  | Price | Student price |
| --- | --- | --- |
| Both days | $195 | $75 |
| Day 1: Thursday, July 19, 2018 | $125 | $50 |
| Day 2: Friday, July 20, 2018 | $125 | $50 |
| Dinner (optional): July 19, 2018 |  |  |

## Venue

Hyatt Regency Columbus
350 North High Street
Columbus, OH 43215

The conference hotel is within steps of the Arena District, Huntington Park, and the popular Short North Arts and Entertainment District. After the Conference, relax and enjoy what Columbus has to offer. From the world-class Columbus Zoo and Aquarium to the Franklin Park Conservatory and Botanical Gardens, you will find much to do during your stay.