Join Stata users and experts at the 2018 Stata Conference in Columbus on July 19–20.
Experience what happens when new and long-time Stata users from across all disciplines gather to discuss real-world applications of Stata. The Stata Conference provides an unparalleled opportunity for you to collaborate with Stata developers and connect with the inventive and creative user community. Don’t miss this great networking and learning opportunity.
Program talks include
- Nonlinear mixed-effects regression
Abstract: In many applications, such as biological and agricultural growth processes and pharmacokinetics, the time course of a continuous response for a subject may be characterized by a nonlinear function. Parameters in these subject-specific nonlinear functions often have natural physical interpretations, and observations within the same subject are correlated. Subjects may be nested within higher-level groups, giving rise to nonlinear multilevel models, also known as nonlinear mixed-effects or hierarchical models. The new Stata 15 command menl allows you to fit nonlinear mixed-effects models, in which fixed and random effects may enter the model nonlinearly at different levels of hierarchy. In this talk, I will show you how to fit nonlinear mixed-effects models that contain random intercepts and slopes at different grouping levels with different covariance structures for both the random effects and the within-subject errors. I will also discuss parameter interpretation and highlight postestimation capabilities.
Houssein Assaad, Senior Statistician and Software Developer
- ERMs, simple tools for complicated data
Abstract: While the term "extended regression model" (ERM) may be new, the method is not. ERMs are regression models with continuous outcomes (including censored and tobit outcomes), binary outcomes, and ordered outcomes that are fit via maximum likelihood and that also account for endogenous covariates, sample selection, and nonrandom treatment assignment. These models can be used when you are worried about bias due to unmeasured confounding, trials with informative dropout, outcomes that are missing not at random, selection on unobservables, and more. ERMs provide a unifying framework for handling these complications individually or in combination. Charles Lindsey will briefly review the types of complications that ERMs can address. He will work through examples that demonstrate several of these complications and show some inferences we can make despite those complications.
Charles Lindsey, Senior Statistician and Software Developer
- Bayes for undergrads
Abstract: Teaching Markov chain Monte Carlo Bayesian methods to undergraduates can be challenging because they, for the most part, are not familiar with advanced methodologies such as multilevel models, IRT, or other analytical methods that are commonly found in Bayesian analyses. However, almost every undergraduate is familiar with the t-test. This presentation will use Stata's bayesmh command to perform a two-sample independent t-test. We will discuss the advantages of using a Bayesian approach to perform t-test type analyses and compare the results with those of the traditional frequentist t-test.
Phil Ender, UCLA (Ret)
- dtalink: Faster probabilistic record linking and deduplication
methods in Stata for large data files
Abstract: Stata users often need to link records from two or more data files, or find duplicates within data files. Probabilistic linking methods are often used when the file or files do not have reliable or unique identifiers, causing deterministic linking methods (such as Stata's merge or duplicates commands) to fail. For example, one might need to link files that only include inconsistently spelled names, dates of birth with typos or missing data, and addresses that change over time. Probabilistic linkage methods score each potential pair of records on the probability the two records match so that pairs with higher overall scores indicate a better match than pairs with lower scores. Two community-contributed Stata commands for probabilistic linking exist (reclink and reclink2), but they do not scale efficiently. dtalink is a new command that offers streamlined probabilistic linking methods implemented in parallelized Mata code. Significant speed improvements make it practical to implement probabilistic linking methods on large, administrative data files (files with many rows or matching variables), and new features offer more flexible scoring and many-to-many matching techniques. The presentation introduces dtalink, discusses useful tips and tricks, and provides an example of linking Medicaid and birth certificate data.
Keith Kranker, Mathematica Policy Research
- PPMLHDFE: Fast, flexible Poisson estimation with high-dimensional
Abstract: This is a Stata package for estimation of Poisson models with high-dimensional fixed effects. It is a joint effort by Sergio Correia (the author of reghdfe), Paulo Guimaraes (the author of poi2hdfe), and myself (the author of ppml_panel_sg). This new command has several very desirable features that we expect will make it very popular. Like ppml_panel_sg, it is ideally suited for Poisson PML estimation of structural gravity models, a workhorse empirical model in economics used to identify spatial frictions. However, like reghdfe, it can be used with any set of fixed effects. Furthermore, like poi2hdfe, it runs on an IRLS loop using the reghdfe architecture to perform each least squares step. This alone makes it very fast. But we have also implemented several additional speed-up tricks for IRLS HDFE estimation that allow for significant further speed gains. In addition, we are working toward a novel way of verifying beforehand that Poisson estimates exist that are robust to the inclusion of high-dimensional fixed effects.
Thomas Zylkin, University of Richmond
- The empirical analysis of core housing need in Canada: Evidence
from the survey of household spending
Abstract: In 2011, approximately 12% of Canadian households were in Core Housing Need (CHN), meaning that these households live in housing that requires major repair (adequacy), does not have enough bedrooms for the size of the household (suitability), costs 30% or more of before-tax income (affordability), or any combination of these three. Moreover, these households would have to spend 30% or more of their income to access local housing that meets the three standards. In 2017, Canada Mortgage and Housing Corporation (CMHC) announced a vision for the National Housing Strategy (NHS), which, among other things, aims to reduce the number of households in CHN. This study exploits rich microdata files from Statistics Canada's Survey of Household Spending (SHS) and fits models for the three standards using a nonrecursive generalized structural equation model (GSEM) to explain socioeconomic and demographic drivers of CHN. This study also informs policymakers developing policy levers by predicting the impact of housing initiatives on changes in the likelihood of being in CHN.
Duangsuda Sopchokchai, Canada Mortgage and Housing Corporation
- Doing less with Stata Markdown
Abstract: Stata’s new dyndoc and its sister commands provide a rich set of tools for reimagining document writing. An example of this is a document translator, stmd, that converts dynamic documents written with plain Markdown tags to Stata’s dyndoc format. This allows the user to write documents in the simple, uncluttered Markdown style used with other programming languages and on websites and still use many of dyndoc’s features such as executing code and embedding graphics links.
Doug Hemken, Social Science Computing Cooperative, University of Wisconsin-Madison
- Vector-based kernel weighting: A simple estimator for improving precision
and bias of average treatment effects in multiple treatment settings
Abstract: Treatment effect estimation must account for endogeneity, in which factors affect treatment assignment and outcomes simultaneously. By ignoring endogeneity, we risk concluding that a helpful treatment is not beneficial or that a treatment is safe when it is actually harmful. Propensity score (PS) matching or weighting adjusts for observed endogeneity, but matching becomes impracticable with multiple treatments, and weighting methods are sensitive to PS model misspecification in applied analyses. We used Monte Carlo simulations (1,000 replications) to examine sensitivity of multi-valued treatment inferences to PS weighting or matching strategies. We consider four variants of PS adjustment: inverse probability of treatment weights (IPTW), kernel weights, vector matching, and a new hybrid, vector-based kernel weighting (VBKW). VBKW matches observations with similar PS vectors, assigning greater kernel weights to observations with similar probabilities within a given bandwidth. We varied degree of PS model misspecification, sample size, number of treatment groups, and sample distribution across treatment groups. Across simulations, VBKW performed as well as or better than the other methods in terms of bias and efficiency. VBKW may be less sensitive to PS model misspecification than other methods used to account for endogeneity in multi-valued treatment analyses.
Jessica Lum, Department of Veterans Affairs
- New data cleaning command: assertlist–improves speed and accuracy of
Abstract: Stata’s handy assert command can certify that a dataset meets a set of user expectations, but when one assertion is violated, it throws an error and does not proceed to check the rest. Identifying problems with every variable in a large dataset can involve a messy set of ad hoc error traps and LIST commands to learn what unexpected values occur in what dataset rows. Furthermore, code to REPLACE errant values sometimes involves IF syntax with a list of terms connected by Boolean ANDs that identify the row targeted for the fix; when typed by hand, these rows are quite susceptible to typographical errors. This talk describes a new command, assertlist, that can test an entire set of assertions in one run without ad hoc code to drill down or move on. Exceptions are listed either to the screen or a spreadsheet. In situations where problematic values will later be corrected or replaced, assertlist generates spreadsheet columns that wait to receive hand-entered corrected values and other columns that immediately put corrected values into Stata REPLACE commands for easy pasting into downstream .do files. In our experience, assertlist streamlines well-documented data cleaning and guards against errors in correction code.
Dale Rhoda, Biostat Global Consulting
- Organ pipe plots for clustered datasets–visualize disparities in
cluster level coverage
Abstract: Leo Tolstoy is famous for his novels and less well known for his ideas on survey data analysis. Concerning estimated proportions, he is said to have written: “Covered strata are all alike; every poorly covered stratum is poorly covered in its own way.” I describe a new command to make what we call organ pipe plots to visualize heterogeneity in binary outcomes in clustered data. The plots were conceived for vaccination coverage surveys, but they are helpful in a wide variety of contexts. Imagine a survey where only 50% of sampled children are found to be vaccinated. Different programmatic responses would be appropriate if the vaccinated include all the children in half the clusters versus half the children in all the clusters. These plots have been used to identify neighborhoods that were surreptitiously and intentionally skipped over during vaccination campaigns. The talk will demonstrate the command and discuss similarities with Pareto plots from quality control and a visual connection to the intracluster correlation coefficient (ICC). Note that the ICC shares a connection to anarcho-pacifistic ideas in Tolstoy’s later novels: many students mention them…but few can describe them clearly.
Mary Prier, Biostat Global Consulting
- Even simpler standard errors for two-stage optimization estimators:
Mata implementation via the DERIV command
Abstract: Terza (2016a) offers a heretofore unexploited simplification (henceforth referred to as SIMPLE) of the conventional formulation for the standard errors of two-stage optimization estimators (2SOE). In that paper, SIMPLE was illustrated in the context of two-stage residual inclusion (2SRI) estimation (Terza et al., 2008). Stata/Mata implementations of SIMPLE for 2SRI estimators are detailed in Terza (2017a and b). Terza (2016b) develops a variant of SIMPLE for calculating the standard errors of two-stage marginal effects estimators (2SME). Generally applicable Stata/Mata implementation of SIMPLE for 2SME is detailed in Terza (2017c) and compared with results from the Stata MARGINS command (for the subset of cases in which the MARGINS command is available). Although SIMPLE substantially reduces the analytic and coding burden imposed by the conventional formulation, it still requires the derivation and coding of key partial derivatives that may prove daunting for some model specifications. In this presentation, I detail how such analytic demands and coding requirements are virtually eliminated via the use of the Mata DERIV command. I will discuss illustrations in the 2SRI and 2SME contexts.
Joseph Terza, Department of Economics, Indiana University Purdue University Indianapolis
Terza, J.V., A. Basu, and P. Rathouz (2008). Two-stage residual inclusion estimation: Addressing endogeneity in health econometric modeling. Journal of Health Economics 27: 531-543.
Terza, J.V. (2016a). Simpler standard errors for two-stage optimization estimators. Stata Journal 16: 368-385.
Terza, J.V. (2016b). Inference using sample means of parametric nonlinear data transformations. Health Services Research 51: 1109-1113.
Terza, J.V. (2017a). Two-stage residual inclusion estimation: A practitioner's guide to Stata implementation. Stata Journal 17: 916-938.
Terza, J.V. (2017b). Two-stage residual inclusion estimation in health services research and health economics. Health Services Research, forthcoming, DOI: 10.1111/1475-6773.12714.
Terza, J.V. (2017c). Causal effect estimation and inference using Stata. Stata Journal 17: 939-961.
- Automating exploratory data analysis tasks with eda
Abstract: Several tools currently exist in the Stata ecosystem for document preparation, authoring, and creation, each with its own strengths. Similarly, there are many tools available to map data to visual dimensions for exploratory and expositive purposes. While these tools are powerful on their own, they do not attempt to address the most significant resource constraint we all face: time. The eda command is designed to address this time constraint by automating the creation of all the univariate and bivariate data visualizations and summary statistics tables in a dataset. Users can specify categorical and continuous variables manually, provide their own rules based on the number of unique values, or allow eda to use its own defaults, and eda will apply the necessary logic to graph and describe the data available. The command is designed to produce the maximum amount of output by default, so a single line of code can easily produce a document providing substantial insight into your data.
Billy Buchanan, Fayette County Public Schools
- Output and automatic reporting using putdocx/putpdf
Abstract: Are you tired of copying and pasting tables, titles, figures, paragraphs, and footnotes from Excel into Word or PDF files? Here is good news: Stata 15 has a new feature that creates analysis tables, figures, footnotes, and paragraphs directly in Word or PDF files. The new commands, putdocx and putpdf, serve as a one-stop-shop tool for turning your Stata code into a Word or PDF file. This presentation will show you how to generate analysis tables, figures, and discussion or summary paragraphs directly in Word or PDF format. Plus, instead of manually updating the numbers in your tables, figures, summary paragraphs, or footnotes when periodic updates are required, all you must do is refresh the dataset and rerun your existing putdocx/putpdf .do file to see the updated results directly in the Word or PDF file. This can be done in one click. More specifically, the presentation will show the following formatting and analysis results output by putdocx/putpdf directly to a Word or PDF file: (1) paragraphs with statistics in them; (2) figures; (3) tables (descriptive summary table, regression table, logistic regression table, survival analysis table, etc.); (4) automation of exporting; and (5) combination of several .docx files into one summary report.
Dong Hua, Corrona, LLC
- How are population pressures and carbon dioxide emissions linked in
Abstract: This study uses Stata commands to investigate the relationship between population pressures and carbon dioxide (CO2) emissions in African countries at different income groupings. The Stata commands are particularly critical when linked with the population-pressures–environment nexus as indicated by the empirical results. Yet, the evidence from panel-data analysis with Stata commands is surprisingly critical, inconclusive, and subject to further investigation. We investigate this issue using descriptive and empirical analysis of rich panel data on population pressures and anthropogenic drivers from 1960–2012. The descriptive analysis suggests that combined CO2 emission concentrations rose by approximately 565%, 1,286%, and 505% from 1960–2010 in upper-income countries in Africa (UICA), lower-middle-income countries in Africa (LMICA), and low-income countries in Africa (LICA), respectively, while total population grew by approximately 221%, 301%, and 315% from 1960–2012. Furthermore, we provide evidence on the moderating roles of the final consumption expenditure (annual growth), manufacturing sector, and services sector in the linkages between population and CO2 emissions. The Stata commands used control for contemporaneous correlation, panel heteroskedasticity, and serial correlation. The empirical findings support negative impacts and suggest that, on average, a 1% rise in population growth across time and between countries increases CO2 emissions by about 0.38%, 1.08%, and 0.31% in LICA, LMICA, and UICA, respectively, holding all other predictors constant.
Overall, the results suggest that particular attention should be devoted to population size and population growth in ameliorating CO2 emissions, especially in countries with large populations such as Nigeria, Egypt, and Ethiopia that all belong to the LMICA, but the results also support low emission loads in LICA.
Abdulrasaki Saka, Federal Polytechnic Offa, Nigeria
- Assessing the calibration of dichotomous outcome models with the
Abstract: The calibration belt is a graphical approach designed to evaluate the goodness of fit of binary outcome models such as logistic regression models. The calibration belt examines the relationship between estimated probabilities and observed outcome rates. Significant deviations from perfect calibration can be spotted on the graph. The graphical approach is paired with a statistical test, synthesizing the calibration assessment in a standard hypothesis testing framework. We present the calibrationbelt command, which implements the calibration belt and its associated test in Stata.
Giovanni Nattino, The Ohio State University
- And more.
The final program is coming soon. Sign up now to receive an email notification when the program is posted.
Ohio State University
Ohio Colleges of Medicine
Government Resource Center
Bowling Green State University
Ohio State University
National Science Foundation
Ohio State University
Ohio State University
Ohio Department of Medicaid
Seats are limited. Choose one of the options below. Lunch and refreshments are included in the registration fee.
Day 1: Thursday, July 19, 2018
Day 2: Friday, July 20, 2018
July 19, 2018
The optional users dinner will be at Rodizio Grill on Thursday, July 19, at 6:30.
125 W Nationwide Blvd
Columbus, OH 43215
The Hyatt Regency Columbus is offering a special rate of $164 per night for Stata Conference attendees staying July 17–21, 2018. Availability is limited, so book your room by June 28 to receive the special rate.
Hyatt Regency Columbus
350 North High Street
Columbus, OH 43215
The conference hotel is within steps of the Arena District, Huntington Park, and the popular Short North Arts and Entertainment District. After the Conference, relax and enjoy what Columbus has to offer. From the world-class Columbus Zoo and Aquarium to the Franklin Park Conservatory and Botanical Gardens, you will find much to do during your stay.