Columbus 2018
July 19–20

Join Stata users and experts at the 2018 Stata Conference in Columbus on July 19–20.

Experience what happens when new and long-time Stata users from across all disciplines gather to discuss real-world applications of Stata. The Stata Conference provides an unparalleled opportunity for you to collaborate with Stata developers and connect with the inventive and creative user community. Don’t miss this great networking and learning opportunity.

## Program talks include

• Nonlinear mixed-effects regression
Abstract: In many applications, such as biological and agricultural growth processes and pharmacokinetics, the time course of a continuous response for a subject over time may be characterized by a nonlinear function.
Parameters in these subject-specific nonlinear functions often have natural physical interpretations, and observations within the same subject are correlated. Subjects may be nested within higher-level groups, giving rise to nonlinear multilevel models, also known as nonlinear mixed-effects or hierarchical models. The new Stata 15 command menl allows you to fit nonlinear mixed-effects models, in which fixed and random effects may enter the model nonlinearly at different levels of hierarchy. In this talk, I will show you how to fit nonlinear mixed-effects models that contain random intercepts and slopes at different grouping levels with different covariance structures for both the random effects and the within-subject errors. I will also discuss parameter interpretation and highlight postestimation capabilities.
Houssein Assaad, Senior Statistician and Software Developer
StataCorp
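As a taste of the syntax, here is a minimal sketch of a logistic growth model with a plant-level random asymptote; the dataset and variable names (circumf, age, plant) are hypothetical stand-ins, not from the talk:

```stata
* Hypothetical data: circumference measured over age, plants as subjects.
* Logistic growth curve; {U1[plant]} adds a random asymptote per plant.
menl circumf = ({b1} + {U1[plant]})/(1 + exp(-(age - {b2})/{b3}))
```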
• ERMs, simple tools for complicated data
Abstract: While the term "extended regression model" (ERM) may be new, the method is not. ERMs are regression models with continuous outcomes (including censored and tobit outcomes), binary outcomes, and ordered outcomes that
are fit via maximum likelihood and that also account for endogenous covariates, sample selection, and nonrandom treatment assignment. These models can be used when you are worried about bias due to unmeasured confounding, trials with informative dropout, outcomes that are missing not at random, selection on unobservables, and more. ERMs provide a unifying framework for handling these complications individually or in combination. Charles Lindsey will briefly review the types of complications that ERMs can address. He will work through examples that demonstrate several of these complications and show some inferences we can make despite those complications.
Charles Lindsey, Senior Statistician and Software Developer
StataCorp
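For orientation, a minimal sketch of the ERM syntax with an endogenous covariate; the wage model and variable names below are hypothetical, not the examples from the talk:

```stata
* Hypothetical wage model: education is an endogenous covariate,
* modeled with parental education as an excluded instrument.
* The endogenous() option adds education to the main equation.
eregress wage c.age i.union, endogenous(education = c.age c.parent_edu)
```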
Abstract: Teaching Markov Chain Monte Carlo Bayesian methods to undergraduates can be challenging because they, for the most part, are not familiar with advanced methodologies such as multilevel models, IRT, or other
analytical methods that are commonly found in Bayesian analyses. However, almost every undergraduate is familiar with the t-test. This presentation will use Stata's bayesmh command to perform a two-sample independent t-test. We will discuss the advantages of using a Bayesian approach to perform t-test-type analyses and compare the results with the traditional frequentist t-test.
Phil Ender, UCLA (Ret)
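A minimal sketch of the idea, assuming a hypothetical outcome y and binary indicator group; the priors below are illustrative weak priors, not necessarily those used in the talk:

```stata
* Hypothetical two-group comparison: normal likelihood, weak priors.
* The {y:1.group} coefficient plays the role of the mean difference
* tested by a frequentist two-sample t-test.
bayesmh y i.group, likelihood(normal({var}))  ///
    prior({y:},   normal(0, 10000))           ///
    prior({var},  igamma(0.01, 0.01))         ///
    rseed(17)
```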
• dtalink: Faster probabilistic record linking and deduplication methods in Stata for large data files
Abstract: Stata users often need to link records from two or more data files, or find duplicates within data files. Probabilistic linking methods are often used when the file or files do not have reliable or unique identifiers.
Keith Kranker, Mathematica Policy Research
• PPMLHDFE: Fast, flexible Poisson estimation with high-dimensional fixed effects
Abstract: This is a Stata package for estimation of Poisson models with high-dimensional fixed effects. It is a joint effort by Sergio Correia (the author of reghdfe), Paulo Guimaraes (the author of poi2hdfe), and
myself (the author of ppml_panel_sg). This new command has several very desirable features that we expect will make it very popular. Like ppml_panel_sg, it is ideally suited for Poisson PML estimation of structural gravity models, a workhorse empirical model in economics used to identify spatial frictions. However, like reghdfe, it can be used with any set of fixed effects. Furthermore, like poi2hdfe, it runs on an IRLS loop using the reghdfe architecture to perform each least squares step. This alone makes it very fast. But we have also implemented several additional speed-up tricks for IRLS HDFE estimation that allow for significant further speed gains. In addition, we are working toward a novel way of verifying beforehand that Poisson estimates exist that are robust to the inclusion of high-dimensional fixed effects.
Thomas Zylkin, University of Richmond
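A minimal sketch of a call in the gravity setting described above; the dataset and variable names are hypothetical, not from the talk:

```stata
* Hypothetical gravity data: bilateral trade flows with exporter-year,
* importer-year, and country-pair fixed effects absorbed.
ppmlhdfe trade ln_dist contiguous fta, absorb(exp_year imp_year pair)
```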
• The empirical analysis of core housing need in Canada: Evidence from the survey of household spending
Abstract: In 2011, approximately 12% of Canadian households were in Core Housing Need (CHN), meaning that these households live in housing that requires major repair (adequacy), does not have enough bedrooms for the size of
the household (suitability), costs 30% or more of before-tax income (affordability), or any combination of these three. Moreover, these households would have to spend 30% or more of their income to access local housing that meets the three standards. In 2017, Canada Mortgage and Housing Corporation (CMHC) announced a vision for the National Housing Strategy (NHS), which, among other things, aims to reduce the number of households in CHN. This study exploits rich microdata files from Statistics Canada, the Survey of Household Spending (SHS), and fits models for the three standards using a non-recursive generalized structural equation model (GSEM) to explain the socioeconomic and demographic drivers of CHN. This study also informs policymakers developing various policy levers by predicting the impact of housing initiatives on changes in the likelihood of being in CHN.
Duangsuda Sopchokchai, Canada Mortgage and Housing Corporation
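To illustrate the modeling approach in general terms, here is a simplified gsem sketch of three binary CHN standards modeled jointly with logit links. This is not the authors' exact specification, and the covariate names are hypothetical stand-ins for SHS variables:

```stata
* Simplified sketch: three binary housing standards, logit links.
* Variable names (income, hhsize, rent, tenure) are hypothetical.
gsem (adequacy      <- income hhsize i.tenure, logit)  ///
     (suitability   <- income hhsize,          logit)  ///
     (affordability <- income rent hhsize,     logit)
```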
• Doing less with Stata Markdown
Abstract: Stata’s new dyndoc and its sister commands provide a rich set of tools for reimagining document writing. An example of this is a document translator, stmd, that converts dynamic documents written with plain
Markdown tags to Stata’s dyndoc format. This allows the user to write documents in the simple, uncluttered Markdown style used with other programming languages and on websites and still use many of dyndoc’s features such as executing code and embedding graphics links.
Doug Hemken, Social Science Computing Cooperative
• Vector-based kernel weighting: A simple estimator for improving precision and bias of average treatment effects in multiple treatment settings
Abstract: Treatment effect estimation must account for endogeneity, in which factors affect treatment assignment and outcomes simultaneously. By ignoring endogeneity, we risk concluding that a helpful treatment is not
beneficial or that a treatment is safe when it is actually harmful. Propensity score (PS) matching or weighting adjusts for observed endogeneity, but matching becomes impracticable with multiple treatments, and weighting methods are sensitive to PS model misspecification in applied analyses. We used Monte Carlo simulations (1,000 replications) to examine sensitivity of multi-valued treatment inferences to PS weighting or matching strategies. We consider four variants of PS adjustment: inverse probability of treatment weights (IPTW), kernel weights, vector matching, and a new hybrid, vector-based kernel weighting (VBKW). VBKW matches observations with similar PS vectors, assigning greater kernel weights to observations with similar probabilities within a given bandwidth. We varied the degree of PS model misspecification, sample size, number of treatment groups, and sample distribution across treatment groups. Across simulations, VBKW performed as well as or better than the other methods in terms of bias and efficiency. VBKW may be less sensitive to PS model misspecification than other methods used to account for endogeneity in multi-valued treatment analyses.
Jessica Lum, Department of Veterans Affairs
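VBKW itself is the authors' new estimator; for orientation, here is a sketch of the standard IPTW comparator for a three-level treatment, with entirely hypothetical variable names:

```stata
* Hypothetical data: treatment treat coded 1/2/3, covariates x1 x2, outcome y.
mlogit treat x1 x2                     // multinomial propensity score model
predict double p1 p2 p3, pr            // predicted probability of each level
gen double ps   = cond(treat==1, p1, cond(treat==2, p2, p3))
gen double iptw = 1/ps                 // inverse probability of treatment weight
regress y i.treat [pweight = iptw]     // weighted outcome model
```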
• New data cleaning command: assertlist – improves speed and accuracy of collaborative correction
Abstract: Stata’s handy assert command can certify that a dataset meets a set of user expectations, but when one assertion is violated, it throws an error and does not proceed to check the rest. Identifying problems with
every variable in a large dataset can involve a messy set of ad hoc error traps and LIST commands to learn what unexpected values occur in what dataset rows. Furthermore, code to REPLACE errant values sometimes involves IF syntax with a list of terms connected by Boolean ANDs that identify the row targeted for the fix; when typed by hand, these rows are quite susceptible to typographical errors. This talk describes a new command, assertlist, that can test an entire set of assertions in one run without ad hoc code to drill down or move on. Exceptions are listed either to the screen or a spreadsheet. In situations where problematic values will later be corrected or replaced, assertlist generates spreadsheet columns that wait to receive hand-entered corrected values and other columns that immediately put corrected values into Stata REPLACE commands for easy pasting into downstream .do files. In our experience, assertlist streamlines well-documented data cleaning and guards against errors in correction code.
Dale Rhoda, Biostat Global Consulting
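The limitation of the built-in assert command is easy to demonstrate; assertlist's own option syntax is best taken from its help file, so only assert is sketched here, with a hypothetical variable:

```stata
* assert halts at the first violated condition, so later checks never run;
* assertlist, by contrast, reports all exceptions in one pass.
assert !missing(age)
assert inrange(age, 0, 120)
```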
• Organ pipe plots for clustered datasets – visualize disparities in cluster-level coverage
Abstract: Leo Tolstoy is famous for his novels and less well known for his ideas on survey data analysis. Concerning estimated proportions, he is said to have written: "Covered strata are all alike; every poorly covered stratum is poorly covered in its own way." I describe a new command to make what we call organ pipe plots to visualize heterogeneity in binary outcomes in clustered data. The plots were conceived for vaccination coverage surveys, but they are helpful in a wide variety of contexts. Imagine a survey where only 50% of sampled children are found to be vaccinated. Different programmatic responses would be appropriate if the vaccinated include all the children in half the clusters versus half the children in all the clusters. These plots have been used to identify neighborhoods that were surreptitiously and intentionally skipped over during vaccination campaigns. The talk will demonstrate the command and discuss similarities with Pareto plots from quality control and a visual connection to the intracluster correlation coefficient (ICC). Note that the ICC shares a connection to anarcho-pacifistic ideas in Tolstoy's later novels: many students mention them…but few can describe them clearly.
Mary Prier, Biostat Global Consulting
• Even simpler standard errors for two-stage optimization estimators: Mata implementation via the DERIV command
Abstract: Terza (2016a) offers a heretofore unexploited simplification (henceforth referred to as SIMPLE) of the conventional formulation for the standard errors of two-stage optimization estimators (2SOE). In that paper,
SIMPLE was illustrated in the context of two-stage residual inclusion (2SRI) estimation (Terza et al., 2008). Stata/Mata implementations of SIMPLE for 2SRI estimators are detailed in Terza (2017a and b). Terza (2016b) develops a variant of SIMPLE for calculating the standard errors of two-stage marginal effects estimators (2SME). Generally applicable Stata/Mata implementation of SIMPLE for 2SME is detailed in Terza (2017c) and compared with results from the Stata MARGINS command (for the subset of cases in which the MARGINS command is available). Although SIMPLE substantially reduces the analytic and coding burden imposed by the conventional formulation, it still requires the derivation and coding of key partial derivatives that may prove daunting for some model specifications. In this presentation, I detail how such analytic demands and coding requirements are virtually eliminated via the use of the Mata DERIV command. I will discuss illustrations in the 2SRI and 2SME contexts.
Terza, J., A. Basu, and P. Rathouz (2008). Two-stage residual inclusion estimation. Addressing endogeneity in health econometric modeling. Journal of Health Economics 27: 531-543.
Terza, J.V. (2016a). Simpler standard errors for two-stage optimization estimators. Stata Journal 16: 368-385.
Terza, J.V. (2016b). Inference using sample means of parametric nonlinear data transformations. Health Services Research 51: 1109-1113.
Terza, J.V. (2017a). Two-stage residual inclusion estimation: A practitioners guide to Stata implementation. Stata Journal 17: 916-938.
Terza, J.V. (2017b). Two-stage residual inclusion estimation in health services research and health economics. Health Services Research, forthcoming, DOI: 10.1111/1475-6773.12714.
Terza, J.V. (2017c). Causal effect estimation and inference using Stata. Stata Journal 17: 939-961.
Joseph Terza, Department of Economics
Indiana University Purdue University Indianapolis
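For readers unfamiliar with Mata's numerical differentiation suite, here is a minimal deriv() sketch on a toy function. This is not Terza's estimator, only an illustration of the machinery the talk builds on:

```stata
* Numerically differentiate f(p) = p^2 + 3p at p = 2 (analytically, 7).
mata:
void myf(real rowvector p, v) { v = p[1]^2 + 3*p[1] }
D = deriv_init()
deriv_init_evaluator(D, &myf())
deriv_init_params(D, (2))
deriv(D, 1)        // first derivative, approximately 7
end
```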
• Automating exploratory data analysis tasks with eda
Abstract: Several tools currently exist in the Stata ecosystem for document preparation, authoring, and creation, each with their own unique strengths. Similarly, there are many tools available to map data to
visual dimensions for exploratory and expositive purposes. While these tools are powerful on their own, they do not attempt to solve the most significant resource constraint we all face: time. The eda command is designed to address this constraint by automating the creation of all the univariate and bivariate data visualizations and summary statistics tables in a dataset. Users can specify categorical and continuous variables manually, provide their own rules based on the number of unique values, or allow eda to use its own defaults, and eda will apply the necessary logic to graph and describe the available data. The command is designed to produce the maximum amount of output by default, so a single line of code can produce a document providing substantial insight into your data.
Billy Buchanan, Fayette County Public Schools
• Output and automatic reporting using putdocx/putpdf
Abstract: Are you tired of copying and pasting tables, titles, figures, paragraphs, and footnotes from Excel into Word or PDF files? Here is good news: Stata 15 released a new feature that creates analysis tables, figures, footnotes, and paragraphs directly in Word or PDF files. The new commands, putdocx and putpdf, serve as a one-stop tool for turning your Stata code into a Word or PDF file. This presentation will show you how to generate analysis tables, figures, and discussion or summary paragraphs directly in Word or PDF format. Instead of manually updating the numbers in your tables, figures, summary paragraphs, or footnotes when periodic updates are required, all you need to do is refresh the dataset and rerun your existing putdocx/putpdf .do file to see the updated results directly in the Word/PDF file. This can be done in one click. Specifically, the following formatting and analysis results will be shown, output directly to a Word or PDF file: 1. paragraphs with statistics in them; 2. figures; 3. tables (descriptive summary, regression, logistic regression, survival analysis, etc.); 4. automation of exporting; and 5. combination of several .docx files into one summary report.
Dong Hua, Corrona, LLC
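A self-contained sketch of the workflow, using the auto dataset shipped with Stata; the report content is illustrative, not from the talk:

```stata
* Build a small Word report: heading, a statistic-bearing paragraph,
* and a regression table, all written directly to a .docx file.
sysuse auto, clear
putdocx begin
putdocx paragraph, style(Heading1)
putdocx text ("Fuel economy summary")
summarize mpg
local n    = r(N)
local mean : display %4.1f r(mean)
putdocx paragraph
putdocx text ("Mean mpg across `n' cars: `mean'.")
regress mpg weight i.foreign
putdocx table results = etable     // regression table straight into the doc
putdocx save mpg_report.docx, replace
```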
• How are population pressures and carbon dioxide emissions linked in Africa?
Abstract: This study uses Stata to investigate the relationship between population pressures and carbon dioxide (CO2) emissions in African countries across income groupings. Evidence on the population-pressures–environment nexus from panel-data analyses remains inconclusive and subject to further investigation. We investigate this issue using descriptive and empirical analyses of panel data on population pressures and rich anthropogenic drivers from 1960–2012. Descriptive analysis suggests that combined CO2 emission concentrations rose from 1960–2010 by approximately 565%, 1,286%, and 505% in upper-income countries in Africa (UICA), lower-middle-income countries in Africa (LMICA), and low-income countries in Africa (LICA), respectively, while total population grew from 1960–2012 by approximately 221%, 301%, and 315%. Furthermore, we provide evidence on the moderating roles of final consumption expenditure (annual growth), the manufacturing sector, and the services sector in the linkages between population and CO2 emissions. The estimation commands used are robust to contemporaneous correlation, panel heteroskedasticity, and serial correlation. The empirical findings suggest that a 1% rise in population growth increases CO2 emissions by about 0.38%, 1.08%, and 0.31% in LICA, LMICA, and UICA, respectively, holding all other predictors constant. Overall, the results suggest that particular attention should be devoted to population size and population growth in ameliorating CO2 emissions, especially in countries with large populations, such as Nigeria, Egypt, and Ethiopia.
Abdulrasaki Saka, Federal Polytechnic Offa, Nigeria
• Assessing the calibration of dichotomous outcome models with the calibration belt
Abstract: The calibration belt is a graphical approach designed to evaluate the goodness of fit of binary outcome models such as logistic regression models. The calibration belt examines the relationship between estimated
probabilities and observed outcome rates. Significant deviations from the perfect calibration can be spotted on the graph. The graphical approach is paired to a statistical test, synthesizing the calibration assessment in a standard hypothesis testing framework. We present the calibrationbelt command, which implements the calibration belt and its associated test in Stata.
Giovanni Nattino, The Ohio State University
• And more.

## Scientific committee

Stan Lemeshow (chair)
Ohio State University
Public Health
Timothy R. Sahr (coordinator)
Ohio Colleges of Medicine
Government Resource Center
Kelly Balistreri
Bowling Green State University
Chris Browning
Ohio State University
Sociology
Anand Desai
National Science Foundation
Bo Lu
Ohio State University
Biostatistics
Eric Seiber
Ohio State University
Public Health
Mary Applegate
Ohio Department of Medicaid
Anirudh Ruhil
Ohio University

## Registration

Seats are limited. Choose one of the options below. Lunch and refreshments are included in the registration fee.

|  | Price | Student price |
| --- | --- | --- |
| Both days | $195 | $75 |
| Day 1: Thursday, July 19, 2018 | $125 | $50 |
| Day 2: Friday, July 20, 2018 | $125 | $50 |
| Dinner (optional): July 19, 2018 |  |  |

## Venue

Hyatt Regency Columbus
350 North High Street
Columbus, OH 43215

The conference hotel is within steps of the Arena District, Huntington Park, and the popular Short North Arts and Entertainment District. After the Conference, relax and enjoy what Columbus has to offer. From the world-class Columbus Zoo and Aquarium to the Franklin Park Conservatory and Botanical Gardens, you will find much to do during your stay.