The 2018 Canadian Stata Conference takes place on July 27 at Simon Fraser University in the Morris J. Wosk Centre for Dialogue.

The conference will give Stata users the opportunity to exchange ideas, experiences, and information on new applications of the software. Representatives from StataCorp (Jeff Pitblado, Executive Director of Statistical Software, and Bill Rising, Director of Educational Services) will attend, and there will be the usual "Wishes and grumbles" session, at which you may air your thoughts to Stata developers. Everybody interested in using Stata, regardless of experience level, is welcome.

## Program: Friday, July 27

8:00–9:00 Registration & breakfast
9:00–9:05 Welcome
##### Complex Survey
9:05–9:25
Approaches to imputing missing data in complex survey data
Abstract: Complex survey data collected by government agencies are both expensive and valuable. Producing a complete dataset is important, but missing data in complex survey data pose some unique challenges. Commonly used statistical software packages such as Stata, SAS, and SUDAAN each have a procedure to impute the missing data. However, unlike the procedures for describing and analyzing complex survey data, the imputation procedures implemented by these three software programs are fundamentally different. The three approaches will be described, and an example will show the similarities and differences. Recent developments in this area at the Census Bureau will also be discussed.
Christine Wells
UCLA
9:25–10:15
Calibrating survey weights in Stata
Abstract: Calibration is a method for adjusting the sampling weights and is often used to account for nonresponse and underrepresented groups in the population. Another benefit of calibration is smaller variance estimates compared with estimates using unadjusted weights. Stata implements two methods for calibration: the raking-ratio method and the generalized regression method. Stata supports calibration for the estimation of totals, ratios, and regression models. Calibration is also supported by each survey variance-estimation method implemented in Stata. In this presentation, I will show how to use calibration in survey data analysis using Stata.
StataCorp
10:15–10:30 Break
##### Causal Inference, Endogeneity, and Data Science
10:30–11:00
Multiple fractional response with endogenous binary explanatory variables: An application to consumers
Abstract: Contactless credit cards are a payment innovation combining the speed and convenience of paying cash with desirable features of credit card payments, for example, enhanced record keeping and the ability to earn rewards. There have been several attempts to measure the impact that contactless credit card adoption has on consumers' use of cash for making point-of-sale transactions. Fung, Huynh, and Sabetti (2014) use data from the Bank of Canada's 2009 Methods-of-Payment survey to estimate that contactless adoption results in a decline of 10% in the volume share of purchases made with cash. This analysis was undertaken when use and acceptance of contactless payment were still nascent. Chen, Felt, and Huynh (2017), by contrast, find no impact on the cash share. Their work exploited the panel-data structure to better control for unobserved heterogeneity across consumers. Part of the difficulty in measuring the impact of contactless adoption on cash usage is the obvious endogeneity issue: it is unclear whether adoption of contactless technology lowers cash usage or whether cash-intensive consumers are less likely to adopt contactless, perhaps for other reasons, for example, a preference for anonymity. Huynh, Schmidt-Dengler, and Stix (2014) show that merchant acceptance also plays a crucial role in cash usage, further complicating the causality issue as contactless terminals, while increasing over time, are certainly not ubiquitous. Recent work by Nam (2016), using an approach developed by Wooldridge (2014), allows us to address this problem and provide a more robust model of payment choice and contactless adoption. We utilize data from the Bank of Canada's 2013 Methods-of-Payment survey. The survey included a three-day payments diary that tracks respondents' purchases; this allows us to calculate cash, debit, and credit shares.
These shares have an obvious dependence: an increase in the cash share will necessarily lead to a decrease in either debit or credit because the shares must add to one. Nam's estimator allows us to model this effect while simultaneously accounting for the endogenous contactless adoption decision, hence providing more reliable estimates of the impact on cash. We implement the estimator in Stata and provide a method for bootstrapping error estimates.

References:

Chen, H., M. H. Felt, and K. P. Huynh. 2017. Retail payment innovations and cash usage: Accounting for attrition using refreshment samples. Journal of the Royal Statistical Society, Series A 180: 503–530.

Fung, B., K. P. Huynh, and L. Sabetti. 2014. The impact of retail payment innovations on cash usage. Journal of Financial Market Infrastructure 12: 1–29.

Huynh, K. P., P. Schmidt-Dengler, and H. Stix. 2014. The role of card acceptance in the transaction demand for money. Bank of Canada Staff Working Paper 2014-44.

Nam, S. 2016. Multiple fractional response variables with a binary endogenous explanatory variable. Mimeo.

Wooldridge, J. M. 2014. Quasi-maximum likelihood estimation and testing for nonlinear models with endogenous explanatory variables. Journal of Econometrics 182: 226–234.

Kim Huynh
11:00–11:20
Bounding a causal effect using relative correlation restrictions
Abstract: Causal inference generally relies on strong assumptions of exogeneity and selection on observables. Applied researchers regularly make these assumptions but are often concerned that their results may be sensitive to small violations of them. This presentation will describe one approach to this problem: inferring the correlation between the treatment and unobservables from the observed correlation between the treatment and the observable/control variables. I will also describe implementation of this method in Stata and some practical considerations in its use.
Brian Krauth
Simon Fraser University
11:20–11:40
Learning about selection: An improved correction procedure
Abstract: Machine learning techniques are utilized in this presentation to improve upon the selection correction procedure of Dahl (2002). Dahl's nonparametric method is widely used in the empirical economics literature to control for selection bias; however, it relies on a strong identification assumption. This single index sufficiency assumption (SISA) imposes restrictions on the error terms of the selection equation that are likely violated in many applications. This contribution establishes a modified correction procedure that uses variable selection techniques to relax this assumption. Identification in this alternative procedure relies on a restriction that is data driven and is a relaxation of the SISA. Variable selection is performed by employing the post-double-lasso estimator of Belloni, Chernozhukov, and Hansen (2014). This is implemented in Stata using lassopack, a set of community-contributed commands by Ahrens, Hansen, and Schaffer. I perform a numerical experiment that establishes that this method is preferable to traditional correction procedures in all cases, except where researchers have strong a priori reasons to suspect that the SISA holds. Machine learning methods, combined with the insights of Lee (1983), can therefore be used to control for selection bias, while overcoming the curse of dimensionality, without the imposition of overly strong distributional assumptions.
Iain Snoddy
University of British Columbia
11:40–12:10
A new Stata command for the Random Forest algorithm
Abstract: Random Forest is a statistical machine-learning algorithm for prediction and classification under supervised learning. Our Stata command randomforest implements this algorithm through a plugin to the WEKA library. randomforest is available for Windows/Mac/Linux. We will review the algorithm and illustrate randomforest with two examples: 1) prediction of the election outcomes for individual constituencies of the 2017 British Election Study data and 2) prediction of household income from the 2016 US Consumer Finance Survey data.
Rosie Zou
University of Waterloo
12:10–1:15 Lunch
##### Learning Tools
1:15–2:05
Efficient dynamic documents using Stata
Abstract: Stata 15 includes three new commands for producing dynamic documents: dyndoc, putdocx, and putpdf. These commands have generated much interest in the user community; this has led to a large amount of community-contributed software. In this presentation, I'll give some tips about how to use the commands efficiently both with official Stata software and with some of these community-contributed tools.
Bill Rising
StataCorp
2:05–2:25
Exporting cartography data from Stata to GIS systems
Abstract: geotools is a community-contributed set of tools for exporting data from Stata datasets in the ubiquitous ShapeFile and GeoJSON formats. These formats are supported by numerous online and offline GIS systems, including ESRI's ArcView/ArcGIS products, Google API, and other GIS and data-visualization systems. The input data may come from the user's own data collection, such as GPS sensor readings in the growing segment of CAPI data collection software, or may be a product of geospatial data analysis in Stata. The output can be used as layers in composite multilayer maps, as interactive maps, etc. geotools does not require online access or other software to produce its output. In the presentation, I will give an overview of the functionality and options of geotools and relate it to other community-contributed Stata modules for GIS capabilities and file formats.
The World Bank
2:25–2:45
Stata for an introductory biostatistics course—Some useful insights
Abstract: I present instructional aids using Stata that I have found useful for an introductory course on biostatistics taught at the University of Toronto. Particularly useful tools include CDF graphs that highlight the fact that treatment effects in logit and other binary response models depend on the variance of the latent underlying continuous variable; animations that show the relationship between hypothesis tests on a parameter value and the corresponding confidence interval; and a slightly generalized form of the power-by-simulation Stata program developed by A. H. Feiveson.
Paul Grootendorst
University of Toronto
2:45–3:05
Murtaza Haider
Ryerson University
3:05–3:20 Break
##### Clustering
3:20–3:50
Inference with clustered data
Abstract: This article introduces clusteff, a new Stata command for checking the severity of cluster heterogeneity in cluster-robust analyses. Cluster heterogeneity can cause a size distortion leading to under-rejection of the null hypothesis. Carter, Schnepel, and Steigerwald (2015) develop the effective number of clusters to reflect a reduction in the degrees of freedom, thereby mirroring the distortion caused by assuming homogeneous clusters. clusteff generates the effective number of clusters. We provide a decision tree for cluster-robust analysis, demonstrate the use of clusteff, and recommend methods to minimize the size distortion.
Douglas Steigerwald
UC Santa Barbara
3:50–4:10
Fast and wild: Bootstrap inference in Stata using boottest
Abstract: The Stata package boottest implements a wide variety of bootstrap tests, including tests for linear regression models that are robust to one-way or multiway clustering. I explain how these tests work and provide empirical examples. In the one-way case, the program can generate the bootstrap data in two different ways, using the wild bootstrap or the wild cluster bootstrap. In the two-way case, it can do so in four different ways, using the wild bootstrap or three variants of the wild cluster bootstrap. For each method, four different p-values can be calculated to handle all types of one-sided and two-sided tests.
Matthew Webb
Carleton University
4:10–4:45
Wishes and grumbles
StataCorp

## Scientific committee

Leslie-Anne Keown (Chair)
Carleton University

Estie Hudes
University of California–San Francisco

Kim Huynh

Matthias Schonlau
University of Waterloo

Vicki Stagg
Calgary Statistical Support

## Registration and accommodations

Registration

| Conference fees | Price |
|---|---|
| Nonstudents | \$75.00 USD |
| Students | \$30.00 USD |
| UGM Dinner (optional) | \$40.00 USD |

The optional users dinner will be at Blue Water Cafe on Friday, July 27, at 6:00 p.m.

Blue Water Cafe
1095 Hamilton Street
Vancouver BC   V6B 5T4
Tel: 604-688-8078

Accommodations

The Vancouver Marriott Pinnacle Downtown Hotel is offering a special rate of 275.00 CAD for Stata Conference attendees. Book your room by July 2 to receive the special rate.

Vancouver Marriott Pinnacle Downtown Hotel
1128 West Hastings Street
Vancouver BC   V6E 4R5
+1-800-207-4150

Venue

Morris J. Wosk Centre for Dialogue
Simon Fraser University
Asia Pacific Hall
580 W. Hastings St.
Vancouver, BC V6B 5K3