The 2016 Oceania Stata Users Group meeting was held on September 29–30, but you can still interact with the user community and learn more about the presentations that were shared.

Keynote 1: Publication-quality tables in Stata using tabout
Abstract: I present a comprehensive overview of my tabout module, a Stata ado-program for the batch production of publication-quality tables. I explain the philosophy behind the program, touching on issues of aesthetics, functionality, and reproducible research. I demonstrate the use of tabout to show how easy it is to produce publication-quality multidimensional tables in a number of different formats and styles. tabout does not cover estimation tables. Extending tabout by incorporating more advanced Stata features—such as macros and loops—is also explained, and Stata users are encouraged to extend their skills in this area.

In the final part of the presentation, I will provide an overview of some forthcoming changes in tabout. These incorporate a number of new advanced features, as well as some long overdue enhancements—such as removing unwanted columns. Many of these new features are designed to make tabout more efficient and flexible. These include the use of configuration files, where users can save customized sets of tabout options in files that can be loaded when tabout runs. Better integration with word processors, such as Microsoft Word, will also be incorporated into the new version of tabout. This will allow users to streamline their exporting of tabout output to their word processor and prescribe the formatting of that output. While word processors will never be as versatile as LaTeX, some of the efficiencies of the latter can be realized within a word-processor environment, and my presentation of the new version of tabout will illustrate this. I will conclude by inviting existing users of tabout to provide feedback on their use of the program and to suggest enhancements they would like to see in future versions.
Ian Watson
Macquarie University and SPRC, UNSW
Keynote 2: Dealing with endogeneity using Stata
Abstract: Stata has multiple estimators that account for endogeneity. I will briefly discuss these estimators and their assumptions. However, my main focus will be to talk about estimators that account for endogeneity that are not in Stata and can be implemented using gsem and gmm.
Enrique Pinzón
StataCorp LP
The unweighted sum of squares goodness-of-fit statistic for binary regression
Abstract: The statistic most commonly used to evaluate the adequacy of the logistic regression model is the Hosmer–Lemeshow statistic. Hosmer and Lemeshow proposed a goodness-of-fit test based on partitioning the fitted probabilities into a number of groups and comparing observed events with expected events within each group. They showed via simulations that the resulting statistic follows a chi-squared distribution with degrees of freedom approximately equal to the number of groups minus two. The normalized unweighted sum of squares (USOS) test also assesses model adequacy and is based on a statistic originally proposed by Copas. In this talk, I compare the Hosmer–Lemeshow and USOS statistics in binary regression models with the complementary log-log regression, and I describe the usos command that calculates the statistic.
Steve Quinn
Flinders University
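The grouping idea described in the abstract above can be illustrated with a minimal Python sketch. It is not the usos command: the function names are hypothetical, the groups are formed by simple equal-count splitting of the sorted fitted probabilities, and only the raw (unnormalized) sum of squares is computed because the abstract does not give the normalization.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees of freedom) for a Hosmer-Lemeshow-style test.

    y: observed binary outcomes (0/1); p: fitted probabilities.
    Assumes each group's expected count lies strictly between 0 and its size.
    """
    pairs = sorted(zip(p, y), key=lambda t: t[0])  # order by fitted probability
    n = len(pairs)
    stat = 0.0
    for k in range(groups):
        chunk = pairs[k * n // groups:(k + 1) * n // groups]
        if not chunk:
            continue  # skip empty groups when there are fewer rows than groups
        size = len(chunk)
        expected = sum(prob for prob, _ in chunk)
        observed = sum(outcome for _, outcome in chunk)
        # Binomial-type variance of the event count within the group
        variance = expected * (1 - expected / size)
        stat += (observed - expected) ** 2 / variance
    return stat, groups - 2  # df is approximately groups minus two


def unweighted_sum_of_squares(y, p):
    """Raw sum of squared residuals; the normalization step is omitted here."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p))
```

In practice the Hosmer–Lemeshow statistic would be referred to a chi-squared distribution with the returned degrees of freedom, which requires more than two groups.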
getpatent: Web-scraping patent data into Stata
Abstract: The getpatent command crawls websites that hold patent-related information, stores their source code, and then uses regular expressions to web-scrape key patent data into Stata, gradually building a database. The database holds observations on official patent application numbers and dates, the granting date, the inventors and the patent's name, classification codes and patent claims, plus cross-referencing data on the number of backward and forward patent citations.
Le Ma
University of Technology Sydney
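The regular-expression step mentioned in the getpatent abstract can be sketched in Python as follows. This is only an illustration of the general technique, not the command's code: the HTML layout, the field labels, and the function name extract_field are all invented for the example, and no website is contacted.

```python
import re

# Hypothetical page source; real patent sites use their own markup.
SAMPLE_SOURCE = """
<td class="label">Application number</td><td>US 14/123,456</td>
<td class="label">Filing date</td><td>2014-03-01</td>
<td class="label">Grant date</td><td>2016-07-19</td>
"""


def extract_field(source, label):
    """Return the cell that follows a labelled cell, or None if absent."""
    pattern = (
        r'<td class="label">\s*' + re.escape(label) + r'\s*</td>'
        r'\s*<td>\s*(.*?)\s*</td>'
    )
    match = re.search(pattern, source, flags=re.IGNORECASE | re.DOTALL)
    return match.group(1) if match else None


# Build one record (one observation) from the stored source code.
record = {
    label: extract_field(SAMPLE_SOURCE, label)
    for label in ("Application number", "Filing date", "Grant date")
}
```

Repeating this over many stored pages and appending each record would gradually build the kind of database the abstract describes.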
Using Stata for segmentation
Abstract: Tagging a segmentation solution for large data sets is problematic when the segmentation is built on soft variables such as attitudes, interests and opinions, because the tagging variables are usually demographics. We examine an alternative approach to increasing the tagging success of soft variable segmentations for large data sets.
Con Menictas
University of Newcastle
Meta-analysis of self-control study: Methods and associated application of METAN
Abstract: Self-control designs are necessitated in situations where there is a desire to assess the effectiveness of an intervention in a small-study population. One such situation is the assessment of a treatment modality called abdominal functional electrical stimulation and whether or not its application results in improved respiratory function in patients suffering from paralysis. Further, meta-analysis of studies applying a self-control study design with repeat measures requires adaptation of established methods in order to perform scientifically sound analysis. In this study, we applied a methodology using a specific adaptation of METAN to carry out this complex statistical analysis.

Studies that met inclusion criteria were classified into two broad categories: acute and chronic. Acute studies compared respiratory function prior to and during abdominal functional electrical stimulation (FES). Chronic studies measured the chronic effect of abdominal FES training. For both acute and chronic studies, analyses were carried out using either fixed-effects models, using the inverse of the variance (IV) approach, or random-effects models, using the DerSimonian and Laird (D-L) approach. Model choice was determined by the between-study heterogeneity of pooled results, using the I² statistic. Because of differences in baseline function between studies, estimates of effect were made using the standardized mean difference (SMD), applying Glass's Δ. This method is preferred where the intervention may potentially alter observed variability and is less susceptible to small-sample bias than other SMD techniques. Multiple models were applied to compare time points in the self-control chronic studies, with similar analysis applied to RCTs at equal time points. A descriptive approach was used to analyze trends observed in the chronic studies, with data normalized based on minimum within-study values for each measure of respiratory function. Publication bias was assessed using the Begg and Mazumdar test and the Egger approach. All statistical analyses were carried out using Stata 14.

This methodology was successfully applied and is in press (McCaughey et al. Abdominal functional electrical stimulation to improve respiratory function after spinal cord injury: A systematic review and meta-analysis. Spinal Cord [accepted 2015]). The methodology, applying computational methods enabled by Stata, represents an important approach to the meta-analysis of self-control study designs.
Robert Borotkanics
Macquarie University
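The pooling steps named in the meta-analysis abstract above (Glass's Δ, inverse-variance fixed effects, the DerSimonian and Laird estimator, and I²) follow standard textbook formulas, which the Python sketch below writes out. It is not the authors' METAN adaptation; the function names are hypothetical, and the inputs are assumed to be one effect estimate and one variance per study.

```python
def glass_delta(mean_treated, mean_control, sd_control):
    """Standardized mean difference scaled by the control (baseline) SD only."""
    return (mean_treated - mean_control) / sd_control


def pool(effects, variances):
    """Pool study effects with fixed (IV) and random (D-L) models."""
    weights = [1.0 / v for v in variances]  # inverse-variance weights
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q measures between-study heterogeneity
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1

    # DerSimonian-Laird moment estimate of between-study variance
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # I-squared: percentage of variation attributed to heterogeneity
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    random_weights = [1.0 / (v + tau2) for v in variances]
    random = sum(w * y for w, y in zip(random_weights, effects)) / sum(random_weights)
    return {"fixed": fixed, "random": random, "Q": q, "tau2": tau2, "I2": i2}
```

For example, pool([0.0, 1.0], [0.1, 0.1]) gives the same pooled estimate under both models but a large I², which is the kind of signal the abstract describes using to choose between them.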
Keynote 3: Workflow for data visualization: Bringing structure to graph syntax
Abstract: Stata boasts an impressive graphics engine with an extensive suite of visualization capabilities. The challenging aspect of this richness is its overwhelming syntax. The workflow for data visualization brings structure to the vast syntax by organizing graph code consistent with graphics theory.
Demetris Christodoulou
MEAFA, The University of Sydney

Friday, September 30

Keynote 4: Dynamic documents in Stata
Abstract: Do you suffer from the tedium of moving statistical results by hand from Stata into your research documents or reports? Have you ever had the nightmare of updating a document because of changes to your analysis only to find that you missed some results? Have you ever dreamed of automating production of otherwise brain-numbing standardized reports? If so, you need dynamic documents. Dynamic documents get their name from their ability to update their statistical results when they are created, ensuring complete reproducibility and minimal maintenance. In the world of Stata, there are quite a few user-written packages for creating dynamic documents, both from within Stata and from within other applications that call back to Stata. In this talk, I'll briefly demonstrate a few different packages, each with their own strengths. You can then choose your package, get more done, and sleep more easily at night.
Bill Rising
StataCorp LP
Socio-economic factors influencing productivity among cassava farmers in East Africa
Abstract: Cassava is the second most important food crop in Africa after maize. It is a major staple crop for more than 200 million people in East and Central Africa, most of them living in poverty in rural areas. However, its production is undermined by several factors, particularly the problem of emerging and endemic pests and diseases. We conducted a comprehensive socio-economic study covering Uganda, Tanzania, and Malawi to determine the status of cassava production with the following specific objectives and research questions:
  • What is the present status of cassava production and productivity?
  • How efficient are cassava producers?
  • What is the current adoption rate of improved cassava production technologies?
  • What is the economic impact of B. tabaci complex on smallholder farmers?
Primary data for this study were collected from cassava farmers in Uganda, Tanzania, and Malawi using a pretested survey questionnaire that was orally administered to individual farmers. A total of 800 respondents were selected and interviewed using a multistage random sampling technique. Using Stata, I analyzed the data with a stochastic frontier production model to evaluate the costs, returns, and productivity of cassava farmers in this region. Here I present some of the preliminary results, discuss the implications, and discuss further work required.
Paul Mwebaze
optaspect: Heuristic rules for finding the optimal aspect ratio in a two-variable line plot
Abstract: Line plots encode a series of slopes from adjoining coordinates and aim to reveal suggestive patterns in the sequential rates of change. The judged prevalence of patterns in the bivariate series and the degree of steepness in the rates of change are largely determined by the choice of aspect ratio that is imposed on the line plot. Choosing an appropriate aspect ratio is key in designing informative line plots. The command optaspect calculates the optimal aspect ratio in a two-variable line graph using a number of heuristic criteria.
Demetris Christodoulou
MEAFA, The University of Sydney
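The optaspect abstract does not name its heuristic criteria. As an illustration of what such a rule can look like, the Python sketch below implements one well-known heuristic from the graphics literature, Cleveland's "banking to 45 degrees" by median absolute slope. Whether optaspect uses this exact rule is an assumption, and the function name is hypothetical.

```python
from statistics import median


def bank_to_45(x, y):
    """Return the height/width ratio at which the median absolute slope
    of the line segments appears at 45 degrees on the plot."""
    points = list(zip(x, y))
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for (x0, y0), (x1, y1) in zip(points, points[1:])
        if x1 != x0  # skip vertical segments
    ]
    slopes = [s for s in slopes if s > 0]  # flat segments carry no orientation
    if not slopes:
        raise ValueError("need at least one segment with a nonzero slope")
    x_range = max(x) - min(x)
    y_range = max(y) - min(y)
    # A data slope s is drawn with slope s * (height / width) * (x_range / y_range);
    # setting the median drawn slope to 1 and solving gives the ratio below.
    return y_range / (x_range * median(slopes))
```

A straight line running corner to corner yields a ratio of 1 (a square plot), while a series with many steep reversals yields a wider, flatter plot.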
Keynote 5: Estimating contextual effects in social science. Multilevel modeling in Stata
Abstract: An increasing number of social sciences are now paying much closer attention to the effect of context on behavior: how the characteristics of the neighborhood moderate the behavior of residents, for example, or the degree to which characteristics of the workplace condition job satisfaction. The classic application in the social sciences is how the performance of pupils is moderated by characteristics of their class and their school. In each case, level 1 units of analysis (usually individuals) are nested within level 2 or level 3 categories.

In each of the above examples, individuals are clustered either spatially or organizationally (or both). Multilevel modeling is now a standard way of addressing not only the need to recognize the lack of statistical independence that joint membership of given contexts usually brings but also the relationship that context plays theoretically.

My presentation will introduce the capabilities of two commands, mixed-effects linear regression (mixed) and mixed-effects binary regression (melogit). Special attention will be paid to postestimation and the graphical representation of intercept and slope effects, including the use of margins. I will reflect on how much additional information about specific behaviors I have learned by applying these applications in Stata 14 in my home discipline of human geography.
Philip Morrison
Victoria University of Wellington
Panel discussion: Teaching Stata to students taking their first steps in methodological training at the university level
Abstract: In this presentation, we talk about the challenges of using Stata to teach students from nonscience (and science) backgrounds who are taking their first steps in methodological training at the university level. We are newcomers to Stata as a teaching tool, although we have used it for years for our research.

In social sciences, such as sociology or criminology, a typical introductory course covers the rudiments of statistical theory and analytical methods ranging from cross-tabulations through Pearson product-moment correlations to ordinary least-squares regressions.

Stata offers a simple command language to execute the analyses needed to generate the relevant tables, but the output for these procedures is not easy to control in Stata. More advanced users of Stata employ user-written procedures such as tabout or estout to produce publication-quality tables.

However, for our students, these procedures are too complex to use. Or so we believe at the moment, having perused the standard documentation and examples for these procedures. We would like to start a conversation about the best ways of creating publication-quality tables easily using Stata output. In our experience, even the standard "right-click" and copy table solution often does not work in practice as it should in theory. We start the conversation by showing three examples of tables we need to easily generate in Stata.
Panelist: Joanna Sikora
Australian National University
Panelist: Philip Morrison
Victoria University of Wellington
Panelist: Bill Rising
table1: A program to create a customizable table of summary statistics
Abstract: table1 is a Stata ado-program that produces one- and two-way tables of summary statistics for a list of numeric variables. The rows of the table are formed from the list of specified variables. If no by-variable is specified, the table has only one column of results. If a by-variable is specified, the table has a column of results for each level of the by-variable, with an optional additional totals column. Unlike other Stata tabulation commands (such as tabulate, table, and tabstat), the row variables can be a mixture of continuous variables (summarized by mean, standard deviation, etc.) and categorical variables (summarized by percentages and frequencies).

Additional features include (i) several different options for displaying missing and non-missing counts; (ii) considerable flexibility in the way the results are displayed, in particular, the summary statistics and their possible different presentation for each row variable; (iii) results being restricted to subgroups of the data for individual row variables; and (iv) the contents of the table being saved as a Stata data file or text file or exported to Excel. The motivation for table1 is the descriptive table commonly seen in health research publications in which the baseline characteristics of two or more groups are compared. This descriptive table usually has only one column for each group, generally with at least two summary statistics in each column (for example, mean and standard deviation for continuous variables or percentage and frequency for categorical variables). The output of table1 therefore differs from that of tabout in that there is only a single column for each group. The aim of table1 is to assist with reproducible research by enabling creation of a table whose contents can be used unchanged in publications.
Susan Donath
Murdoch Children’s Research Institute / The University of Melbourne
The development of a risk index for depression using Stata's gsem with complex survey data
Abstract: Depression is a common mental illness worldwide. The World Health Organization (WHO) estimates that 350 million people of all ages suffer from depression globally. This illness affects a person's well-being, ability to work, and social interactions. However, many suffer from undiagnosed depression. The aim of this analysis was to develop a risk index for depression using a well-known US population-based sample.

Depression was measured using a self-report diagnostic and dichotomized into those with and without depression. A number of generalized structural equation models (GSEM) using Stata 14 were developed with depression as the outcome to form a final path model for the index. SEM models utilized a set of statistical techniques to measure and analyze relationships between a set of observed biomarkers, lifestyle and medical symptom indicators (path analysis), and a latent diet variable (confirmatory factor analysis) with depression. Linear causal relationships among variables were examined while simultaneously accounting for measurement error.

Using Stata's gsem command with the complex multistage survey sample meant the point estimates, standard errors, and tests were adjusted accordingly. The final model consisted of more than one dependent variable with multiple direct and indirect effects. The model was tested across certain key demographic groups to ensure configural invariance.
Joanna Dipnall
Deakin University


Registration is closed.

Academics/Professionals Price
SUGM (Stata User Group Meeting) AUD $400
Workshop AUD $250
SUGM + Workshop AUD $600
Students Price
SUGM (Stata User Group Meeting) AUD $300
Workshop AUD $200
SUGM + Workshop AUD $450

Stata Users Group Workshops

There will be Stata Users Group workshops the preceding day, September 28. Choose one of the two topics below (lecture plus hands-on workshops).

Workshop Topic #1: Introduction to Causal Inference in Stata

Introductory training via interactive lecturing and practical exercises, covering the basics of causal inference, including propensity scores, marginal structural models (MSMs), causal mediation analysis, and G-estimation. This course assumes a basic knowledge of how to operate Stata. Participants should have completed a first course in statistics for nonspecialists and at least be familiar with multivariable regression models.

Trainer: Lyle Gurrin

Associate Professor Lyle C. Gurrin is a teaching and research academic in biostatistics at the Melbourne School of Population and Global Health, which he joined in 2003. Prior to that, he held senior biostatistician positions in Perth at large public hospitals and associated medical research institutes devoted to women’s and children’s health. He is a Chief or Principal Investigator on several large, international, multidisciplinary studies of health and disease in both early life (infant food allergy, childhood adversity and well-being) and later years (hereditary haemochromatosis and men’s health). He promotes the sound practice of statistical reasoning by teaching short courses and classes of postgraduate students and has methodological interests in the analysis of longitudinal and correlated data, and causal inference in observational studies.

Trainer: Jessica Kasza

Dr. Jessica Kasza is a biostatistician in the Department of Epidemiology and Preventive Medicine at Monash University. After completing a Ph.D. in 2010 at the University of Adelaide, she spent time at the University of Copenhagen before returning to the University of Adelaide. She has been at Monash University since April 2013. Her research interests include causal inference methodology for the comparison of treatments, and methodology for the comparison of the performance of health care providers. She has a strong interest in the translation and dissemination of complex statistical methodology.

Workshop Topic #2: Bayesian Analysis Using Stata

Bayesian analysis provides a theoretically more intuitive approach to statistical inference and model selection and provides practical computational advantages in implementing complex statistical models. This course presents a basic overview of Bayesian statistics and its implementation in Stata. Lectures will cover an introduction to basic Bayesian models (one parameter and normal models), Bayesian implementation of linear and generalized linear models, and a few examples of complex extensions (including change point models, variable selection, multivariate and multilevel regression, measurement models and structural equations, latent class and mixture models, etc.). Labs will focus on the implementation of these methods with the new Bayesian commands introduced in Stata 14 and include coverage of available user-written commands, examples of direct implementation in Mata, and analysis of Bayesian simulation output produced from other programs.

Trainer: Shawn Treier

Shawn Treier is a lecturer at the School of Politics and International Relations at the Australian National University and received his Ph.D. from Stanford University. His research involves the application of Bayesian measurement models to the study of political institutions, political behavior and public opinion, and the measurement of democracy.

His work has appeared in the American Journal of Political Science, Political Analysis, Journal of Politics, Public Opinion Quarterly, American Politics Research, and Legislative Studies Quarterly.


Scientific committee

Demetris Christodoulou
University of Sydney

Rob Herbert
Neuroscience Research Australia

Logistics organizer

The logistics organizer for the 2016 Oceania Stata Users Group meeting is Survey Design and Analysis Services Pty Ltd, the distributor of Stata in Australia and New Zealand.

View the proceedings of previous Stata Users Group meetings.