Last updated: 19 September 2011

2011 Australian and New Zealand Users Group meeting

17 September 2011


The University of Notre Dame
Fremantle, Western Australia


Analysis of family case–control studies in Stata

David Muller
Cancer Epidemiology Centre — The Cancer Council of Victoria
The family case–control design—in which families are recruited on the basis of one or more affected members—is becoming an increasingly popular epidemiological tool for estimating both genetic and nongenetic effects. A matched case–control analysis using conditional logistic regression is often applied to estimate the effect of an exposure on disease, but this approach can lead to underestimates of associations if unmeasured familial and genetic effects correlated within family members are ignored. A random-effects conditional logistic regression model has been proposed, which conditions on both family ascertainment and familial random effects. In this talk, I will briefly describe the conditional logistic random-effects model. I will also describe the development of a new Stata command that will estimate the parameters of this model.

Using Non-Stata Programs within Stata

Karl Keesman
Survey Design and Analysis
Sometimes you may wish to do something within Stata that Stata currently does not do. One solution is to run another program from within Stata. In this presentation, I will show how to send emails from Stata using another program. Specifically, I look at automatically emailing the log file of an analysis when Stata has finished running a do-file, and at emailing the status of an analysis as it progresses.

I will also show how to merge graphs and log files in Stata 12 for Windows. Stata 12 can translate a log file and graphs into PDF, but not into a single file, and only in the order in which they are produced. With the use of a freeware program and some Stata code, I will show how to work around this limitation.

Graphics tricks for models

Bill Rising
Visualizing interactions and response surfaces can be difficult. In this talk, I will show how to do the former by graphing adjusted means and the latter by rolling together contour plots. I will demonstrate this for both linear and nonlinear models.

Stata data management in survey tracking studies

Joanna Dipnall
CogNETive Pty Ltd
Stata has strong statistical capabilities and is widely used around the world by statisticians in many disciplines. Its standard data-management commands, however, can also be easily incorporated into the day-to-day management of survey sampling. Stata is currently being used by CogNETive as an integral component of a monthly data-collection study for a major financial institution. Each month, CogNETive conducts an online survey of an elite group of financial customers regarding their satisfaction with the introduction of a new online financial system. Stata is used to manage both the front and back ends of the survey process. The merging and managing of the email sampling is performed solely in Stata. Each quarter, the financial institution provides a transaction file for each customer to be incorporated into the survey research data and analysis. Many data-management issues have arisen over the course of the study (for example, merge conflicts), with potentially significant implications for the results of the study. I will discuss the processes involved, along with tips and traps for this style of study.

Comparison of the C-statistic with new model discriminators in the prediction of long versus short hospital stay

Richard J. Woodman
Flinders Centre for Epidemiology and Biostatistics, Discipline of General Practice, Flinders University
Campbell H. Thompson
Discipline of Medicine, Adelaide University
Susan W. Kim
Flinders Centre for Epidemiology and Biostatistics, Discipline of General Practice, Flinders University
Paul Hakendorf
Redesigning Care, Flinders Medical Centre, Adelaide
Quantification of the added usefulness of new measures in risk prediction has traditionally relied upon significance tests from regression models and increases in the C-statistic. However, significant model predictors often cause only minor increases in the C-statistic, suggesting limited utility of the new measures in improving risk prediction. More recently, other discriminators have gained popularity amongst researchers. The Integrated Discrimination Improvement index (IDI) measures the difference between the change in the mean predicted risk of an event occurring for those who had the event and the change for those who did not have the event. The Net Reclassification Improvement index (NRI) quantifies the net percentage of subjects correctly reclassified in terms of risk.
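
As an illustration of the two quantities described above, the following minimal sketch computes the C-statistic and the IDI from predicted probabilities. It is written in Python rather than Stata, uses the commonly cited definitions rather than the authors' own code, and the six-subject data set and all function names are made up for the example.

```python
from itertools import product


def c_statistic(y, p):
    """Share of (event, non-event) pairs in which the event has the
    higher predicted risk; tied pairs count as one half."""
    events = [pi for yi, pi in zip(y, p) if yi == 1]
    nonevents = [pi for yi, pi in zip(y, p) if yi == 0]
    score = 0.0
    for pe, pn in product(events, nonevents):
        if pe > pn:
            score += 1.0
        elif pe == pn:
            score += 0.5
    return score / (len(events) * len(nonevents))


def mean(values):
    return sum(values) / len(values)


def idi(y, p_old, p_new):
    """Mean change in predicted risk among events minus the mean
    change among non-events."""
    change_events = mean(
        [new - old for yi, old, new in zip(y, p_old, p_new) if yi == 1]
    )
    change_nonevents = mean(
        [new - old for yi, old, new in zip(y, p_old, p_new) if yi == 0]
    )
    return change_events - change_nonevents


# Hypothetical data: 1 = event, 0 = no event.
y = [1, 1, 1, 0, 0, 0]
p_old = [0.6, 0.5, 0.4, 0.5, 0.4, 0.3]  # risks from the baseline model
p_new = [0.8, 0.6, 0.5, 0.4, 0.4, 0.2]  # risks after adding a predictor

delta_c = c_statistic(y, p_new) - c_statistic(y, p_old)
print("Change in C-statistic:", round(delta_c, 3))
print("IDI:", round(idi(y, p_old, p_new), 3))
```

A positive IDI means the new model raises predicted risk for subjects who had the event more than it does for those who did not.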

A logistic regression model was developed to predict long versus short (<=72 hours) hospital stay amongst 1,457 general medicine patients. Significant predictors were age, blood pressure (BP), heart rate (HR), respiratory rate (RR), mobility, white blood cell count (WBC), cardiac failure (CF), and the need for supplemental oxygen (SuO2). Using the predicted probabilities for long stay, we assessed improvements in the C-statistic (ΔC), the IDI (%), and the NRI (%) after the addition of each variable beyond age. The NRI was assessed using predicted probability cutpoints for long stay of 50% and 57% (the latter being the overall prevalence of long-stay patients), and also as the category-free NRI, which assesses the proportion of patients with improved prediction probabilities according to their eventual outcome.
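
The difference between the cutpoint-based and category-free NRI can be sketched as follows. This is an illustrative Python example under the commonly used definitions, not the authors' code: the data are made up, the function name is hypothetical, and treating a predicted probability at or above the cutpoint as "high risk" is an assumption.

```python
def nri(y, p_old, p_new, cutpoint=None):
    """Net Reclassification Improvement.

    With a cutpoint, a subject moves only if the new model places them
    on the other side of the cutpoint (assumed: p >= cutpoint is high
    risk). With cutpoint=None (category-free), any increase or decrease
    in predicted probability counts as a move.
    """

    def direction(old, new):
        if cutpoint is not None:
            old, new = old >= cutpoint, new >= cutpoint
        return (new > old) - (new < old)  # +1 up, -1 down, 0 unchanged

    moves_events = [
        direction(o, n) for yi, o, n in zip(y, p_old, p_new) if yi == 1
    ]
    moves_nonevents = [
        direction(o, n) for yi, o, n in zip(y, p_old, p_new) if yi == 0
    ]
    # Events should move up; non-events should move down.
    net_events = sum(moves_events) / len(moves_events)
    net_nonevents = -sum(moves_nonevents) / len(moves_nonevents)
    return net_events + net_nonevents


# Hypothetical data: 1 = long stay, 0 = short stay.
y = [1, 1, 1, 0, 0, 0]
p_old = [0.6, 0.5, 0.4, 0.5, 0.4, 0.3]  # risks from the baseline model
p_new = [0.8, 0.6, 0.5, 0.4, 0.4, 0.2]  # risks after adding a predictor

nri_cut = nri(y, p_old, p_new, cutpoint=0.5)
nri_free = nri(y, p_old, p_new)
print("NRI at a 50% cutpoint:", round(nri_cut, 3))
print("Category-free NRI:", round(nri_free, 3))
```

Because the category-free version counts every change in predicted probability, however small, it tends to produce larger values than a cutpoint-based NRI computed on the same data.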

The C-statistic identified HR (ΔC=0.027, p<0.001), mobility (ΔC=0.024, p<0.001), BP (ΔC=0.01, p=0.002), and WBC (ΔC=0.01, p=0.003) as measures that significantly increased model discrimination. The IDI identified the same measures (HR=4.2%, mobility=3.1%, BP=1.2%, and WBC=1.5%; p<0.001 for each) and additionally RR (0.7%, p<0.001), CF (0.4%, p<0.05), and SuO2 (0.3%, p<0.05). The NRI with a 50% cutpoint identified HR (5.2%, p=0.004), mobility (3.1%, p=0.02), and RR (3.3%, p=0.01), while the NRI with a 57% cutpoint identified mobility (5.1%, p=0.003), RR (2.4%, p=0.02), and SuO2 (2.3%, p=0.006). The category-free NRI identified HR (21.0%, p<0.001), mobility (24.9%, p<0.001), BP (14.6%, p<0.001), WBC (8.3%, p=0.02), and RR (8.4%, p=0.03).

The selection of measures to include for the prediction of long hospital stay differed between model discriminators. The IDI and the category-free NRI were more sensitive discriminators than was the C-statistic, with both identifying RR in addition to HR, mobility, BP, and WBC. The IDI also identified CF and SuO2. Fewer variables were identified by the category-dependent NRI than by the C-statistic, and the selected variables also differed according to the chosen probability cutpoint.

Scientific organizers

Kieran McCaul, University of Western Australia

Max Bulsara, The University of Notre Dame–Fremantle

Logistics organizers

Survey Design and Analysis Services Pty Ltd, the official distributor of Stata in Australia and New Zealand.