Meeting summary
[minutes by Bianca L. De Stavola]
The seventh UK Stata Users Group meeting attracted about 50 participants to
the Royal Statistical Society in London on 14 and 15 May. The meeting was
organized by Bianca L. De Stavola (London School of Hygiene and Tropical
Medicine) and Stephen Jenkins (University of Essex) with the administrative
support of Timberlake Consultants, who also generously sponsored the speakers.
William Gould and Robert Gutierrez from StataCorp attended, and enlivened,
the meeting. Despite the UK label, the meeting attracted participants from
other countries, e.g., the US and Sweden, with the largest non-UK contingent from
Italy.
The meeting was opened by Nick Cox (Durham University). His talk,
"Plotting graded data: a Tukey-ish approach", was as lively and original as
the UK SUG regulars are used to expect from him. Inspired by the recent death
of John Tukey, Nick used plots of cumulative probabilities to compare graded
data observed in different groups of subjects. These plots can be produced by
ordplot using a wide selection of scales, ranging from flog to froot
(!) via the better-known logit. Not surprisingly, the results offered greater
insights into the data than their tabulation would have revealed.
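For readers unfamiliar with Tukey's folded scales, a rough sketch (up to Tukey's scaling constants, which ordplot may or may not adopt): for a cumulative probability $p$,
$$ \mathrm{froot}(p) = \sqrt{p} - \sqrt{1-p}, \qquad \mathrm{flog}(p) = \tfrac{1}{2}\bigl[\ln p - \ln(1-p)\bigr], $$
with the better-known $\mathrm{logit}(p) = \ln\{p/(1-p)\}$ as a third option.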
This was followed by a contribution by Andrew Pickles (University of Manchester).
The features of census data available to researchers motivated the topic, "Fitting
log-linear models with ignorable and non-ignorable missing data". The need to
use both individual level data and data aggregated at some higher level within
a log-linear model framework led Andrew and his collaborators to implement the
composite link approach to missing data, first through some complex data
reorganization (carried out by makecct) and then via an ml-based command
(cctfit).
The next three talks introduced new commands for the st family. Patrick
Royston (MRC Clinical Trials Unit) proposed a command for fitting proportional
hazards and proportional odds models to survival data in "Flexible parametric
alternatives to the Cox model...and more". The command, stpm, allows the
estimation of a flexible spline-based baseline hazard as well as of the relevant
hazard or odds ratios. Since the baseline hazard is specified as a spline function,
plotting it turns out to be easy and informative. Ian White introduced a new
command, strbee, pronounced "strawberry" (despite Ian's drawing, which to
many resembled a tomato!). The command allows the user to estimate a
treatment effect in randomised clinical trials when patients cross over from
their assigned treatment to the alternative one during follow-up. The
method, developed by Ian and his collaborators, is based on work by Robins and
Tsiatis (1991) and applies to accelerated life survival models.
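In outline, and as a sketch of the Robins-Tsiatis approach rather than of strbee's exact syntax: each patient's observed survival time is split into time off treatment, $T_{\mathrm{off}}$, and time on treatment, $T_{\mathrm{on}}$, and the counterfactual treatment-free survival time is modelled as
$$ U(\psi) = T_{\mathrm{off}} + e^{\psi}\,T_{\mathrm{on}}, $$
with the acceleration parameter $\psi$ chosen so that $U(\psi)$ is independent of the randomised arm.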
A related method for the analysis of observational studies was presented by Kate
Tilling and Jonathan Sterne for their program, stgest, the name
standing for G-estimation. It applies to survival data where both the
exposure of interest and the confounder change over time, with the
confounder's values possibly lying on the causal pathway from the exposure to
the outcome. G-estimation requires data collected at three or more time
points and uses all but the last (e.g., the first two when three are
available) to mimic, for every exposed subject, the relative effect of being
exposed versus not being exposed.
A general contribution to the analysis of epidemiological data was given by
Michael Hills who showed a menu interface, efmenu, to the effects
commands he and David Clayton presented at last year's SUG. Their command
translates both input and output for generalized linear models, as used in
epidemiology, into a "classical" framework where exposures and confounders
are declared before the analysis is carried out. The menu makes this
transition extremely smooth, although the programming involved apparently was
not. Paul Seed concluded the packed morning session with a clear description
of his new command for xt-type data, xtgraph. It produces summary graphs
of the observed data using, for example, geometric means or medians, together
with the values predicted by any of the regression commands.
The afternoon session started the same way as the morning one, that is, with a
presentation by Nick Cox. This time the topic was "Triangular plots", which
can be produced with triplot. Such plots can be used to represent the
distribution of three inter-related variables, for example, the percentages of
workforce employed in agriculture, industry and services, over another
dimension, e.g., time or region. It was then Sophia Rabe-Hesketh's turn to
describe some of the new extensions to gllamm6, the generalized linear
latent and mixed models program she published with Andrew Pickles and C. Taylor in
STB-53 (sg129). The new version of the program is called simply gllamm.
To illustrate the extension that involves modelling multilevel nominal data
and rankings, Sophia used British election data from 1987 and 1992, while the
Diet data from the Stata manual were used to describe models with latent
variables (true dietary intake) on the pathway between explanatory
(occupation) and outcome (coronary heart disease) variables.
Another example of a menu-driven command was given by Abdel Babiker, who
developed it with Patrick Royston. It is used for sample-size calculations in
randomised clinical trials where more than two groups may be compared in terms
of survival. The menu is invoked by the ssmenu command. This allows the
user to select a series of complex options (in the Stata sense) for the
command calcssi. Losses to follow-up, staggered patient entry and
non-proportional hazards are some of its more notable features. The afternoon
was concluded by Bill Gould (StataCorp President) who entertained the
audience with glimpses of the future (Stata-wise only, unfortunately) while
the audience responded with a short list of grumbles. The serious part of
the day over, most participants followed tradition and visited first the local
pub and then the "Last Days of the Raj" in Covent Garden. Here the
conversation ranged from "what is an Essex girl" to the future of British
politics but ended when Bill Gould started to sing (this is nearly true).
The second day started with an interesting talk by Mohamed Ali who presented
mtable (twinned with ltable), a program for computing cumulative
incidence rates (and their SEs) in the presence of competing risks. Mohamed
stressed how the method implemented in his program, unlike the use of the
complement of Kaplan-Meier curves, gives the correct estimates.
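The distinction deserves a formula. In the usual competing-risks notation, with cause-specific hazards $\lambda_k(t)$ and overall survivor function $S(t)$, the cumulative incidence of cause $k$ is
$$ \mathrm{CIF}_k(t) = \int_0^t S(u^-)\,\lambda_k(u)\,du, $$
whereas $1-\mathrm{KM}_k(t)$, obtained by censoring the competing events, effectively replaces $S(u^-)$ by the larger cause-$k$ survivor function and so overstates the incidence.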
A talk with an economic flavour then followed, though the topic was still centred on
survival data. Stephen Jenkins spoke about his program, spsurv, which
estimates a discrete-time split population ("cure") survival
model. In the standard survival model each subject is assumed to experience
the relevant event sometime; in the split population model, an estimable
fraction is allowed never to experience the event. (In a biostatistics context
this is the proportion of subjects under treatment who are `cured'.)
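Schematically, and not necessarily in spsurv's own notation: if a fraction $\pi$ of the population never experiences the event and the remainder have survivor function $S_u(t)$, the overall survivor function is
$$ S(t) = \pi + (1-\pi)\,S_u(t), $$
with $\pi$ and the parameters of $S_u(t)$ estimated jointly, typically with covariates allowed to enter both parts.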
Another economist, Kit Baum, then addressed the problems arising from managing
large panel data sets consisting of pair-wise information on economic trade
between 18 countries, spanning many time points. The task appeared to be
horrendous but Stata made Kit's life easy or, at least, that is what he
claimed! Hundreds of non-linear regression models were then fitted for each
country's trade pattern with every other country, and the results
post-processed and summarised graphically.
Roger Newson took the audience back to medical applications with a
step-by-step presentation of how splines can be parameterised and then fitted
in a format that makes them more understandable to non-mathematicians. This is
achieved via his program frencurv. With Barbara Sianesi we
enthusiastically went back to an economic application. This concerned
"propensity score matching" to be used for dealing with non-random allocation
of individuals to a "treatment" (e.g., a training programme) and the
estimation of its effect on an "outcome" (e.g., earnings). The method mirrors
applications in biostatistics but the command, match, is tailored to
econometricians.
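The underlying idea, written in the standard notation rather than in match's own: estimate the propensity score $e(x) = \Pr(T=1 \mid X=x)$, match each treated individual to comparison individuals with similar scores, and estimate the average effect of treatment on the treated as
$$ \widehat{\mathrm{ATT}} = \frac{1}{n_1} \sum_{i:\,T_i=1} \Bigl( Y_i - \sum_{j:\,T_j=0} w_{ij}\,Y_j \Bigr), $$
where the weights $w_{ij}$ depend on the chosen matching scheme (nearest neighbour, kernel, and so on).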
The morning concluded with one more presentation on survival analysis and one
on ordinal outcomes. The first talk had an economic motivation and the second
a medical one, but both can be widely applied. The first was by Ken Simons
who introduced sthaz for fitting smoothed hazards to survival data via
kernel density estimation. Confidence intervals can be computed, while
extensions to allow variable-bandwidth smoothing are in the pipeline.
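One common construction, which may or may not be exactly what sthaz implements, smooths the Nelson-Aalen increments: with distinct event times $t_i$, $d_i$ events and $n_i$ subjects at risk at $t_i$, a kernel $K$ and bandwidth $b$ give
$$ \hat{h}(t) = \frac{1}{b} \sum_i K\!\left(\frac{t-t_i}{b}\right) \frac{d_i}{n_i}. $$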
The last presentation of the morning was by Mark Lunt, who very lucidly
reviewed the most commonly used methods for the analysis of ordinal data. To
this list Mark
added the stereotype model, which is nested within the multinomial model and
for which a program soreg (stereotype ordinal regression) is available.
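For an outcome with levels $k = 0, 1, \dots, K$ and covariates $x$, the stereotype model constrains the multinomial logistic model to
$$ \log \frac{\Pr(Y = k \mid x)}{\Pr(Y = 0 \mid x)} = \alpha_k + \phi_k\,\beta' x, $$
so that a single linear predictor $\beta' x$ is scaled by level-specific factors $\phi_k$ (conventionally $\phi_0 = 0$ and $\phi_K = 1$); ordering the $\phi_k$ gives the ordinal interpretation, while the unconstrained multinomial model, with a separate $\beta_k$ for each level, contains it as a special case.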
After lunch all participants reconvened to listen to Bobby Gutierrez
(StataCorp) who reviewed current features and future developments of frailty
survival models in Stata. At a very fast pace, which reflected the speaker's
enthusiasm for the topic, Bobby explained the conceptual difference between
frailty and shared frailty models and discussed the effects of ignoring either
of them when fitting parametric survival models. Extensions to Cox regression
(shared) frailty models are still being developed.
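In rough terms (the notation here is mine, not StataCorp's): an individual frailty model multiplies each subject's hazard by an unobserved positive factor, while a shared frailty model gives one such factor to every subject in the same group $j$,
$$ h_{ij}(t) = \alpha_j\,h_0(t)\exp(x_{ij}'\beta), \qquad \alpha_j \sim \text{gamma with mean } 1, \text{ say}, $$
and ignoring a frailty that is really present tends to distort both the shape of the fitted hazard and the estimated covariate effects.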