11th UK Stata Users Group meeting: Abstracts
Tuesday, 17 May 2005
Roger Newson
Department of Public Health Sciences, King's College London
Abstract
Confidence intervals may be presented as publication-ready tables or as
presentation-ready plots. eclplot produces plots of estimates and
confidence intervals. It inputs a dataset (or resultsset) with one observation
per parameter and variables containing estimates, lower and upper confidence
limits, and a fourth variable, against which the confidence intervals are
plotted. This resultsset can be used for producing both plots and tables, and
may be generated using a spreadsheet or using statsby, postfile
or the unofficial Stata parmest package. Currently, eclplot
offers 7 plot types for the estimates and 8 plot types for the confidence
intervals, each corresponding to a graph twoway subcommand. These plot
types can be combined to produce 56 combined plot types, some of which are
more useful than others, and all of which can be either horizontal or
vertical. eclplot has a plot() option, allowing the user to
superimpose other plots to add features such as stars for p-values.
eclplot can be used either by typing a command, which may have multiple
lines and sub-suboptions, or by using a dialog, which generates the command
for users not fluent in the Stata graphics language.
Additional information
newson_ohp1.pdf
newson_pres1.zip (related software)
Patrick Royston
MRC Clinical Trials Unit, London
Abstract
The publication of Royston's (2004) Stata implementation of the MICE
method for multiple imputation of missing values has stimulated much
interest, comment and further development of the software. In this talk
I will describe enhancements to what used to be called mvis.ado and is
now known as mice.ado. The main changes are greatly increased
flexibility in the specification of the prediction equations for
individual variables, better handling of ordered and nominal categorical
variables, and support for so-called passive imputation in which derived
variables are updated from primary variables. All of these features
reflect van Buuren's implementation of MICE on a different statistical
platform. I will demonstrate their use by an example with real data. An
article on the topic is in preparation (Royston 2005).
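The chained-equations cycle behind MICE can be summarized as follows. The notation is generic and our own, not taken from the talk. Suppose variables X_1, ..., X_p contain missing values. Iteration t then visits j = 1, ..., p in turn. At each step it first draws the parameters of X_j's prediction equation and then draws its missing values. Both draws condition on the most recent values of all the other variables:

```latex
\begin{aligned}
\theta_j^{(t)} &\sim P\left(\theta_j \mid X_j^{\mathrm{obs}},\, X_1^{(t)}, \dots, X_{j-1}^{(t)},\, X_{j+1}^{(t-1)}, \dots, X_p^{(t-1)}\right),\\
X_j^{\mathrm{mis}\,(t)} &\sim P\left(X_j^{\mathrm{mis}} \mid X_1^{(t)}, \dots, X_{j-1}^{(t)},\, X_{j+1}^{(t-1)}, \dots, X_p^{(t-1)},\, \theta_j^{(t)}\right).
\end{aligned}
```

Passive imputation handles a derived variable differently. Take a hypothetical X_q = f(X_1, X_2). It gets no prediction equation of its own. Instead it is recomputed from the current values of X_1 and X_2 whenever they are updated.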
References
Royston, P. 2004. Multiple imputation of missing values. Stata Journal 4: 227–241.
Royston, P. 2005. MICE for multiple imputation of missing values: extension of mvis. Stata Journal, in preparation.
Additional information
royston.ppt
Ben Jann
ETH Zurich
Abstract
Postestimation processing and formatting of regression estimates for input
into document tables are tasks that many of us have to do. However,
processing results by hand can be laborious and is prone to error. There
are therefore many benefits to automation of these tasks while at the same
time retaining user flexibility in terms of output format. The estout
package meets these needs.
estout assembles a table of coefficients, "significance stars",
summary statistics, standard errors, t/z statistics, p-values,
confidence intervals, and other statistics calculated for up to twenty
models previously fitted and stored by estimates store. It then writes
the table to the Stata log and/or to a text file. The table can optionally be
formatted in one of several styles: HTML, LaTeX, or tab-delimited (for input
into MS Excel or Word). A large number of options control which results are
output and how they are formatted. This talk will take users through a range
of examples, from basic applications to complex ones.
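The kind of automation described above can be illustrated with a minimal Python sketch. It is not estout, and it does not mimic estout's syntax. It takes stored estimates (coefficient, standard error, p-value) for several models and assembles a tab-delimited table with significance stars. The star thresholds (* p<0.05, ** p<0.01, *** p<0.001) and the function names are assumptions chosen for illustration.

```python
"""Hypothetical sketch of automated regression-table assembly (not estout)."""

# Assumed star thresholds, checked from the strictest level down.
STAR_LEVELS = [(0.001, "***"), (0.01, "**"), (0.05, "*")]


def stars(p):
    """Return the significance stars for p-value p."""
    for threshold, symbol in STAR_LEVELS:
        if p < threshold:
            return symbol
    return ""


def tab_table(models):
    """Build a tab-delimited table from {model_name: {term: (coef, se, p)}}.

    Each term gets two rows: the coefficient with stars, then the
    standard error in parentheses. Terms absent from a model are left blank.
    """
    names = list(models)
    terms = []
    for est in models.values():
        for term in est:
            if term not in terms:
                terms.append(term)  # keep first-seen order across models
    lines = ["\t".join([""] + names)]
    for term in terms:
        coef_row, se_row = [term], [""]
        for name in names:
            if term in models[name]:
                coef, se, p = models[name][term]
                coef_row.append(f"{coef:.3f}{stars(p)}")
                se_row.append(f"({se:.3f})")
            else:
                coef_row.append("")
                se_row.append("")
        lines.append("\t".join(coef_row))
        lines.append("\t".join(se_row))
    return "\n".join(lines)
```

For example, `tab_table({"m1": {"age": (0.052, 0.010, 0.0004)}})` yields a header row, a coefficient row `age\t0.052***` and a standard-error row `\t(0.010)`. The tab-delimited text can then be pasted into a spreadsheet or word processor.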
Additional information
jann.zip (related software)
Karl Taylor
University of Leicester
Abstract
This talk will discuss the use of Stata version 8 for teaching, in the context
of working with large survey data sets. The estimation techniques discussed
will include binary response models, discrete choice models, censored
dependent variables and sample selection, all in applied economic contexts.
In particular, I will describe some of the problems, as well as the benefits,
I've encountered in using Stata within these frameworks. For instance, does
having a "Windows-driven menu system" detract from one of the key benefits of
Stata for learning, namely a structured approach to modeling via do-files?
Another issue is how quickly Stata computes marginal effects in discrete
choice models compared with other available econometric software.
Additional information
taylor_teach.pdf
André Charlett
joint with Neville Verlander
Communicable Disease Surveillance Centre, London
Abstract
The creation and testing of interaction terms in regression models can be very
cumbersome, even in Stata 8. We propose a simple wrapping command,
fitint, that fits any generalized linear model and tests any two-way
interactions, as well as all main effects. There is no need to use xi
because categorical variables are identified with the factor option.
Appropriate tests are chosen depending upon the fitted model.
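The kind of test involved can be shown in generic notation of our own. Which test fitint actually applies depends on the fitted model, so the likelihood-ratio test below is only one standard option. Consider a GLM with link g and two categorical factors A and B, with K_A and K_B levels respectively. The sums run over non-reference levels. Testing the two-way interaction means testing that every δ_ab is zero:

```latex
\begin{aligned}
g(\mu_i) &= \beta_0 + \sum_{a} \alpha_a \,\mathbf{1}(A_i = a) + \sum_{b} \gamma_b \,\mathbf{1}(B_i = b)
          + \sum_{a}\sum_{b} \delta_{ab} \,\mathbf{1}(A_i = a)\,\mathbf{1}(B_i = b),\\
H_0 &: \ \delta_{ab} = 0 \ \text{for all } a, b, \qquad
\mathrm{LR} = 2\left(\ell_{\mathrm{full}} - \ell_{\mathrm{main}}\right) \ \dot\sim\ \chi^2_{(K_A - 1)(K_B - 1)}.
\end{aligned}
```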
Additional information
charlet-fitint.pdf
fitint.ado
fitint.hlp
Giovanni Bruno
Istituto di Economia Politica, Università Bocconi
Abstract
The Monte Carlo strategy of McLeod and Hipel (Water Resources Research, 1978),
originally devised for time-series data, has been adapted to dynamic panel
data models by Kiviet (1995). This procedure is more efficient than the
traditional approaches because it generates start-up values according to the
data-generating process. It therefore avoids both wasting random numbers on
generating initial conditions and small-sample non-stationarity problems.
This presentation discusses my Stata implementation of Kiviet's (Journal of
Econometrics, 1995) procedure. I used it in Bruno (2005; 2004) to evaluate,
in the presence of unbalanced designs, the finite-sample properties of
theoretical approximations for the LSDV bias (Bruno, Economics Letters 2005;
UK SUG 2004) and of the bias-corrected LSDV estimator (Bruno 2004; Italian
SUG 2004).
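The start-up idea can be sketched for a deliberately simplified model. We assume a pure AR(1) panel with individual effects and no exogenous regressors; Kiviet's design does include regressors, and this is not the author's Stata code. The usual approach simulates a long burn-in and discards it. Here each unit's initial value is instead drawn directly from the stationary distribution of y given its individual effect. Unequal series lengths give an unbalanced design. The function and parameter names are hypothetical.

```python
"""Simplified sketch: dynamic panel data with stationary start-up values.

Assumed model (no exogenous regressors, for brevity):
    y_it = gamma * y_i,t-1 + eta_i + eps_it,  eps_it ~ N(0, sigma^2), |gamma| < 1
Instead of simulating a burn-in and discarding it, y_i0 is drawn directly
from the stationary distribution of y given eta_i:
    mean eta_i / (1 - gamma),  variance sigma^2 / (1 - gamma^2).
"""
import math
import random


def simulate_panel(t_lengths, gamma, sigma=1.0, sigma_eta=1.0, seed=12345):
    """Return {unit: [y_i0, y_i1, ..., y_iT_i]} for a possibly unbalanced panel.

    t_lengths gives the number of periods after start-up for each unit,
    so unequal entries produce an unbalanced design.
    """
    if not abs(gamma) < 1:
        raise ValueError("stationarity requires |gamma| < 1")
    rng = random.Random(seed)
    sd0 = sigma / math.sqrt(1.0 - gamma ** 2)  # stationary s.d. given eta_i
    panel = {}
    for i, t_i in enumerate(t_lengths):
        eta = rng.gauss(0.0, sigma_eta)
        # Start-up value from the stationary distribution: no random numbers
        # are spent on a burn-in that would later be thrown away.
        y = [rng.gauss(eta / (1.0 - gamma), sd0)]
        for _ in range(t_i):
            y.append(gamma * y[-1] + eta + rng.gauss(0.0, sigma))
        panel[i] = y
    return panel
```

Because the process starts in its stationary state, the cross-sectional variance of y is the same in the first period as in later periods. With `sigma_eta=0` and `gamma=0.5` it is 1/(1 - 0.25) = 4/3 throughout.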
Additional information
bruno.pdf
bruno.zip (related software)
Stephen Jenkins
University of Essex
Abstract
Martin Biewen and I have derived the sampling variances of Generalized
Entropy and Atkinson indices for the case in which they are estimated from
survey data with a complex design. (Our paper is downloadable from
http://www.iser.essex.ac.uk/pubs/workpaps/pdf/2003-11.pdf.) This talk
illustrates how the indices may be calculated in Stata, using our commands
svyatk and svygei. The empirical illustrations compare income
inequality in Britain and Germany.
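The indices themselves can be computed with a minimal Python sketch. It uses the standard textbook definitions and computes weighted point estimates only. It does not implement the complex-survey sampling variances that are the subject of the talk and of svygei/svyatk. The function names are hypothetical. Let μ be the weighted mean income. The sketch then computes GE(α) = [mean((y/μ)^α) - 1] / (α(α - 1)), with the mean log deviation and Theil index as the α = 0 and α = 1 limits. For ε ≠ 1 it computes Atkinson A(ε) = 1 - [mean(y^(1-ε))]^(1/(1-ε)) / μ, using the geometric mean when ε = 1.

```python
"""Weighted point estimates of Generalized Entropy GE(alpha) and Atkinson
A(epsilon) inequality indices. Sketch only: no variance estimation for
complex survey designs is attempted here.
"""
import math


def _wmean(values, weights):
    """Weighted arithmetic mean."""
    return sum(w * v for v, w in zip(values, weights)) / sum(weights)


def ge_index(y, alpha, w=None):
    """GE(alpha) for positive incomes y with optional sampling weights w."""
    w = [1.0] * len(y) if w is None else w
    if any(v <= 0 for v in y):
        raise ValueError("incomes must be positive")
    mu = _wmean(y, w)
    r = [v / mu for v in y]
    if alpha == 0:  # mean log deviation
        return -_wmean([math.log(x) for x in r], w)
    if alpha == 1:  # Theil index
        return _wmean([x * math.log(x) for x in r], w)
    return (_wmean([x ** alpha for x in r], w) - 1.0) / (alpha * (alpha - 1.0))


def atkinson_index(y, epsilon, w=None):
    """Atkinson A(epsilon), epsilon > 0, for positive incomes y."""
    w = [1.0] * len(y) if w is None else w
    if any(v <= 0 for v in y):
        raise ValueError("incomes must be positive")
    mu = _wmean(y, w)
    if epsilon == 1:  # equally-distributed-equivalent income = geometric mean
        return 1.0 - math.exp(_wmean([math.log(v) for v in y], w)) / mu
    ede = _wmean([v ** (1.0 - epsilon) for v in y], w) ** (1.0 / (1.0 - epsilon))
    return 1.0 - ede / mu
```

As a check, the incomes [1, 3] give GE(2) = 0.125, which is half the squared coefficient of variation. They give A(2) = 1 - 1.5/2 = 0.25, because the harmonic mean is 1.5. Weighting an observation by 2 gives the same result as duplicating it.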
Additional information
jenkins.pdf
jenkins.zip (related software)
Axel Heitmüller
London Business School
Abstract
With panel data becoming more readily available, the question arises
whether standard decomposition techniques can be applied in the same
spirit as with cross-section data. Monte Carlo studies show that a simple
decomposition into explained and unexplained parts yields biased and
inconsistent results when it uses fixed-effects estimation in the presence
of time-invariant regressors. It is shown that this is not the case if the
respective groups have equal means of the time-invariant variables.
Hence, it is argued that standard decomposition techniques are
transferable to fixed-effects estimation only under certain stricter
assumptions, which are testable. This talk will outline these arguments
and discuss how the various decomposition techniques can be implemented
in Stata.
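The problem can be seen from a generic fixed-effects specification, written in our own notation rather than the author's. Take time-varying regressors x_it, time-invariant regressors z_i and an individual effect α_i for each member i of group g. The within transformation in the second line removes both α_i and z_i, so γ_g is not identified. Averaged over each group's observations, the estimated fixed effects absorb z̄_g'γ_g as well as the mean of the α_i. The last term of the decomposition therefore mixes an endowment difference, (z̄_A - z̄_B)'γ_B, with a coefficient difference, z̄_A'(γ_A - γ_B). The endowment part vanishes when z̄_A = z̄_B, which matches the condition stated in the abstract.

```latex
\begin{aligned}
y_{it} &= x_{it}'\beta_g + z_i'\gamma_g + \alpha_i + \varepsilon_{it}, \qquad i \in g,\ g \in \{A, B\},\\
y_{it} - \bar{y}_i &= (x_{it} - \bar{x}_i)'\beta_g + (\varepsilon_{it} - \bar{\varepsilon}_i),\\
\bar{y}_A - \bar{y}_B &= (\bar{x}_A - \bar{x}_B)'\hat\beta_B + \bar{x}_A'(\hat\beta_A - \hat\beta_B) + \left(\bar{\hat\alpha}_A - \bar{\hat\alpha}_B\right).
\end{aligned}
```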
Additional information
heitmueller.pdf
Verity Allan
St Cross College, University of Oxford
with Teresa F. Allan
City University, London
Abstract
I have been investigating various statistical methods of looking for poetical
cadences (sentence ends which have rhythm) in Latin prose. Stata was used as
my primary software for performing my own analysis, and for checking the
analysis of previous scholars. Several methods for determining rhythmicity
have been proposed over the last twenty years; I have evaluated the use of
some of these and used others to analyze a particular text by the Venerable
Bede (a Northumbrian monk, born in c.672, who wrote Biblical commentaries in
Latin, amongst other things). The research method involved chi-squared tests
performed against control texts (examples of Latin prose selected for the type
of their cadences). The analysis in Stata provided the figures needed to
adjust for multiple testing, since the chi-squared test was performed many
times on the same material. I found that Bede was significantly more likely
than the control texts to use rhythmical cadences, but was equally likely to
use metrical cadences. I concluded that Bede used rhythmical cadences in his
prose, but may not have used metrical cadences.
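A comparison of this kind can be sketched in Python. Each 2x2 table counts sentence ends with and without a given cadence type, in Bede and in a control text. Running many such tests on the same material calls for a multiplicity adjustment. The abstract does not name the adjustment used, so the Bonferroni correction below is our assumption. The function names are hypothetical.

```python
"""Pearson chi-squared test for a 2x2 table, plus a Bonferroni adjustment
for running many tests on the same material. Sketch only; Bonferroni is an
assumed choice of adjustment.
"""
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic (1 df, no continuity correction) for the
    table [[a, b], [c, d]], e.g. rows = (Bede, control text) and
    columns = (cadence of a given type, other sentence ends).
    Returns (statistic, upper-tail p-value)."""
    n = a + b + c + d
    stat = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return stat, math.erfc(math.sqrt(stat / 2.0))


def bonferroni_significant(p_values, alpha=0.05):
    """Flag each of m tests as significant at family-wise level alpha,
    i.e. compare each p-value with alpha / m."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]
```

With three tests, for instance, each p-value is compared with 0.05/3 ≈ 0.0167 rather than with 0.05.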
Additional information
allan.pdf
Margarethe Theseira
joint with Leticia Veruete-McKay
GLA Economics, London
Abstract
We estimate the gender pay gap in London and the UK based on Labour
Force Survey data for 2002/03. Our approach decomposes the difference in
mean wages between men and women into two parts:
- Differences in individual and job characteristics between men
and women (such as age, number of children, qualification, ethnicity,
region of residence, working in the public or private sector, working
part-time or full-time, industry, occupation and size of company)
- Unequal treatment and/or unexplained factors.
Stata enables us to easily run a cross-sectional regression on a
large household dataset and to derive the distribution of wages for men and
women. Results from our work indicate that differences in individual and
job characteristics account for most of the gender pay gap. The impact
of direct unequal treatment appears to be slightly lower in London than
outside, reducing London women's wages by around 4 per cent compared
with 6 per cent outside London.
Our wage distribution analysis indicates that in London, part-time
workers of both sexes are paid less than full-time workers. Among
full-time workers, there is virtually no pay difference between lower-paid
men and women. However, the gender pay gap widens further up the wage
distribution, reaching 24 per cent in the top decile.
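A two-part decomposition of this kind is commonly done in the Oaxaca-Blinder way, sketched below in Python. The abstract does not say which reference coefficients were used. We assume men's coefficients as the reference structure and OLS with an intercept for each group. The functions are hypothetical, not the authors' code. With an intercept, each group's mean outcome equals its mean regressors times its coefficients. The gap therefore splits exactly into an explained part, (x̄_m - x̄_f)'b_m, and an unexplained part, x̄_f'(b_m - b_f).

```python
"""Two-fold Oaxaca-Blinder decomposition of a mean wage gap (sketch).

Assumptions for illustration: men's coefficients are the reference
structure, and each group's (log) wage is fitted by OLS with an intercept.
"""


def ols(X, y):
    """OLS coefficients by solving the normal equations X'X b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * v for r, v in zip(X, y)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting on the augmented matrix.
    a = [row[:] + [rhs] for row, rhs in zip(xtx, xty)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        for r in range(k):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [x - f * z for x, z in zip(a[r], a[c])]
    return [a[i][k] / a[i][i] for i in range(k)]


def col_means(X):
    """Column means of a list-of-rows matrix."""
    return [sum(col) / len(X) for col in zip(*X)]


def oaxaca(X_m, y_m, X_f, y_f):
    """Return (gap, explained, unexplained) with men as the reference group.

    gap         = mean(y_m) - mean(y_f)
    explained   = (xbar_m - xbar_f)' b_m   (differences in characteristics)
    unexplained = xbar_f' (b_m - b_f)      (unequal treatment / other factors)
    Rows of X_m and X_f must start with a 1 for the intercept.
    """
    b_m, b_f = ols(X_m, y_m), ols(X_f, y_f)
    xm, xf = col_means(X_m), col_means(X_f)
    explained = sum((a - c) * b for a, c, b in zip(xm, xf, b_m))
    unexplained = sum(c * (bm - bf) for c, bm, bf in zip(xf, b_m, b_f))
    gap = sum(y_m) / len(y_m) - sum(y_f) / len(y_f)
    return gap, explained, unexplained
```

Choosing women's coefficients, or pooled ones, as the reference changes how the gap is split between the two parts, though not the total gap.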
Additional information
theseira.pdf
Alfonso Miranda
Keele University
Abstract
Studying behavior in economics, sociology, and statistics
often involves fitting a model in which the outcome is an ordinal
response which is only observed for a subsample of subjects. (For
example, questions about health satisfaction in a survey might be asked
only of respondents who have a particular health condition.) In this
situation, estimation of the ordinal response model without taking
account of this "sample selection" effect, using e.g. ologit or
oprobit, may lead to biased parameter estimates. (In the earlier
example, unobserved factors that increase the chances of having the
health condition may be correlated with the unobserved factors that
affect health satisfaction.) The program gllamm can be used to
estimate ordinal response models accounting for sample selection by
maximum likelihood. This paper describes a "wrapper" program, osm, that
calls gllamm to fit the model. It accepts data in a simple structure, has
a straightforward syntax and reports output in an easily interpretable
form. One important feature of osm is that the
log-likelihood can be evaluated using adaptive quadrature.
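The model can be written in a standard ordered-probit-with-selection form. This is our notation, and osm's exact parametrization via gllamm may differ. Here s is the selection indicator, y the ordinal response with cut-points κ_j (κ_0 = -∞, κ_J = +∞), and ρ the correlation between the unobservables of the two equations. Fitting oprobit or ologit to the selected subsample alone amounts to assuming ρ = 0. This is the source of the bias described above when ρ ≠ 0.

```latex
\begin{aligned}
s_i^* &= z_i'\gamma + u_i, \qquad s_i = \mathbf{1}(s_i^* > 0),\\
y_i^* &= x_i'\beta + \varepsilon_i, \qquad y_i = j \ \text{if}\ \kappa_{j-1} < y_i^* \le \kappa_j \quad (\text{observed only if } s_i = 1),\\
\begin{pmatrix} u_i \\ \varepsilon_i \end{pmatrix} &\sim N\!\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right),\\
\Pr(s_i = 1, y_i = j) &= \Phi_2\!\left(\kappa_j - x_i'\beta,\ z_i'\gamma;\ -\rho\right) - \Phi_2\!\left(\kappa_{j-1} - x_i'\beta,\ z_i'\gamma;\ -\rho\right), \qquad \Pr(s_i = 0) = \Phi(-z_i'\gamma).
\end{aligned}
```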
Additional information
miranda-osm.pdf
Roberto Gutierrez
StataCorp
Abstract
Included with Stata version 9 is the new command xtmixed, for fitting
linear mixed models. Mixed models contain both fixed and random effects.
The fixed effects are analogous to standard regression coefficients and are
estimated directly. The random effects are not directly estimated but are
summarized according to the unique elements of their respective
variance–covariance matrices, known as variance components.
The syntax of xtmixed is summarized and demonstrated using several examples,
postestimation tasks are described, and future areas of Stata growth in the
field of mixed models in general are discussed.
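The distinction between fixed effects and variance components can be written in standard textbook notation, which is ours rather than the talk's. How xtmixed parametrizes and reports these quantities may differ. β holds the fixed effects, and the random effects u enter only through their covariance matrix G. The third line is a random-intercept, random-slope example for observation i in group j. Its variance components are the unique elements of G (σ²_u0, σ_u01, σ²_u1) together with the residual variance σ²_ε.

```latex
\begin{aligned}
y &= X\beta + Zu + \varepsilon, \qquad u \sim N(0, G), \quad \varepsilon \sim N(0, \sigma^2_\varepsilon I), \quad u \perp \varepsilon,\\
\operatorname{Var}(y) &= ZGZ' + \sigma^2_\varepsilon I,\\
y_{ij} &= \beta_0 + \beta_1 t_{ij} + u_{0j} + u_{1j} t_{ij} + \varepsilon_{ij}, \qquad
G = \begin{pmatrix} \sigma^2_{u0} & \sigma_{u01} \\ \sigma_{u01} & \sigma^2_{u1} \end{pmatrix}.
\end{aligned}
```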
Additional information
gutierrez.pdf
Felicity Clemens
London School of Hygiene and Tropical Medicine
Abstract
Cleaning and verifying many different types of dataset often involves
similar problems. This presentation will give a brief, simple overview of
three useful processes and their associated Stata commands:
- Finding, counting and removing duplicated data and other multiple
entries;
- Summing individual-level entries to give an overall score per
individual: when to treat missing data as 0;
- A recap of merging data and uses of the merge command.
The presentation will outline the difficulties that are frequently
encountered in these three situations and show how they can be addressed
using common Stata commands such as count, rsum, sum, merge and append.
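The first two problems can be illustrated language-neutrally in Python, with None standing for a missing value. This is an analogy, not Stata code, and the field names are hypothetical. Duplicates are found by counting records that share the same key fields. Row scores are computed under two conventions: missing treated as 0, and missing propagated. As we understand it, egen's rsum() follows the first convention.

```python
"""Sketch of two data-cleaning checks, with None as the missing value.

1. Find, count and remove duplicated records on chosen key fields.
2. Sum item scores per individual, either treating missing as 0 or
   returning missing whenever any item is missing.
Field names in the usage below are hypothetical.
"""
from collections import Counter


def duplicate_keys(records, keys):
    """Return {key_tuple: count} for key combinations occurring more than once."""
    counts = Counter(tuple(r[k] for k in keys) for r in records)
    return {k: n for k, n in counts.items() if n > 1}


def drop_duplicates(records, keys):
    """Keep the first record for each key combination, preserving order."""
    seen, kept = set(), []
    for r in records:
        k = tuple(r[f] for f in keys)
        if k not in seen:
            seen.add(k)
            kept.append(r)
    return kept


def row_score(items, missing_as_zero=True):
    """Sum item scores for one individual; None marks a missing item."""
    if missing_as_zero:
        return sum(v for v in items if v is not None)
    if any(v is None for v in items):
        return None  # propagate missingness instead of silently scoring 0
    return sum(items)
```

Note the pitfall of the first convention: an individual with every item missing receives a score of 0 rather than missing. Whether that is acceptable depends on what the score is meant to measure.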
Additional information
datacleaning_public.ppt
cleans.zip (datasets and do-file)
Tim Collier
London School of Hygiene and Tropical Medicine
Abstract
Stata 8 graphics have changed out of all recognition from those available
in earlier versions. It was not just that a whole new array of options
and sub-options was introduced; the graph syntax itself completely
changed. Just trying to produce a simple plot of x against y using Stata
7 syntax (graph x y) produced bewildering error messages, e.g.
xgraph_g.new y: class member function not found r(4023), and the like. If
you did succeed in working out the new syntax (graph twoway scatter x y),
it then seemed to take forever-and-a-day for that oh-so-very-simple
graph to appear. Even if you were the patient type prepared to wait for
that graph to appear, numerous bugs further tested your resolve to
persevere. Many gave up at this point and chose to use the graph7 command,
which gave access to the old graph commands. Life was too
short!
But, things have moved on and quickened up considerably. Stata 8 does
offer the potential to produce effective publication-standard graphs. A
broad range of graph types is available, with the user being able to
control almost every aspect of what will appear. Taking time to learn
the new graph syntax and to explore the options, sub-options and even
sub-sub-options will pay dividends.
The aim of this session is to convince you of the benefit of persevering
with Stata 8 graphs. It will introduce some of the more useful graph
types, in particular the twoway family. These will be used to show how
to build a graph command, to highlight some of the more useful options
available, and to show how to produce an eye-catching and effective end
product.
Additional information
collier_graphics.pdf
collier_graphs.pdf
Andrew Pickles
joint with Milena Falcaro and Bethan Davies
University of Manchester
Abstract
gllamm provides a framework within which many of the more difficult
analyses required for trials and intervention studies may be undertaken.
Treatment effect estimation in the presence of non-compliance can be
undertaken using instrumental variable (IV) methods. We illustrate how
gllamm can be used for IV estimation for the full range of types of
treatment and outcome measures and describe how missing data may be
tackled under an assumption of latent ignorability. Alternative approaches
to accounting for clustering and to the analysis of cluster-randomized
studies will also be described.
Quality-of-life and economic evaluations of outcomes often make use of
discrete choice and stated preference experiments in which illness
scenarios are assessed. We illustrate how gllamm can be used for the
analysis of data from such studies, whether these are in the common form
of paired comparisons or the more complex case where multiple scenarios
are ranked.
Examples from studies of a school-based smoking intervention, a
re-employment encouragement experiment, a group therapy trial and of
quality-of-life with rheumatoid arthritis will be considered.
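The simplest IV estimator for non-compliance is the Wald ratio, sketched below for a binary randomized assignment z used as the instrument for treatment received d. This is a swapped-in textbook technique, not the gllamm-based approach of the talk, and the data in the usage note are hypothetical. The ratio divides the intention-to-treat difference in outcomes by the difference in treatment uptake. Under the usual exclusion-restriction and monotonicity assumptions, it estimates the average effect of treatment among compliers.

```python
"""Wald (instrumental-variable) estimate of the effect of treatment received,
using binary randomized assignment z as the instrument for treatment d.
Sketch only; not the gllamm-based method described in the talk.
"""


def mean(xs):
    """Arithmetic mean of a non-empty list."""
    return sum(xs) / len(xs)


def wald_iv(z, d, y):
    """(E[y|z=1] - E[y|z=0]) / (E[d|z=1] - E[d|z=0]) for binary z."""
    y1 = [yi for zi, yi in zip(z, y) if zi == 1]
    y0 = [yi for zi, yi in zip(z, y) if zi == 0]
    d1 = [di for zi, di in zip(z, d) if zi == 1]
    d0 = [di for zi, di in zip(z, d) if zi == 0]
    itt = mean(y1) - mean(y0)     # intention-to-treat effect on the outcome
    uptake = mean(d1) - mean(d0)  # effect of assignment on treatment received
    if uptake == 0:
        raise ValueError("assignment does not shift treatment uptake")
    return itt / uptake
```

For example, suppose three of four assigned subjects take the treatment and none of the controls do. If the outcome rises from 1 to 3 for those treated, the intention-to-treat effect is 1.5. The Wald estimate is 1.5 / 0.75 = 2.0.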
Additional information
pickles-gllamm.pdf
C. F. Baum
Boston College
Abstract
This tutorial will discuss a number of elementary Stata programming constructs
and show how they may be used to automate and robustify common data
manipulation, estimation and graphics tasks. Those used to the syntax of other
statistical packages or programming languages must adopt a different mindset
when working with Stata to take full advantage of its capabilities. Some of
Stata's most useful commands for handling repetitive tasks, such as forvalues,
foreach, egen, local and matrix, are commonly
underutilized by users unacquainted with their power and ease of use. While
relatively few users may develop ado-files for circulation to the user
community, nearly all will benefit from learning the rudiments of the
program, syntax and return statements when they are faced
with the need to perform repetitive analyses. Worked examples making use of
these commands will be presented and discussed in the tutorial.
Additional information
baum.pdf