2005 UK Stata Users Group meeting

Last updated: 25 May 2005

17–18 May 2005

Centre for Econometric Analysis
Cass Business School
106 Bunhill Row
London EC1Y 8TZ
United Kingdom

Materials documenting the meeting

Proceedings

Generalized confidence interval plots using commands or dialogs

Roger Newson
Department of Public Health Sciences, King's College London

Abstract
Confidence intervals may be presented as publication-ready tables or as plots. eclplot produces plots of estimates and confidence intervals. It inputs a dataset (or resultsset) with one observation per parameter and variables containing estimates, lower and upper confidence limits, and a fourth variable, against which the confidence intervals are plotted. This resultsset can be used for producing both plots and tables, and may be generated using a spreadsheet or using statsby, postfile or the unofficial Stata parmest package. Currently, eclplot offers 7 plot types for the estimates and 8 plot types for the confidence intervals, each corresponding to a graph twoway subcommand. These can be combined to produce 56 combined plot types, some of which are more useful than others, and all of which can be either horizontal or vertical. eclplot has a plot() option, allowing the user to superimpose other plots to add features such as stars for p-values. eclplot can be used either by typing a command, which may have multiple lines and sub-suboptions, or by using a dialog, which generates the command for users not fluent in the Stata graphics language.
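
To make the resultsset idea concrete outside Stata, the Python sketch below builds one row per parameter from estimates and standard errors. It assumes normal-theory (Wald) intervals, estimate plus or minus z times SE; the function name make_resultsset and the field names are hypothetical and are not eclplot's or parmest's variable names.

```python
from statistics import NormalDist

def make_resultsset(names, estimates, std_errors, level=0.95):
    """Return one dict per parameter: estimate, lower and upper Wald
    confidence limits, and a sequence number to plot the intervals against."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # e.g. about 1.96 for 95%
    rows = []
    for i, (name, est, se) in enumerate(zip(names, estimates, std_errors), start=1):
        rows.append({
            "seq": i,               # the "fourth variable" for the plot axis
            "parm": name,
            "estimate": est,
            "lower": est - z * se,  # lower confidence limit
            "upper": est + z * se,  # upper confidence limit
        })
    return rows

# Example: two hypothetical parameters
for r in make_resultsset(["age", "female"], [0.50, -1.20], [0.10, 0.40]):
    print(f"{r['seq']}  {r['parm']:<8} {r['estimate']:6.2f} "
          f"({r['lower']:.2f}, {r['upper']:.2f})")
```

The same list of rows could feed either a table or a plot, which is the point the abstract makes about resultssets.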

Additional information
newson_ohp1.pdf
newson_pres1.zip (related software)

MICE for multiple imputation of missing values

Patrick Royston
MRC Clinical Trials Unit, London

Abstract
The publication of Royston's (2004) Stata implementation of the MICE method for multiple imputation of missing values has stimulated much interest, comment and further development of the software. In this talk I will describe enhancements to what used to be called mvis.ado and is now known as mice.ado. The main changes are greatly increased flexibility in the specification of the prediction equations for individual variables, better handling of ordered and nominal categorical variables, and support for so-called passive imputation, in which derived variables are updated from primary variables. All of these features reflect van Buuren's implementation of MICE on a different statistical platform. I will demonstrate their use with an example using real data. An article on the topic is in preparation (Royston 2005).
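
The ideas of chained equations and passive imputation can be illustrated with a deliberately simplified Python sketch; it is not mice.ado's algorithm. Two continuous variables with gaps are imputed in turn from simple linear regressions on each other, with normal residual noise added, and a derived variable (assumed here to be their sum) is recomputed passively from the completed primaries in every cycle. A proper MICE implementation would also draw the regression parameters from their posterior distribution and create several imputed datasets; all function names are hypothetical.

```python
import random

def fit_line(x, y):
    """OLS for y = a + b*x; returns (a, b, residual SD)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, (rss / max(n - 2, 1)) ** 0.5

def impute_one(target, predictor, missing, rng):
    """Re-impute missing entries of `target` (in place) from `predictor`,
    fitting only on rows where `target` was originally observed."""
    obs = [i for i, m in enumerate(missing) if not m]
    a, b, sd = fit_line([predictor[i] for i in obs], [target[i] for i in obs])
    for i, m in enumerate(missing):
        if m:
            target[i] = a + b * predictor[i] + rng.gauss(0.0, sd)

def chained_impute(x1, x2, cycles=10, seed=2005):
    """Impute None values in x1 and x2 by chained equations; return the
    completed copies and a passively derived total = x1 + x2."""
    rng = random.Random(seed)
    miss1 = [v is None for v in x1]
    miss2 = [v is None for v in x2]
    # Initial fill: observed mean of each variable.
    mean1 = sum(v for v in x1 if v is not None) / miss1.count(False)
    mean2 = sum(v for v in x2 if v is not None) / miss2.count(False)
    x1 = [mean1 if m else v for v, m in zip(x1, miss1)]
    x2 = [mean2 if m else v for v, m in zip(x2, miss2)]
    total = [a + b for a, b in zip(x1, x2)]
    for _ in range(cycles):
        impute_one(x1, x2, miss1, rng)           # x1 given x2
        impute_one(x2, x1, miss2, rng)           # x2 given x1
        total = [a + b for a, b in zip(x1, x2)]  # passive update, never modelled
    return x1, x2, total
```

Because the derived total is always rebuilt from the primaries rather than imputed on its own, it stays consistent with them, which is the motivation for passive imputation.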

References
Royston, P. 2004. Multiple imputation of missing values. Stata Journal 4: 227–241.
Royston, P. 2005. MICE for multiple imputation of missing values: extension of mvis. Stata Journal, in preparation.

Additional information
royston.ppt

From regression estimates to document tables: output formatting using estout

Ben Jann
ETH Zurich

Abstract
Postestimation processing and formatting of regression estimates for input into document tables are tasks that many of us have to do. However, processing results by hand is laborious and error-prone. There are therefore many benefits to automating these tasks while retaining user flexibility in terms of output format. The estout package meets these needs.
estout assembles a table of coefficients, "significance stars", summary statistics, standard errors, t/z statistics, p-values, confidence intervals, and other statistics calculated for up to twenty models previously fitted and stored by estimates store. It then writes the table to the Stata log and/or to a text file. The table can optionally be formatted in several styles: HTML, LaTeX, or tab-delimited (for input into MS Excel or Word). A large number of options control which output is formatted and how. This talk will take users through a range of examples, from basic applications to complex ones.
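
The core of what estout automates, attaching significance stars to coefficients and writing a delimited table, can be sketched in a few lines of Python. This only illustrates the idea and is not estout's code or syntax; the star thresholds (* p<0.05, ** p<0.01, *** p<0.001), the function names and the input layout (model name mapped to variable mapped to coefficient, SE, p-value) are assumptions for the demonstration.

```python
def stars(p, levels=((0.001, "***"), (0.01, "**"), (0.05, "*"))):
    """Return the significance marker for p-value p (strictest level first)."""
    for cutoff, mark in levels:
        if p < cutoff:
            return mark
    return ""

def tab_table(models):
    """Tab-delimited table: one column per model; for each variable a
    coefficient row (with stars) and a standard-error row in parentheses."""
    names = list(models)
    variables = []
    for est in models.values():
        for var in est:
            if var not in variables:
                variables.append(var)
    lines = ["\t" + "\t".join(names)]
    for var in variables:
        coefs, ses = [var], [""]
        for est in models.values():
            if var in est:
                b, se, p = est[var]
                coefs.append(f"{b:.3f}{stars(p)}")
                ses.append(f"({se:.3f})")
            else:  # variable not in this model: leave the cell empty
                coefs.append("")
                ses.append("")
        lines.append("\t".join(coefs))
        lines.append("\t".join(ses))
    return "\n".join(lines)
```

Writing the returned string to a .txt file gives output that spreadsheet or word-processor software can import as a table.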

Additional information
jann.zip (related software)

Teaching microeconometrics using Stata

Karl Taylor
University of Leicester

Abstract
This talk will discuss the use of Stata version 8 for teaching, in the context of working with large survey datasets. The estimation techniques discussed will include binary response models, discrete choice models, censored dependent variables and sample selection, all in applied economic contexts. In particular, I will describe some of the problems, as well as the benefits, that I have encountered in using Stata within these frameworks. One issue is whether having a "windows-driven menu system" detracts from one of the key benefits of Stata for learning, namely a structured approach to modeling via do-files; another is Stata's speed in computing marginal effects in discrete choice models compared with other econometric software.
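
For students meeting marginal effects for the first time: in a binary logit with P(y=1|x) = Λ(x'β), the marginal effect of a continuous regressor x_k is β_k Λ(x'β)(1 - Λ(x'β)). The Python sketch below computes it at a given covariate vector and averaged over a sample. It illustrates the formula only and says nothing about how Stata or other packages compute or time these quantities; the function names are hypothetical.

```python
import math

def logistic(z):
    """Logistic CDF, Lambda(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def logit_me(beta, x, k):
    """Marginal effect dP/dx_k = beta_k * L(x'b) * (1 - L(x'b)) at covariates x."""
    p = logistic(sum(b * xi for b, xi in zip(beta, x)))
    return beta[k] * p * (1.0 - p)

def average_me(beta, rows, k):
    """Average of the observation-level marginal effects over the sample."""
    return sum(logit_me(beta, x, k) for x in rows) / len(rows)
```

The effect is largest where the predicted probability is 0.5, since Λ(1 - Λ) peaks at 0.25 there.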

Additional information
taylor_teach.pdf

Interactions made easy

André Charlett
joint with Neville Verlander
Communicable Disease Surveillance Centre, London

Abstract
The creation and testing of interaction terms in regression models can be very cumbersome, even in Stata 8. We propose a simple wrapper command, fitint, that fits any generalized linear model and tests any two-way interactions, as well as all main effects. There is no need to use xi, because categorical variables are identified with the factor option. Appropriate tests are chosen depending on the fitted model.
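
For a sense of the manual step a wrapper like fitint spares the user, the Python sketch below expands a categorical variable into indicator columns (treating the lowest level as the base) and multiplies each indicator by a continuous variable. Testing the interaction would then mean jointly testing those product terms. The function name and column-naming scheme are hypothetical and do not reflect fitint's or xi's conventions.

```python
def factor_by_continuous(cat, x, name="g", xname="x"):
    """Return a dict of columns: an indicator for each non-base level of
    `cat` and its product with `x` (the factor-by-continuous interaction)."""
    others = sorted(set(cat))[1:]  # the lowest level is the omitted base
    cols = {}
    for lev in others:
        ind = [1 if c == lev else 0 for c in cat]
        cols[f"{name}_{lev}"] = ind
        cols[f"{name}_{lev}X{xname}"] = [d * xi for d, xi in zip(ind, x)]
    return cols
```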

Additional information
charlet-fitint.pdf
fitint.ado
fitint.hlp

Monte Carlo analysis for dynamic panel data models

Giovanni Bruno
Istituto di Economia Politica, Università Bocconi

Abstract
The Monte Carlo strategy of McLeod and Hipel (Water Resources Research, 1978), originally devised for time-series data, was adapted to dynamic panel data models by Kiviet (1995). This procedure is more efficient than traditional approaches in that it generates start-up values according to the data-generation process, so it avoids both wasting random numbers on the generation of initial conditions and small-sample non-stationarity problems. This presentation discusses my Stata implementation of Kiviet's (Journal of Econometrics, 1995) procedure, as used in Bruno (2005, 2004) to evaluate the finite-sample properties of theoretical approximations for the LSDV bias (Bruno, Economics Letters 2005; UK Stata Users Group 2004) and of the bias-corrected LSDV estimator (Bruno 2004; Italian Stata Users Group 2004) in the presence of unbalanced designs.
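
The idea of drawing start-up values from the data-generation process can be shown for the simplest dynamic panel model, y_it = γ y_i,t-1 + η_i + ε_it with |γ| < 1 and ε_it ~ N(0, σ²). Instead of simulating a burn-in period and discarding it, y_i0 is drawn directly from the stationary distribution, with mean η_i/(1 - γ) and variance σ²/(1 - γ²). The Python sketch below assumes this simple model (no exogenous regressors, normal errors); it is not Bruno's or Kiviet's code, and the function name is hypothetical. Passing different lengths per unit gives an unbalanced design.

```python
import random

def simulate_panel(etas, lengths, gamma, sigma, seed=1):
    """Simulate y_it = gamma*y_i,t-1 + eta_i + e_it for each unit, drawing
    the start-up value y_i0 from the stationary distribution (no burn-in).
    Returns, for each unit, the `lengths[i]` periods after y_i0."""
    if not abs(gamma) < 1:
        raise ValueError("stationarity requires |gamma| < 1")
    rng = random.Random(seed)
    panel = []
    for eta, T in zip(etas, lengths):
        mean0 = eta / (1 - gamma)
        sd0 = sigma / (1 - gamma ** 2) ** 0.5
        y_prev = rng.gauss(mean0, sd0)  # y_i0 from the stationary law
        series = []
        for _ in range(T):
            y_prev = gamma * y_prev + eta + rng.gauss(0.0, sigma)
            series.append(y_prev)
        panel.append(series)
    return panel
```

With σ = 0 each series simply sits at its stationary mean η_i/(1 - γ), a quick check that no transient start-up effect is present.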

Additional information
bruno.pdf
bruno.zip (related software)

Estimation of inequality indices from survey data, allowing for design effects

Stephen Jenkins
University of Essex

Abstract
Martin Biewen and I have derived the sampling variances of Generalized Entropy and Atkinson indices for the case in which they are estimated from survey data with a complex design. This talk illustrates how the indices may be calculated in Stata, using our commands svyatk and svygei. The empirical illustrations compare income inequality in Britain and Germany.
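
For reference, the point estimates are weighted versions of the standard formulas. With weights w_i, weighted mean μ and s_i = y_i/μ, GE(α) = [Σ w_i s_i^α / Σ w_i - 1] / (α(α - 1)) for α ≠ 0, 1, with limiting forms GE(0) = -Σ w_i ln s_i / Σ w_i and GE(1) = Σ w_i s_i ln s_i / Σ w_i. The Atkinson index is A(ε) = 1 - [Σ w_i s_i^(1-ε) / Σ w_i]^(1/(1-ε)) for ε ≠ 1 and 1 - exp(Σ w_i ln s_i / Σ w_i) for ε = 1. The Python sketch below computes these point estimates only; the design-based variances that svygei and svyatk provide are not attempted, and the function names are hypothetical.

```python
import math

def _wmean(values, weights):
    """Weighted mean."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def ge_index(y, w, alpha):
    """Generalized Entropy index GE(alpha) with weights w (incomes > 0)."""
    mu = _wmean(y, w)
    s = [yi / mu for yi in y]
    if alpha == 0:
        return -_wmean([math.log(si) for si in s], w)
    if alpha == 1:
        return _wmean([si * math.log(si) for si in s], w)
    return (_wmean([si ** alpha for si in s], w) - 1) / (alpha * (alpha - 1))

def atkinson_index(y, w, eps):
    """Atkinson index A(eps), eps > 0, with weights w (incomes > 0)."""
    mu = _wmean(y, w)
    s = [yi / mu for yi in y]
    if eps == 1:
        return 1 - math.exp(_wmean([math.log(si) for si in s], w))
    return 1 - _wmean([si ** (1 - eps) for si in s], w) ** (1 / (1 - eps))
```

GE(2) equals half the squared coefficient of variation, which gives an easy hand check.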

Additional information
jenkins.pdf
jenkins.zip (related software)

Fixed-effects estimation and decomposition: insights from Monte Carlo studies

Axel Heitmueller
London Business School

Abstract
With panel data becoming more readily available, the question arises whether standard decomposition techniques can be applied in the same spirit as with cross-section data. Monte Carlo studies show that a simple decomposition into explained and unexplained parts, using fixed-effects estimation in the presence of time-invariant regressors, yields biased and inconsistent results. This is shown not to be the case if the groups have equal means of the time-invariant variables. Hence, it is argued that standard decomposition techniques are transferable to fixed-effects estimation only under certain stricter, testable assumptions. This talk will outline these arguments and discuss how the various decomposition techniques can be implemented in Stata.
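
As a reminder of the cross-section benchmark, the twofold Oaxaca-Blinder decomposition splits the gap in mean outcomes between groups A and B as ȳ_A - ȳ_B = (x̄_A - x̄_B)'β_A + x̄_B'(β_A - β_B), where each β includes an intercept and is estimated by OLS on that group's data. The first term is the explained part and the second the unexplained part. The Python sketch below handles a single regressor and uses group A's coefficients as the reference, which is one of several conventions; it does not address the fixed-effects issues the talk examines, and the function names are hypothetical.

```python
def ols_line(x, y):
    """OLS intercept and slope for y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def oaxaca_twofold(xa, ya, xb, yb):
    """Decompose mean(ya) - mean(yb) with group A's coefficients as the
    reference; returns (gap, explained, unexplained)."""
    a_a, b_a = ols_line(xa, ya)
    a_b, b_b = ols_line(xb, yb)
    mxa, mxb = sum(xa) / len(xa), sum(xb) / len(xb)
    gap = sum(ya) / len(ya) - sum(yb) / len(yb)
    explained = (mxa - mxb) * b_a                   # differences in characteristics
    unexplained = (a_a - a_b) + mxb * (b_a - b_b)   # differences in coefficients
    return gap, explained, unexplained
```

Because OLS with an intercept passes through the means, the two parts add up exactly to the raw gap.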

Additional information
heitmueller.pdf

Statistics and the art of Latin prose

Verity Allan
St Cross College, University of Oxford
with Teresa F. Allan
City University, London

Abstract
I have been investigating various statistical methods of looking for poetical cadences (sentence endings that have rhythm) in Latin prose. Stata was my primary software for performing my own analysis and for checking the analyses of previous scholars. Several methods for determining rhythmicity have been proposed over the last twenty years; I have evaluated some of these and used others to analyze a particular text by the Venerable Bede (a Northumbrian monk, born c. 672, who wrote Biblical commentaries in Latin, amongst other things). The research method involved chi-squared tests performed against control texts (examples of Latin prose selected for the type of their cadences). The analysis in Stata provided the figures needed to adjust for multiple testing, since the chi-squared test was performed many times on the same material. I found that, compared with the control texts, Bede was significantly more likely to use rhythmical cadences but equally likely to use metrical cadences. I concluded that Bede used rhythmical cadences in his prose but may not have used metrical cadences.
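
The abstract does not say which multiple-testing adjustment was used. As an illustration only, the Python sketch below computes Pearson's chi-squared statistic for a 2×2 table (for example, cadence type by text), obtains its p-value on one degree of freedom from the identity P(χ²₁ > c) = erfc(√(c/2)), and applies a Bonferroni adjustment, one common choice, for m tests. The function names are hypothetical.

```python
import math

def chi2_2x2(table):
    """Pearson chi-squared statistic (no continuity correction) and its
    p-value on 1 degree of freedom for a 2x2 table [[a, b], [c, d]]."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    stat = sum((table[i][j] - rows[i] * cols[j] / n) ** 2 / (rows[i] * cols[j] / n)
               for i in range(2) for j in range(2))
    return stat, math.erfc(math.sqrt(stat / 2))

def bonferroni(p, m):
    """Bonferroni-adjusted p-value for one of m tests."""
    return min(1.0, m * p)
```

Bonferroni is conservative when the tests are positively correlated, as repeated tests on the same material may be, so a significant result after adjustment is a fairly strong one.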

Additional information
allan.pdf

Estimation of the gender pay gap in London and the UK — an econometric approach

Margarethe Theseira
joint with Leticia Veruete-McKay
GLA Economics, London

Abstract
We estimate the gender pay gap in London and the UK based on Labour Force Survey data for 2002/03. Our approach decomposes the mean wages of men and women into two parts:

Differences in individual and job characteristics between men and women (such as age, number of children, qualifications, ethnicity, region of residence, working in the public or private sector, working part-time or full-time, industry, occupation and size of company);
Unequal treatment and/or unexplained factors.

Stata enables us to easily implement a cross-sectional regression for a large household dataset and derive the distribution of wages for men and women. Results from our work indicate that differences in individual and job characteristics account for most of the gender pay gap. The impact of direct unequal treatment appears to be slightly lower in London than outside, reducing London women's wages by around four per cent, compared with six per cent outside London.
Our wage distribution analysis indicates that in London, part-time workers of both sexes are paid less than full-time workers. Among full-time workers, the lower-paid have virtually no difference in pay between men and women; however, the gender pay gap widens further up the wage distribution, reaching 24 per cent for the top decile.

Additional information
theseira.pdf

Estimation of ordinal response models, accounting for sample selection bias

Alfonso Miranda
Keele University

Abstract
Studying behavior in economics, sociology, and statistics often involves fitting a model in which the outcome is an ordinal response that is observed only for a subsample of subjects. (For example, questions about health satisfaction in a survey might be asked only of respondents who have a particular health condition.) In this situation, estimating the ordinal response model without taking account of this "sample selection" effect, using, for example, ologit or oprobit, may lead to biased parameter estimates. (In the earlier example, unobserved factors that increase the chances of having the health condition may be correlated with the unobserved factors that affect health satisfaction.) The program gllamm can be used to fit ordinal response models that account for sample selection, by maximum likelihood. This paper describes a "wrapper" program, osm, that calls gllamm to fit the model. It accepts data in a simple structure, has a straightforward syntax and reports output in a manner that is easily interpretable. One important feature of osm is that the log likelihood can be evaluated using adaptive quadrature.
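
The ordinal part of such a model is often an ordered probit: with linear index x'β and increasing cutpoints κ_1 < κ_2 < ... < κ_(J-1), P(y = j | x) = Φ(κ_j - x'β) - Φ(κ_(j-1) - x'β), taking κ_0 = -∞ and κ_J = +∞. The Python sketch below evaluates these category probabilities, assuming an ordered-probit specification. It covers only the outcome equation, not the selection equation or the error correlation that osm handles through gllamm, and the function name is hypothetical.

```python
from statistics import NormalDist

def ordered_probit_probs(xb, cutpoints):
    """Category probabilities for an ordered probit with index xb and
    increasing cutpoints; returns len(cutpoints) + 1 values."""
    Phi = NormalDist().cdf
    cdfs = [0.0] + [Phi(k - xb) for k in cutpoints] + [1.0]
    return [upper - lower for lower, upper in zip(cdfs, cdfs[1:])]
```

Ignoring selection amounts to maximizing the product of these probabilities over the observed subsample alone, which is where the bias described above can enter.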

Additional information
miranda-osm.pdf

Review lecture: recent developments in Stata

Roberto Gutierrez
StataCorp

Abstract
Included with Stata version 9 is the new command xtmixed, for fitting linear mixed models. Mixed models contain both fixed and random effects. The fixed effects are analogous to standard regression coefficients and are estimated directly. The random effects are not directly estimated but are summarized according to the unique elements of their respective variance–covariance matrices, known as variance components. The syntax of xtmixed is summarized and demonstrated using several examples, postestimation tasks are described, and future areas of Stata growth in the field of mixed models in general are discussed.
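
To see what "variance components" means in the simplest case, consider a random-intercept model y_ij = β_0 + u_j + e_ij with var(u_j) = σ_u² and var(e_ij) = σ_e². xtmixed estimates such components by maximum likelihood or restricted maximum likelihood (REML). Purely to illustrate the quantities involved, the Python sketch below instead uses the classical one-way ANOVA (method-of-moments) estimators for balanced groups, σ_e² = MSW and σ_u² = (MSB - MSW)/n with n observations per group, together with the intraclass correlation σ_u²/(σ_u² + σ_e²). The function name is hypothetical.

```python
def anova_variance_components(groups):
    """Method-of-moments (sigma2_u, sigma2_e, icc) for balanced one-way
    data: `groups` is a list of equal-length lists of outcomes."""
    k, n = len(groups), len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    means = [sum(g) / n for g in groups]
    grand = sum(means) / k
    msw = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (k * (n - 1))
    msb = n * sum((m - grand) ** 2 for m in means) / (k - 1)
    sigma2_e = msw
    sigma2_u = max((msb - msw) / n, 0.0)  # truncate a negative estimate at zero
    total = sigma2_u + sigma2_e
    icc = sigma2_u / total if total > 0 else 0.0
    return sigma2_u, sigma2_e, icc
```

In balanced designs the moment and REML estimates often coincide when the moment estimate is positive, but ML-based fitting generalizes to unbalanced data and richer random-effects structures.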

Additional information
gutierrez.pdf

Some essentials of data cleaning: hints and tips

Felicity Clemens
London School of Hygiene and Tropical Medicine

Abstract
The cleaning and verification of many different types of datasets often involve similar problems. This presentation will give a brief, simple overview of three useful processes and their associated Stata commands:

Finding, counting and removing duplicated data and other multiple entries;
Summing individual-level entries to give an overall score per individual, and when to treat missing data as 0;
Recap of merging data and uses of the merge command.
The presentation will outline the difficulties that are frequently encountered in these three situations and show how they can be addressed using the common Stata commands count; rsum and sum; and merge/append, respectively.
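
The logic of the three tasks can be made explicit with a small Python sketch, using hypothetical function names and records stored as dicts. It (1) counts records that share a key, (2) totals item scores per individual either treating missing (None) as 0, as egen's rsum() does, or returning missing if any item is missing, and (3) performs a one-to-one match on an identifier while reporting unmatched keys, loosely analogous to inspecting _merge after merge. It is not a translation of the do-file distributed with the talk.

```python
from collections import Counter

def duplicate_counts(records, key):
    """Map each key value that occurs more than once to its count."""
    counts = Counter(r[key] for r in records)
    return {k: c for k, c in counts.items() if c > 1}

def row_total(values, missing_as_zero=True):
    """Sum item scores; None counts as 0, or makes the whole total None."""
    if not missing_as_zero and any(v is None for v in values):
        return None
    return sum(v for v in values if v is not None)

def match_one_to_one(left, right, key):
    """Combine records sharing `key`; also return unmatched key values
    from each side (assumes keys are unique within each list)."""
    rmap = {r[key]: r for r in right}
    lkeys = {r[key] for r in left}
    merged = [{**r, **rmap[r[key]]} for r in left if r[key] in rmap]
    return merged, sorted(lkeys - set(rmap)), sorted(set(rmap) - lkeys)
```

Note that treating missing as 0 makes an individual with every item missing score 0, which is usually not what is wanted; the strict version flags such cases instead.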

Additional information
datacleaning_public.ppt
cleans.zip (datasets and do-file)

Stata 8 graphics: Options, suboptions, and subsuboptions

Tim Collier
London School of Hygiene and Tropical Medicine

Abstract
Stata 8 graphics have changed beyond all recognition from those available in earlier versions. Not only was a whole new array of options and sub-options introduced, but the graph syntax itself changed completely. Just trying to produce a simple plot of x against y using Stata 7 syntax (graph x y) produced bewildering error messages, e.g. xgraph_g.new y: class member function not found r(4023) and the like. If you did succeed in working out the new syntax (graph twoway scatter x y), it then seemed to take forever-and-a-day for that oh-so-very-simple graph to appear. Even if you were the patient type, prepared to wait for that graph to appear, numerous bugs further tested your resolve to persevere. Many gave up at this point and chose to use the graph7 command, which gave access to the old graph commands. Life was too short!
But things have moved on and speeded up considerably. Stata 8 does offer the potential to produce effective publication-standard graphs. A broad range of graph types is available, and the user can control almost every aspect of what will appear. Taking time to learn the new graph syntax and to explore the options, sub-options and even sub-sub-options will pay dividends.
The aim of this session is to convince you of the benefit of persevering with Stata 8 graphs. It will introduce some of the more useful graph types, in particular the twoway family. These will be used to show how to build a graph command, to highlight some of the more useful options available, and to show how to produce an eye-catching and effective end product.

Additional information
collier_graphics.pdf
collier_graphs.pdf

Applications of gllamm in health evaluation studies

Andrew Pickles
joint with Milena Falcaro and Bethan Davies
University of Manchester

Abstract
gllamm provides a framework within which many of the more difficult analyses required for trials and intervention studies may be undertaken.
Treatment effect estimation in the presence of non-compliance can be undertaken using instrumental variable (IV) methods. We illustrate how gllamm can be used for IV estimation for the full range of types of treatment and outcome measures and describe how missing data may be tackled under an assumption of latent ignorability. Alternative approaches to account for clustering and analysis of cluster-randomized studies will also be described.
Quality-of-life and economic evaluations of outcomes often make use of discrete choice and stated preference experiments in which illness scenarios are assessed. We illustrate how gllamm can be used for the analysis of data from such studies, whether these are in the common form of paired comparisons or the more complex case where multiple scenarios are ranked.
Examples from studies of a school-based smoking intervention, a re-employment encouragement experiment, a group therapy trial and of quality-of-life with rheumatoid arthritis will be considered.
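
The simplest instance of the IV idea for non-compliance is the Wald estimator with a binary randomized instrument Z (treatment offered), treatment received D and outcome Y. Under the usual exclusion and monotonicity assumptions, the complier average causal effect is [E(Y | Z=1) - E(Y | Z=0)] / [E(D | Z=1) - E(D | Z=0)], that is, the intention-to-treat effect divided by the difference in uptake. The Python sketch below computes it from sample means; it is a textbook illustration, not the gllamm-based likelihood approach of the talk, and it handles neither clustering nor missing data. The function name is hypothetical.

```python
def wald_iv(z, d, y):
    """Wald (ratio) IV estimate: the ITT effect on y divided by the effect
    of the binary instrument z on treatment received d."""
    def mean_where(v, flag):
        sel = [vi for vi, zi in zip(v, z) if zi == flag]
        return sum(sel) / len(sel)
    itt = mean_where(y, 1) - mean_where(y, 0)
    uptake = mean_where(d, 1) - mean_where(d, 0)
    if uptake == 0:
        raise ValueError("instrument has no effect on treatment received")
    return itt / uptake
```

With full compliance the uptake difference is 1 and the estimate reduces to the intention-to-treat effect.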

Additional information
pickles-gllamm.pdf

A little bit of Stata programming goes a long way ...

C. F. Baum
Boston College

Abstract
This tutorial will discuss a number of elementary Stata programming constructs and show how they may be used to automate and robustify common data manipulation, estimation and graphics tasks. Those used to the syntax of other statistical packages or programming languages must adopt a different mindset when working with Stata to take full advantage of its capabilities. Some of Stata's most useful commands for handling repetitive tasks (forvalues, foreach, egen, local and matrix) are commonly underutilized by users unacquainted with their power and ease of use. While relatively few users may develop ado-files for circulation to the user community, nearly all will benefit from learning the rudiments of the program, syntax and return statements when faced with the need to perform repetitive analyses. Worked examples using these commands will be presented and discussed in the tutorial.

Additional information
baum.pdf

Scientific organizers

Bianca De Stavola, London School of Hygiene and Tropical Medicine
Stephen Jenkins, University of Essex

Logistics organizers

Timberlake Consultants, the official distributor of Stata in the United Kingdom, Ireland, Spain, and Portugal.