
Last updated: 9 June 2003

2003 Irish Stata Users Group Meeting

Thursday, 22 May 2003

Trinity College
Maxwell Theatre, Hamilton Building
Dublin, Ireland


Checking for support in propensity score matching

Arnaud Chevalier, University College Dublin

Propensity score matching has recently become a popular estimator. The basic idea is to estimate each individual's propensity to be treated and then match each treated individual with a nontreated individual who has a similar propensity score. The estimate will be unbiased as long as selection into treatment is based on observable characteristics and a common support is found. Common support means that every treated observation can be matched with a control. This programme provides some of the results needed to document the common support assumption and can be used after psmatch.
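
As an illustration of what a common support check involves, the following minimal Python sketch applies the min/max overlap rule to two lists of estimated propensity scores. The rule, the function name common_support, and the example scores are assumptions made for illustration; they are not taken from the programme described above, which may document support differently.

```python
def common_support(treated_scores, control_scores):
    """Return (low, high, flags) under the min/max overlap rule.

    The min/max rule is an assumed convention: the support region is the
    interval where the two groups' propensity score ranges overlap.
    flags[i] is True when treated observation i lies inside that interval.
    """
    low = max(min(treated_scores), min(control_scores))
    high = min(max(treated_scores), max(control_scores))
    flags = [low <= p <= high for p in treated_scores]
    return low, high, flags


# Hypothetical propensity scores, made up for illustration only
treated = [0.2, 0.5, 0.9]
control = [0.1, 0.4, 0.7]

low, high, flags = common_support(treated, control)

# Treated observations with no control in the overlap region
off_support = [p for p, ok in zip(treated, flags) if not ok]
```

Here the treated observation with score 0.9 lies above the highest control score, so under this rule it would be reported as off support.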

Graphics (and numerics) for comparison

Nick Cox, Durham University, UK

Most statistical data analysis, and thus most graphical data analysis, is directed towards modelling of relationships, but many statistical problems have a different flavour: their focus is comparison, and the key question is assessing agreement or disagreement between two or more datasets or subsets with variables measured in the same units. I survey the range of official and user-written graphical programs available in Stata 8 for such problems, with emphasis on making use of all the information in the data. Recurrent themes include (1) the use of reference lines, especially horizontal reference lines, indicating benchmark cases; (2) the relative merits of superimposition and juxtaposition; (3) how far methods work well at a range of sample sizes; (4) standing on giants' shoulders by writing wrappers around existing Stata commands; (5) use (and abuse) of summary statistics appropriate for such problems.

Testing effects of multiple genes in Stata: tests of stratification and of cumulative effects of multiple genes.

Cliona Molony, Tony Fitzgerald, Denis Shields (presenting author), Royal College of Surgeons in Ireland


There are about 30,000 genes in the human genome and a number of variants per gene. Case–control studies are sensitive to non-independence of genetic factors whose frequencies cluster according to population history. Corrections for confounding have been presented in the literature. A simple implementation of one of these tests in Stata is shown by simulation to be reasonably robust. Allowing for overdispersion in allele frequency differences only marginally alters results. Since individual genes often contribute only minor effects to complex diseases such as cardiovascular disease, comparing the likelihood of a null model with that of a model fitting effects of multiple genes provides a likelihood-ratio test with the number of degrees of freedom equal to the number of genes. The utility and relevance of this approach are discussed, and contrasted with models testing for each gene effect in turn.
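
The multi-gene test described above can be sketched in a few lines of Python. This is a minimal sketch, assuming one fitted parameter per gene so that the degrees of freedom equal the number of genes; the function names and the log-likelihood values are hypothetical and do not come from the talk.

```python
import math


def chi2_sf(x, df):
    """Upper tail probability of a chi-squared variable with integer df >= 1.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # Q(1, z)
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # Q(1/2, z)
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q


def lr_test(loglik_null, loglik_full, n_genes):
    """Likelihood-ratio statistic and p-value, with df = number of genes."""
    stat = 2.0 * (loglik_full - loglik_null)
    return stat, chi2_sf(stat, n_genes)


# Hypothetical log likelihoods for a null model and a three-gene model
stat, p_value = lr_test(loglik_null=-100.0, loglik_full=-95.0, n_genes=3)
```

With these made-up numbers the statistic is 10 on 3 degrees of freedom, which would be compared against the usual significance threshold.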

Text analysis using Stata: the wordscoring approach to content analysis using words as data

Kenneth Benoit, Political Science, TCD


The "word-scoring" approach to content analysis developed by Laver, Benoit, and Garry (American Political Science Review, June 2003) has been used to summarize content from political texts based on a statistical analysis of word frequencies. Unlike nearly all other methods of computerized content analysis, "wordscores" does not rely on predefined coding schemes or dictionaries, but instead compares texts based on relative word frequencies, mapping patterns from texts whose content is known or assumed onto texts whose content the researcher wishes to estimate. Furthermore, because Wordscores makes no attempt to assess the meaning or linguistic structure of words, it works in any language. To implement this method, we have written the Wordscores suite of software, implemented as .ado extensions in Stata 7.0. This software draws heavily on Stata's built-in word-parsing capabilities and on its data-merging capabilities based on matching words. Not only is Stata capable of quickly generating and analyzing huge matrices of word frequencies, but Stata's basic orientation as a statistical program also makes it well suited to statistical analysis of the word frequency information. Stata's capabilities for providing user-written help files, and for installing and updating .ado packages over the Internet, also make it an ideal platform for distributing our software for noncommercial, scientific use. To our knowledge, Wordscores is the first Stata application to perform content analysis of texts.
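
To make the idea of mapping reference texts onto unscored texts concrete, here is a minimal Python sketch of the basic scoring rule as I understand it from the published method: each word is scored by the position-weighted share of its relative frequency across reference texts, and a new text is scored by the frequency-weighted mean of its word scores. This is not the authors' .ado code. It omits the rescaling and uncertainty estimates, and it assumes that words absent from the reference texts are simply ignored. All function names and the toy texts are hypothetical.

```python
from collections import Counter


def relative_freqs(text, vocab=None):
    """Relative word frequencies; if vocab is given, other words are dropped."""
    words = text.lower().split()
    if vocab is not None:
        words = [w for w in words if w in vocab]
    counts = Counter(words)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def word_scores(reference_texts):
    """reference_texts: list of (text, known_position) pairs."""
    freqs = [(relative_freqs(text), pos) for text, pos in reference_texts]
    vocab = set().union(*(f.keys() for f, _ in freqs))
    scores = {}
    for w in vocab:
        total = sum(f.get(w, 0.0) for f, _ in freqs)
        # Share of the word's relative frequency in each text, times position
        scores[w] = sum(f.get(w, 0.0) / total * pos for f, pos in freqs)
    return scores


def score_text(text, scores):
    """Frequency-weighted mean word score over the scored words in text."""
    freqs = relative_freqs(text, vocab=scores)
    return sum(share * scores[w] for w, share in freqs.items())


# Toy reference texts with assumed positions -1 and +1
refs = [("a a b", -1.0), ("a b b", 1.0)]
scores = word_scores(refs)
balanced = score_text("a b", scores)
leaning = score_text("a a a z", scores)  # "z" is unscored and ignored
```

In this toy case a text using both words equally scores near zero, while a text using only the word that is more frequent in the first reference text leans towards that text's position.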

Handling panel data

Brendan Halpin, Department of Sociology, University of Limerick


More complex datasets such as panel surveys require a good deal of repetitive processing. Stata's programmability makes the repetition more manageable, reducing the risk of error and increasing the analyst's efficiency. Other Stata features, such as the reshape command and iterative constructs such as the for command, make handling complex data substantially easier than in other well-established statistics packages.

Use of Stata for analysing outcomes in solid organ transplantation

Patrick Kelly, Beaumont Hospital


Stata has many features of particular interest to biostatisticians and epidemiologists, among which is the efficient and effective analysis of survival data. Kaplan–Meier estimation, Cox regression, parametric models, and the presentation of life tables are all extensively covered in the software. Improved graphics and the analysis of time-dependent covariates are among the recent advances in the latest version of Stata (Stata 8). This study looks at several years' work involving survival-based analysis of solid organ transplantation outcomes in the Republic of Ireland.

Data must first be set up for survival analysis. Summary commands then give an overview of the data and provide summary survival statistics. The commands for graphing and listing the data give the Kaplan–Meier survivor functions. Modelling generally involves Cox regression, and the covariates are tested for the proportional hazards assumption using Schoenfeld residuals. Where appropriate, parametric models can also be fitted for various distributions with a single command.
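
For readers unfamiliar with the survivor function mentioned above, the following minimal Python sketch computes the Kaplan–Meier (product-limit) estimate from follow-up times and event indicators. It illustrates the estimator only, not Stata's implementation, and the function name and the four observations are made up.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survivor function.

    times  : follow-up time for each subject
    events : 1 if the event (e.g. graft failure) was observed, 0 if censored
    Returns a list of (time, survival) pairs at each distinct event time.
    """
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        at_risk = sum(1 for u in times if u >= t)
        failures = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        if failures:
            survival *= (at_risk - failures) / at_risk
            curve.append((t, survival))
    return curve


# Hypothetical data: one subject is censored at time 2
curve = kaplan_meier(times=[1, 2, 2, 3], events=[1, 1, 0, 1])
```

The censored subject still counts in the risk set at time 2 but contributes no failure, which is what distinguishes this estimate from a simple proportion surviving.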

Approximately 130 renal transplants, 15 heart transplants, and 7 pancreas transplants are performed annually in the Republic of Ireland. In total, 16 years of data were available for kidney transplants, 15 years for heart transplants, and 10 years for pancreas transplants. Survival in organ transplantation is generally measured for two outcomes: graft survival and patient survival. Graft survival is measured either with or without censoring for death with a functioning graft. Patient survival is measured from the time of first transplant until the end of the study, death, or loss to follow-up. Because of the serious nature of the procedures involved in transplantation and the need for constant follow-up of patients' health status, it is not too common for patients to be lost to follow-up. Usually this occurs when a patient moves outside the state.

Capture–recapture models — issues in selecting a 'best' model.

Alan Kelly, Trinity College Dublin


The use of log-linear models in capture–recapture studies, both animal and human, is a long-established methodology dating back to the beginning of the 1970s. In spite of this, there are still outstanding issues regarding the choice of a best-fitting model, with various alternative goodness-of-fit measures proposed on either theoretical or pragmatic grounds. In this presentation, a number of these measures will be considered and their performance contrasted, with particular attention to the implications for the estimate of the population size N and its standard error. These will be illustrated using a recent study on opiate abuse in Ireland.

Generalised linear models for prediction: some principles, some programs, and some practice

Nicholas J. Cox, University of Durham


Despite a history now over 30 years long, the adoption of generalised linear models (GLMs) remains patchy: they are well-known in several fields, but used little, if at all, in many others. One major advantage of GLMs is that they return predictions on the scale of the response. The use of link functions avoids the need for prior transformation of the response, for back-transformation of predictions, and above all for bias corrections to back-transformations, whether systematic or ad hoc. Case studies from environmental applications (suspended sediment concentrations of rivers, heights of forest trees) are introduced in which predictions on the response scale are of paramount scientific and practical interest. Heavy use is made of a suite of Stata programs written by the author producing graphic and numeric diagnostics after regression-type models, which extend and complement commands in official Stata. Most of these programs have uses beyond GLMs and they will also be discussed directly.
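
The point about back-transformation bias can be seen in the simplest possible case, a model with an intercept only. Averaging log(y) and exponentiating returns the geometric mean, which is at most the arithmetic mean, whereas an intercept-only GLM with a log link and one of the usual families fits the arithmetic mean directly. The Python sketch below shows the gap on made-up data; the function names and values are hypothetical and are not drawn from the case studies in the talk.

```python
import math


def arithmetic_mean(y):
    """Mean on the response scale, which an intercept-only log-link GLM fits."""
    return sum(y) / len(y)


def back_transformed_log_mean(y):
    """Mean of log(y), exponentiated: the geometric mean, which understates
    the response-scale mean unless a bias correction is applied."""
    return math.exp(sum(math.log(v) for v in y) / len(y))


# Hypothetical skewed measurements (e.g. sediment concentrations)
y = [1.0, 10.0, 100.0]

naive = back_transformed_log_mean(y)   # geometric mean
direct = arithmetic_mean(y)            # response-scale mean
```

For these values the naive back-transformed prediction is about 10 while the response-scale mean is 37, which is the kind of discrepancy that bias corrections try to repair and that a link function avoids.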

From the Stata Technical Bulletin to the Stata Journal

Joseph H. Newton, Texas A&M
Nicholas J. Cox, University of Durham


We will report briefly on the introduction of the Stata Journal.

On dynamically linked libraries (DLL's) in Stata

Roberto G. Gutierrez and Chinh Nguyen, StataCorp


Dynamically linked libraries, or DLL's as they are commonly referred to, can serve as useful and integral parts of Stata user-written commands. Since they consist of compiled code, DLL's can speed up the execution of computationally intensive portions of commands that are otherwise written in Stata's ado language. In this talk, we outline a simple and easily callable interface between Stata ado code and DLL's written in the C programming language. An example of this process, as applied to a command that performs local polynomial smoothing, will also be presented.

Scientific organizers

Ronan Conroy

Alan Kelly

Logistics organizers

Timberlake Consultants, the official distributor of Stata in the UK, Ireland, Spain and Portugal.