Last updated: 17 October 2011
 2011 UK Stata Users Group meeting 
 15–16 September 2011 
  
  
  Centre for Econometric Analysis
  Cass Business School
  106 Bunhill Row
  London EC1Y 8TZ
  United Kingdom
Proceedings
Sensible parameters for polynomials and other splines
Roger B. Newson
National Heart and Lung Institute, Imperial College London
  Splines, including polynomials, are traditionally used to model nonlinear
  relationships involving continuous predictors. However, when they are
  included in linear models (or generalized linear models), the estimated
  parameters for polynomials are not easy for nonmathematicians to understand,
  and the estimated parameters for other splines are often not easy even for
  mathematicians to understand. It would be easier if the parameters were
  values of the polynomial or spline at reference points on the x-axis, or if
  the parameters were differences or ratios between the values of the spline
  at the reference points and the value of the spline at a base reference
  point. The bspline package, downloadable from Statistical Software
  Components (SSC), generates spline bases for inclusion in the design
  matrices of linear models, based on Schoenberg B-splines. The package has a
  recently added module, flexcurv, which inputs a sequence of reference points
  on the x-axis and outputs a spline basis, based on automatically generated,
  equally spaced knots, whose parameters are the values of the spline at
  the reference points. This spline basis can be modified by excluding the
  spline vector at a base reference point and including the unit vector. If
  this is done, then the parameter corresponding to the unit vector will be
  the value of the spline at the base reference point, and the parameters
  corresponding to the remaining reference spline vectors will be differences
  between the values of the spline at the corresponding reference points and
  the value of the spline at the base reference point. The spline bases are
  therefore extensions, to continuous factors, of the bases of unit vectors
  and/or indicator functions used to model discrete factors. It is possible to
  combine these bases for different continuous and/or discrete factors in the
  same way, using product bases in a design matrix to estimate factor-value
  combination means and/or factor-value effects and/or factor interactions.
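
  As a sketch of the kind of call involved (the option names follow my reading
  of the bspline package documentation and may differ; age and chol are
  placeholder variables):

      ssc install bspline
      flexcurv, xvar(age) refpts(20 30 40 50 60 70) power(3) generate(sp_)
      * the reference splines sum to 1, so with no constant the fitted
      * parameters are the values of the spline at the reference points
      regress chol sp_*, noconstant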
  
   
Additional information
   UK11_newson.pdf
   UK11_newson_dofiles1.zip
Experiences and lessons learned from bootstrapping random-effects predictions
Robert Grant
Kingston University and St. George’s University of London
  Background: Random effects are commonly modeled in multilevel, longitudinal,
  and latent-variable settings. Rather than estimating fixed effects for
  specific clusters of data, “predictions” can be made as the mode
  or mean of posterior distributions that arise as the product of the random
  effect (an empirical Bayes prior) and the likelihood function conditional on
  cluster membership. 
  
  Analyses and data: This presentation will explore the
  experiences and lessons learned in using the bootstrap for inference on
  random-effects predictions following logistic regression models fitted
  using both 
xtmelogit and 
gllamm.
  In the United Kingdom, 203 hospitals were compared on the
  quality of care received by 10,617 stroke patients through multilevel
  logistic regression models. 
  
  Results and considerations: Multilevel
  modeling and prediction are both computer-intensive, and so bootstrapping
  them is especially time-consuming. Examples from do-files with some helpful
  approaches will be shown. A small proportion of modal best linear
  unbiased predictors contained
  errors, possibly arising from the prediction algorithm. Various bootstrap
  confidence intervals exhibited problems such as excluding the point
  prediction and degeneracy. Methods for tracing the source will be presented.
  
  Conclusion: Bootstrapping provides flexible but time-consuming inference for
  individual clusters’ predictions. However, there are potential
  problems that analysts should be aware of.
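
  As a rough illustration of the two modeling routes discussed (not the
  authors' do-files; poorcare, age, severity, and hospital are placeholder
  names):

      * two-level logistic regression with modal (empirical Bayes) predictions
      xtmelogit poorcare age severity || hospital: , intpoints(7)
      predict uhat, reffects

      * the gllamm route, with posterior-mean predictions
      gllamm poorcare age severity, i(hospital) link(logit) family(binomial) adapt
      gllapred umean, u

  Bootstrapping such predictions then requires wrapping the model fit and the
  predict step in a program called by bootstrap with cluster resampling, which
  is where the complications described above arise.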
  
   
Additional information
   UK11_Grant.ppt
Sensitivity analysis for randomized trials with missing outcome data
Ian White
MRC Biostatistics Unit, Cambridge
  Any analysis with incomplete data makes untestable assumptions about the
  missing data, and analysts are therefore urged to conduct sensitivity
  analyses. Ideally, a model is constructed containing a nonidentifiable
  parameter 
d, where 
d = 0 corresponds to the assumption made in
  the standard analysis, and the value of 
d is then varied in a range
  considered plausible in the substantive context. I have produced Stata
  software for performing such sensitivity analyses in randomized trials with
  a single outcome, when the user specifies a value or range of values of
  
d. The analysis model is assumed to be a generalized linear model
  with adjustment for baseline covariates. I will describe the statistical
  model used to allow for the missing data, sketch the programming required
  to obtain a sandwich variance estimator, and describe modifications needed
  to make the results given when 
d = 0 correspond exactly to those
  results available by standard methods. I will illustrate the use of the software for
  binary and continuous outcomes, when the standard analysis assumes either
  missing at random or (for a binary outcome) 
“missing =
  failure”.
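
  The software itself is not shown here, but the delta idea can be illustrated
  with a generic delta-adjustment for a continuous outcome under MAR (this is
  a standard device, not necessarily the method implemented in the command;
  y, treat, and age are placeholders):

      generate byte miss_y = missing(y)
      mi set wide
      mi register imputed y
      mi impute regress y treat age, add(20) rseed(2011)
      local d = 0.5                               // chosen sensitivity parameter
      mi xeq 1/20: replace y = y + `d' if miss_y  // shift only the imputed values
      mi estimate: regress y treat age

  Setting d = 0 reproduces the missing-at-random analysis.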
  
   Additional information
   UK11_White.pdf
Implementing the continual reassessment method (CRM)
Adrian Mander
MRC Biostatistics Unit Hub for Trials Methodology, Cambridge
  One of the aims of a phase I trial in oncology is to find the maximum
  tolerated dose. A set of doses is administered to participants starting from
  the lowest dose in increasing steps. To do this safely, the toxicity of each dose
  is assessed, and a decision is made about whether to proceed with the next
  highest dose until the desired target toxicity level is found. A suitable
  dose is then chosen to take forward into phase II studies to discover
  whether this drug is efficacious. The majority of oncology phase I trials
  use algorithm-based rules such as the 3 + 3 design to escalate doses; the 3 + 3
  design is easy to implement by nonstatisticians but is statistically
  inefficient. Other designs, such as the continual reassessment method
  (O’Quigley, Pepe, and Fisher 1990), use a model to help guide the decision of
  which dose to give. The complexity of the CRM and its reliance on software
  may explain why it is not more widely used. This talk will describe a new
  command, crm, which is a Mata implementation of the CRM, and will include
  some discussion of the programming difficulties.
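
  A minimal Mata sketch of one CRM dose-selection step under the one-parameter
  power model (this is not the crm command itself; the skeleton, prior
  variance, and data are invented):

      mata:
      skel   = (0.05, 0.10, 0.20, 0.35, 0.50)  // prior toxicity guesses by dose
      target = 0.25
      y      = (0, 0, 1)                       // toxicity outcomes so far
      dose   = (1, 2, 2)                       // doses given so far
      a      = (-400::400)/100                 // grid for the model parameter
      prior  = exp(-(a:^2)/(2*1.34))           // N(0, 1.34) prior, unnormalized
      L      = J(rows(a), 1, 1)
      for (i=1; i<=cols(y); i++) {
          p = skel[dose[i]] :^ exp(a)          // toxicity probability at this dose
          L = L :* (y[i] :* p + (1 - y[i]) :* (1 :- p))
      }
      post = prior :* L
      post = post / sum(post)
      ptox = J(1, cols(skel), .)               // posterior mean toxicity by dose
      for (j=1; j<=cols(skel); j++) ptox[j] = sum(post :* (skel[j] :^ exp(a)))
      ptox
      // recommend the dose whose posterior mean toxicity is closest to target
      select(1..cols(skel), abs(ptox :- target) :== min(abs(ptox :- target)))
      end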
  
   
Additional information
   UK11_Mander.pdf
A review of estimators for the fixed-effects ordered logit model
Arne Risa Hole
University of Sheffield
Joint with Andy Dickerson and Luke Munford
  It is well-known that the dummy variable estimator for the fixed-effects
  ordered logit model is inconsistent when 
T, the dimension of
  the panel, is fixed. This talk will review a range of alternative 
  fixed-effects ordered logit 
  estimators that are based on Chamberlain’s fixed-effects estimator
  for the binary logit model. The talk will present Stata code for the
  estimators and discuss the available evidence on their finite-sample
  performance. We will conclude by presenting an empirical example in which
  the estimators are used to model the relationship between commuting and life
  satisfaction.
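
  One estimator of this kind, the "blow-up and cluster" (BUC) estimator of
  Baetschmann, Staub, and Winkelmann, can be sketched with official commands
  (lifesat, commute, pid, and wave are placeholder names):

      levelsof lifesat, local(cuts)
      local K : word count `cuts'
      generate long obsid = _n
      expand `K'                                 // one copy per cutpoint
      bysort obsid: generate cut = real(word("`cuts'", _n))
      generate byte y_k = lifesat >= cut         // dichotomize at each cutpoint
      egen long grp = group(pid cut)             // person-by-cutpoint groups
      clogit y_k commute i.wave, group(grp) vce(cluster pid)

  Groups without variation in y_k (including those for the lowest cutpoint)
  are dropped by clogit automatically.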
  
   
Additional information
   UK11_Hole.pdf
Generalized method of moments fitting of structural mean models
Tom Palmer
MRC CAiTE Centre, School of Social and Community Medicine, University of Bristol
Joint with Roger Harbord, Paul Clarke, and Frank Windmeijer
  In this talk we describe how to fit structural mean models (SMMs), as
  proposed by Robins, using instrumental variables in the generalized
  method of moments (GMM) framework using Stata’s 
gmm command.
  The GMM approach is flexible because it can fit overidentified models in
  which there are more instruments than endogenous variables. It also allows
  assessment of the joint validity of the instruments using Hansen’s 
J
  test through Stata’s 
estat overid postestimation command after gmm.
  In the case of the logistic SMM, the approach also allows different first-stage
  association models. We show the relationship between the
  multiplicative SMM and the multiplicative GMM estimator implemented in the
  
ivpois command of Nichols (2007). For the multiplicative SMM, we
  show—analogously to Imbens and Angrist (1994) for the linear case—that the
  estimate is a weighted average of local estimates using the instruments
  separately. To demonstrate the models, we use a Mendelian randomization
  example, in which genotypes found to be robustly associated with risk
  factors in genome-wide association studies are used as instrumental
  variables to investigate the effect of being overweight on the risk of
  hypertension in the Copenhagen General Population Study.
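
  For the multiplicative SMM, the ivpois-style moment condition can be coded
  directly in gmm along the following lines (a sketch; y is the outcome, x the
  endogenous exposure, and z1-z3 the genetic instruments):

      gmm (y*exp(-{b0} - {psi}*x) - 1), instruments(z1 z2 z3) twostep
      estat overid          // Hansen's J test of the overidentifying restrictions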
  
   
Additional information
   UK11_palmer_handouts.pdf
   UK11_palmer_presentation.pdf
Flexible joint modeling of longitudinal and time-to-event data
Michael J. Crowther
Department of Health Sciences, University of Leicester
Joint with Keith R. Abrams and Paul C. Lambert
  The joint modeling of longitudinal and time-to-event data has exploded in
  the methodological literature in the past decade; however, the availability
  of software to implement the methods lags behind. The most common form of
  joint model assumes that the association between the survival and
  longitudinal processes is underpinned by shared random effects. As a result,
  computationally intensive numerical integration techniques such as
  Gauss–Hermite quadrature are required to evaluate the likelihood.  We
  describe a new user-written command 
jm, which allows the user to
  jointly model a continuous longitudinal response and an event of interest.
  We assume a linear mixed-effects model for the longitudinal submodel, thereby
  allowing flexibility through the use of fixed and/or random fractional
  polynomials of time. We also assume a flexible parametric model (
stpm2) for the
  survival submodel. Flexible parametric models are fitted on the log
  cumulative hazard scale, which has direct computational benefits because it
  avoids the use of numerical integration to evaluate the cumulative hazard. We
  describe the features of 
jm through application to a dataset
  investigating the effect of serum albumin level on time to death from any
  cause in 252 patients suffering end-stage renal disease.
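
  For orientation, the two submodels can be fit separately with existing
  commands (jm estimates them jointly, and its syntax is not shown here;
  variable names are placeholders):

      * longitudinal submodel: linear mixed model for serum albumin
      xtmixed albumin time || id: time, covariance(unstructured)

      * survival submodel: flexible parametric model on the log cumulative hazard scale
      stset futime, failure(died) id(id)
      stpm2 trt, df(3) scale(hazard)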
  
   
Additional information
   UK11_crowther.pdf
Sample size and power estimation when covariates are measured with error
Michael Wallace
London School of Hygiene and Tropical Medicine
  Measurement error in exposure variables can lead to bias in effect
  estimates, and methods that aim to correct this bias often come at the
  price of greater standard errors (and so, lower statistical power). This
  means that standard sample size calculations are inadequate and that, in
  general, simulation studies are required. Our routine 
autopower aims
  to take the legwork out of this simulation process, restricting attention to
  univariate logistic regression where exposures are subject to classical
  measurement error. It can be used to estimate the power of a particular model
  setup or to search for a suitable sample size for a desired power. The
  measurement error correction methods employed are regression calibration
  (rcal) and a conditional score method, implemented in a Stata routine that
  we also introduce.
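
  As an illustration of the kind of simulation being automated (this is not
  autopower's interface; all numbers are arbitrary), the power of a naive
  logistic analysis of a mismeasured exposure can be estimated as follows:

      program define mesim, rclass
          drop _all
          set obs 500
          generate x = rnormal()                  // true exposure
          generate w = x + rnormal(0, 0.5)        // classical measurement error
          generate y = runiform() < invlogit(-1 + 0.5*x)
          logit y w                               // naive analysis uses w, not x
          test w
          return scalar reject = (r(p) < 0.05)
      end
      simulate reject=r(reject), reps(1000): mesim
      summarize reject                            // the mean is the estimated power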
  
   
Additional information
   UK11_wallace.ppt
Spline models for prediction of house prices
David Boniface
Epidemiology and Public Health, University College London
  Aim: To create a web-based facility that lets customers enter the address of
  a house and obtain, within milliseconds, a graph showing the trend in the
  price of the house since it was last sold, extrapolated to the current date.
  
  Method: The UK Land Registry of house sale prices was used to estimate mean
  price trends from 2000 to 2010 for each category of house.  The Stata
  ado-file 
uvrs (with user-specified knots) was used to model the
  curve. The parameter estimates were saved. Later, to respond in real time
  to a query about a particular house, 
splinegen was used to generate
  the spline curve for the appropriate time period, which was adjusted to apply
  to the particular house and plotted on the webpage.
  
  Challenges: use of coded dates, choice of user knots for the splines,
  saving and retrieving the knots and parameter estimates, use of a log
  scale for prices to deal with the skewed price distribution, estimation of
  prediction intervals, and the 2009 slump in house prices.
  
   
Additional information
   UK11_boniface.ppt
Endogenous treatment effects for count data models with endogenous participation or sample selection
Alfonso Miranda
Institute of Education, University of London
Joint with Massimiliano Bratti
  We propose an estimator for models in which an endogenous dichotomous
  treatment affects a count outcome in the presence of either sample selection
  or endogenous participation using maximum simulated likelihood. We allow for
  the treatment to have an effect both on the participation or sample-selection
  rule and on the main outcome. Applications of this model are
  frequent in—but not limited to—health economics. We show an
  application of the model using data from Kenkel (Kenkel and Terza, 2001,
  
Journal of Applied Econometrics 16: 165–184), who investigated
  the effect of physician advice on the amount of alcohol consumption. Our
  estimates suggest that in these data, a) neglecting treatment endogeneity
  leads to a wrongly signed effect of physician advice on drinking intensity,
  b) accounting for treatment endogeneity but neglecting endogenous
  participation leads to an upwardly biased estimate of the treatment effect,
  and c) advice affects only the drinking-intensive margin but not drinking
  prevalence.
  
   
Additional information
   UK11_Miranda.pdf
Multiple imputation with large proportions of missing data: How much is too much?
Jin Hyuk Lee
Texas A&M Health Science Center
Joint with John Huber Jr.
  Multiple imputation (MI) is known as an effective method for handling
  missing data. However, it is not clear that the method will be effective
  when the data contain a high percentage of missing observations on a
  variable. This study examines the effectiveness of MI in
  data with 10% to 80% missing observations using absolute bias and
  root mean squared error of MI measured under missing completely at
  random, missing at random, and not missing at random
  assumptions. Using both simulated data drawn from a multivariate normal
  distribution and example data from the Predictive Study of Coronary Heart
  Disease, the bias and root mean squared error of MI are much smaller than
  those from complete-case analysis. In addition, the bias of MI is stable as
  the number of imputations (M) increases from M = 10 to M = 50. Moreover,
  besides the regression and predictive mean matching methods, the Markov
  chain Monte Carlo method can also be used as the imputation mechanism for
  continuous, univariately missing variables. In conclusion, MI produces
  less-biased estimates, but when large proportions of data are missing, other
  aspects, such as the number of imputations, the imputation mechanism, and
  the missing-data mechanism, need to be considered for proper imputation.
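
  For reference, the three imputation methods compared can be requested as
  follows for a continuous, univariately missing y (a generic sketch; x1 and
  x2 are placeholders):

      mi set flong
      mi register imputed y
      mi impute regress y x1 x2, add(50) rseed(1)  // linear regression method
      * alternatives for the same setting:
      *   mi impute pmm y x1 x2, add(50)           // predictive mean matching
      *   mi impute mvn y = x1 x2, add(50)         // MCMC under multivariate normality
      mi estimate: regress y x1 x2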
  
   Additional information
   UK11_lee.pptx
Testing the performance of the two-fold FCS algorithm for multiple imputation of longitudinal clinical records
Irene Petersen
University College London
Joint with Catherine Welch, Jonathan Bartlett, Ian White, Richard Morris, Louise Marston, Kate Walters, Irwin Nazareth, and James Carpenter
  Multiple imputation is increasingly regarded as the standard method to
  account for partially observed data, but most methods have been based on
  cross-sectional imputation algorithms. Recently, a new multiple-imputation
  method, the two-fold fully conditional specification (FCS) method, was
  developed to impute missing data in longitudinal datasets with nonmonotone
  missing data (see Nevalainen, J., M. G. Kenward, and S. M. Virtanen. 2009.
  Missing values in longitudinal dietary data: A multiple imputation approach
  based on a fully conditional specification. Statistics in Medicine 28:
  3657–3669). This method imputes missing data at a given time point based on
  measurements recorded at the previous and next time points. Up to now, the
  method has only been tested on a relatively small dataset and under very
  specific conditions. We have implemented the two-fold FCS algorithm in Stata,
  and in this study we further challenge and evaluate the performance of the
  algorithm under different scenarios. In simulation studies, we generated
  1,000 datasets similar in structure to the longitudinal clinical records
  (The Health Improvement Network primary care database) to which we will
  apply the two-fold FCS algorithm. Initially, these generated datasets
  included complete records. We then introduced different levels and patterns
  of partially observed data and applied the algorithm to
  generate multiply imputed datasets. The results of our initial multiple
  imputations demonstrated that the algorithm provided acceptable results when
  using a linear substantive model and data were imputed over a limited time
  period for continuous variables such as weight and blood pressure.
  Introducing an exponential substantive model introduced some bias, but
  estimates were still within acceptable ranges. We will present results for
  simulation studies that include situations where categorical and
  continuous variables change over a 10-year period (for example, smokers become
  ex-smokers, weight increases or decreases) and large proportions of data are
  unobserved. We also explore how the algorithm deals with interactions and
  whether running the algorithm forward or backward in time has any impact on
  the final distribution of the data.
  
   
Additional information
   UK11_welch.pptx
Implementing procedures for spatial panel econometrics in Stata
Gordon Hughes
School of Economics, University of Edinburgh
  Econometricians have begun to devote more attention to spatial interactions
  when carrying out applied econometric studies. In part, this is motivated by
  an explicit focus on spatial interactions in policy formulation or market
  behavior, but it may also reflect concern about the role of omitted
  variables that are or may be spatially correlated.  The classic models of
  spatial autocorrelation or spatial error rely upon a predefined matrix of
  spatial weights 
W, which may be derived from an explicit model of
  spatial interactions but which, alternatively, could be viewed as a flexible
  approximation to an unknown set of spatial links similar to the use of a
  translog cost function. With spatial panel data, it is possible, in
  principle, to regard 
W as potentially estimable, though the number of
  time periods would have to be large relative to the number of spatial panel
  units unless severe restrictions are placed upon the structure of the
  spatial interactions. While the estimation of 
W may be infeasible for
  most real data, there is a strong, formal similarity between spatial panel
  models and nonspatial panel models in which the variance–covariance
  matrix of panel errors is not diagonal. One important variant of this type
  of model is the random-coefficient model in which slope coefficients differ
  across panel units so that interest focuses on the mean slope coefficient
  across panel units. In certain applications—for example, cross-country
  (macro-)economic data—the assumption that reaction coefficients are
  identical across panel units is not intuitively plausible. Instead of just
  sweeping differences in coefficients into a general error term, the
  random-coefficient model allows the analyst to focus on the common component of
  responses to changes in the independent variables while retaining the
  information about the error structure associated with coefficients that are
  random across panel units but constant over time for each panel unit.
  
  At present, Stata’s spatial procedures include a range of user-written
  routines that are designed to deal with cross-sectional spatial data. The
  recent release of a set of programs (including 
spmat, 
spivreg,
  and 
spreg) written by Drukker, Prucha, and Raciborski
  provides Stata’s users with the opportunity to fit a wide range
  of standard spatial econometric models for cross-sectional data. Extending
  such procedures to deal with panel data is nontrivial, in part because
  there are important issues about how panels with incomplete data should be
  treated. The casewise exclusion of missing data is automatic for
  cross-sectional data, but omitting a whole panel unit because some of the data
  in the panel are missing will typically lead to a very large reduction in the size
  of the working dataset. For example, it is very rare for international
  datasets on macroeconomic or other data to be complete, so that casewise
  exclusion of missing data will generate datasets that contain many fewer
  countries or time periods than might otherwise be usable.
  
  The theoretical literature on econometric models for the analysis of spatial
  panels has flourished in the last decade with notable contributions from
  LeSage and Pace, Elhorst, and Pfaffermayr, among others. In some cases,
  authors have made available specific code for the implementation of the
  techniques that they have developed. However, the programming language of
  choice for such methods has been MATLAB, which is expensive and has a fairly
  steep learning curve for nonusers. Many of the procedures assume that there
  are no missing data, and they may not be able to handle large datasets
  because the model specifications can easily become unmanageable if either
  
N (the number of spatial units) or 
T (the number of time
  periods) becomes large.
  
  The presentation will cover a set of user-written maximum likelihood
  procedures for fitting models with a variety of spatial structures
  including the spatial error model, the spatial Durbin model, the
  spatial autocorrelation model, and certain combinations of these
  models—the terminology is attributable to LeSage and Pace (2009).
  A suite of
  MATLAB programs to fit these models for both random and fixed effects
  has been compiled by Elhorst (2010) and provides the basis for the
  implementation in Stata/Mata. Methods of dealing with missing data,
  including the implementation of an approach proposed by Pfaffermayr (2009),
  will be discussed.
  The problem of missing data is most severe when data on
  the dependent variable are missing in the spatial autocorrelation model
  because it means that information on spatial interactions may be greatly
  reduced by the exclusion of countries or other panel units. In such cases,
  some form of imputation may be essential, so the presentation will
  consider alternative methods of imputation. It should be noted that
  
mi does not support panel data procedures in general, and the
  relatively high cost of fitting spatial panel models means that it may be
  difficult to combine 
mi with spatial procedures for practical
  applications.
  
  A second aspect of spatial panel models that will be covered in the
  presentation concerns the links between such models and random-coefficient
  models that can be fit using procedures such as 
xtrc or the
  user-written procedure 
xtmg. The classic formulation of
  random-coefficient models assumes that the variance–covariance matrix of panel
  errors is diagonal but heteroskedastic. This is an implausible assumption
  for most cross-country datasets, so it is important to consider how it may
  be relaxed, either by allowing for explicit spatial interactions or by
  using a consistent estimator of the cross-country variance–covariance
  matrix.
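
  For comparison, the classic random-coefficient model can be fit with the
  official xtrc command, which maintains the diagonal (heteroskedastic) error
  assumption that the presentation proposes to relax (variable names are
  placeholders):

      xtset country year
      xtrc demand income price, betas    // betas reports the panel-specific coefficients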
  
  The user-written procedures introduced in the presentation will be
  illustrated by applications drawn from analyses of demand for
  infrastructure, health outcomes, and climate for cross-country data covering
  the developing and developed world plus regions in China.
  
   
Additional information
   UK11_hughes.pdf
Structural equation modeling for those who think they don’t care
Vince Wiggins
StataCorp LP
  We will discuss SEM (structural equation modeling), not from the perspective
  of the models for which it is most often used—measurement models,
  confirmatory factor analysis, and the like—but from the perspective of
how it can extend other estimators.  From a wide range of choices, we will
focus on extensions of mixed models (random- and fixed-effects regression).
Extensions include conditional effects (not completely random), endogenous
covariates, and others.
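
  As one example of such an extension, an endogenous covariate can be handled
  in sem by allowing its error to correlate with the outcome error (a sketch
  with placeholder names y, w, x, and instrument z):

      sem (y <- w x) (w <- z x), cov(e.y*e.w)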
  
   Additional information
   UK11_Wiggins.pdf
Chained equations and more in multiple imputation in Stata 12
Yulia Marchenko
StataCorp LP
  I present the new Stata 12 command, 
mi impute chained, to
  perform multivariate imputation using chained equations (ICE), also known as
  sequential regression imputation.  ICE is a flexible imputation technique
  for imputing various types of data.  The variable-by-variable specification
  of ICE allows you to impute variables of different types by choosing the
appropriate method for each variable from several univariate imputation
methods.  Variables can have an arbitrary missing-data pattern.  By specifying
a separate model for each variable, you can incorporate certain important
characteristics, such as ranges and restrictions within a subset, specific to
each variable.  I also describe other new features in multiple imputation in
Stata 12.
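
  A minimal sketch of the variable-by-variable specification (variable names
  are placeholders):

      mi set flong
      mi register imputed bmi smokes educ
      mi impute chained (regress) bmi (logit) smokes (ologit) educ = age i.sex, add(20) rseed(12)
      mi estimate: logistic disease bmi i.smokes i.educ age i.sex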
  
   
Additional information
   UK11_marchenko.pdf
Exporting CAPI data to Stata: Experience from surveybe
Joachim De Weerdt
Economic Development Initiatives, Tanzania
  Researchers typically spend significant amounts of time cleaning and
  labeling data files in preparation for analysis of survey data.
  Computer-assisted personal interviewing (CAPI) makes it possible to automate this
  process. First, consistency checks can be run during the interview so that
  only data that passes autogenerated and user-written validation tests comes
  back from the field. Second, CAPI allows for the autogeneration of a Stata
  do-file that labels data files. This presentation discusses the Stata
  export procedure used by 
surveybe, a CAPI application designed to
  handle complex surveys. The questions, as displayed on the screen to the
  interviewer, are automatically turned into variable labels. Likewise, the
  drop-down menus are autocoded as value labels. Furthermore, the export
  procedure ensures that data from rosters get exported into different Stata
  data files and that complete referential integrity is ensured across all the
  files originating from the same survey, with unique primary keys linking
  files together. Any changes made to the electronic questionnaire (for example,
  adding a response code to the drop-down menu) or changes to the phrasing of a
  question will be automatically incorporated into the exported data files,
  thus ensuring that the data files match the questionnaires completely.
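
  The auto-generated labeling do-file is ordinary Stata code of the following
  kind (contents invented for illustration):

      label variable hh_size "How many people usually live in this household?"
      label define yesno 0 "No" 1 "Yes"
      label values owns_phone yesno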
  
   
Additional information
   UK11_deweerdt.pdf
Using Mata to import Illumina SNP chip data for genome-wide association studies
J. Charles Huber Jr.
Texas A&M University
Joint with Michael Hallman, Victoria Friedel, Melissa Richard and Huandong Sun
  Modern genetic genome-wide association studies typically rely on
  single nucleotide polymorphism (SNP) chip technology to determine hundreds
  of thousands of genotypes for an individual sample.  Once these genotypes
  are ascertained, each SNP, alone or in combination, is tested for association
  with outcomes of interest such as disease status or severity.  Project Heartbeat!
  was a longitudinal study conducted in the 1990s that explored changes in
  lipids and hormones and morphological changes in children from 8 to 18
  years of age.  A genome-wide association study is currently being conducted to look for SNPs
  that are associated with these developmental changes. While there are
  specialty programs available for the analysis of hundreds of thousands of
  SNPs, they are not capable of modeling longitudinal data.  Stata is well
  equipped for modeling longitudinal data but cannot load hundreds of
  thousands of variables into memory simultaneously.  This talk will briefly
  describe the use of Mata to import hundreds of thousands of SNPs from the
  Illumina SNP chip platform and how to load those data into Stata for
  longitudinal modeling.
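
  A minimal Mata sketch of the streaming approach (the file name, layout, and
  columns kept are invented, and real SNP names may need cleaning before use
  as variable names):

      mata:
      fh    = fopen("genotypes.csv", "r")
      line  = fget(fh)                             // header line with SNP names
      names = tokens(subinstr(line, ",", " "))
      keep  = (2, 5, 7)                            // columns wanted for this model
      idx   = st_addvar("double", names[keep])
      i = 0
      while ((line = fget(fh)) != J(0,0,"")) {
          fields = tokens(subinstr(line, ",", " "))
          st_addobs(1)
          st_store(++i, idx, strtoreal(fields[keep]))
      }
      fclose(fh)
      end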
  
   Additional information
   UK11_Huber.pptx
Using Stata for handling CDISC datasets
Adam Jacobs
Dianthus Medical Limited
  The Clinical Data Interchange Standards Consortium (CDISC) is a globally
  relevant nonprofit organization that defines standards for handling data
  in clinical research. It produces a range of standards for clinical data at
  various stages of maturity.  One of the most mature standards is the Study
  Data Tabulation Model, which provides a standardized yet flexible
  data structure for storing entire databases from clinical trials. A related
  standard is the Analysis Dataset Model, which defines datasets that
  can be used for analyzing data from clinical trials. I shall explain how the
  CDISC standards work, how Stata can simplify many of the routine tasks
  encountered in handling CDISC datasets, and the great efficiencies that can
  result from using datasets in a standardized structure.
  
   Additional information
   UK11_jacobs.ppt
Picturing mobility: Transition probability color plots
Philippe Van Kerm
CEPS/INSTEAD, Luxembourg
  This talk presents a simple but effective graphical device for visualization
  of patterns of income mobility. The device in effect uses color palettes to
  picture information contained in transition matrices created from a fine
  partition of the marginal distributions. The talk explains how these graphs
  can be constructed using the user-written package 
spmap from Maurizio
  Pisati, briefly presents the wrapper command 
tpcplot (for
  transition probability color plots), and demonstrates how such
  graphs are effective for contrasting patterns of mobility in different
  countries or contrasting observed patterns against benchmarks of maximal or
  minimal mobility.
  
   
Additional information
   UK11_vankerm.pdf
Running multilevel models in MLwiN from within Stata: runmlwin
George Leckie
Centre for Multilevel Modelling, University of Bristol
Joint with Chris Charlton
  Multilevel analysis is the statistical modeling of hierarchical and
  nonhierarchical clustered data. These data structures are common in social
  and medical sciences. Stata provides the 
xtmixed, 
xtmelogit,
  and 
xtmepoisson commands for fitting multilevel models, but these are
  only relevant for univariate continuous, binary, and count response variables,
  respectively. A much wider range of multilevel models can be fit using
  the user-written 
gllamm command, but 
gllamm can be
  computationally slow for large datasets or when there are many random
  effects. Many Stata users therefore turn to specialist multilevel modeling
  packages such as MLwiN for fast fitting of a wide range of complex
  multilevel models. MLwiN includes the following features: fitting of
  multilevel models for 
n-level hierarchical and nonhierarchical data
  structures; fast fitting via classical and Bayesian methods; fitting
  of multilevel models for continuous, binary, ordered categorical, unordered
  categorical, and count data; fitting of multilevel multivariate response
  models, spatial models, measurement error models, multiple-imputation models,
  and multilevel factor models; interactive model equation windows and graph
  windows for model exploration; and free availability to academics in the
  United Kingdom. In this
  presentation, we will introduce the 
runmlwin command to fit
  multilevel models in MLwiN from within Stata and to return estimation
  results to the Stata environment. We shall demonstrate 
runmlwin in
  action with several example multilevel analyses in which we fit models and use
  Stata’s standard postestimation commands such as 
predict and
  
test to calculate predictions, perform hypothesis tests, and produce
  publication-quality graphics.
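
  A typical call looks like the following (the executable path will differ;
  variable names follow the familiar MLwiN tutorial dataset, and cons is an
  explicit constant, as MLwiN requires):

      global MLwiN_path "C:\Program Files\MLwiN v2.24\mlwin.exe"
      generate cons = 1
      runmlwin normexam cons standlrt, level2(school: cons) level1(student: cons) nopause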
  
   
Additional information
   UK11_leckie.do
   UK11_leckie.pdf
Plagiarism in student papers and cheating in student exams: Results from surveys using special techniques for sensitive questions
Ben Jann
University of Bern
  Eliciting truthful answers to sensitive questions is an age-old problem in
  survey research. Respondents tend to underreport socially undesired or
  illegal behaviors while overreporting socially desirable ones. To combat
  such response bias, various techniques have been developed that are geared
  toward providing the respondent greater anonymity and minimizing the
  respondent’s feelings of jeopardy. Examples of such techniques are the
  randomized response technique, the item-count technique, and the crosswise
  model. I will present results from several surveys, conducted among
  university students, that employ such techniques to measure the prevalence
  of plagiarism and cheating in exams. User-written Stata programs for
  analyzing data from such techniques are also presented.
  
   Additional information
   UK11_jann.pdf
Lowering your handicap with Stata
Tim Collier
London School of Hygiene and Tropical Medicine
  When I first met Stata in October 2000, my golf handicap was 27 and my game
  was going nowhere slowly. Ten years of intensive Stata therapy later, my
  handicap is 17.3 and falling. It would, of course, be nonsense to infer from
  this data that lowering your handicap increases Stata use, but could the
  reverse be true? Could there be a causal relationship between increasing
  Stata use and a decreasing handicap? In this presentation, I argue that, yes,
  there is. Granted, Stata might not work along the lines of traditional golf
  training aids, but rather its effect is mediated through a third factor,
  namely time. Golf consumes time. Stata produces time. In this presentation, I
  will demonstrate how minutes in Stata’s programming world are
  equivalent to hours in the real world, and by the use of programs within
  programs, minutes can translate to days. Although extrapolation from an
  
N of 1 is nearly always dangerous, I believe that Stata could be
  similarly used to reduce your weight, improve foreign language skills, or
  even increase research output.
  
   
Additional information
   UK11_Collier.ppt
Fun and fluency with functions
Nicholas J. Cox
Durham University
  Functions in Stata range between those you know you want and those you
  don’t know you need. The word “functions” is heavily
  overloaded in Stata; here the focus is on functions in the strict sense,
  _variables, extended macro functions, and 
egen functions. Often Stata
  users in difficulty are seeking commands or imagining that they need to
  write programs, when a few lines of code using functions would crack their
  problem. In this talk, I will briefly give some general advice on using
  functions and in more detail discuss a variety of examples, with the aim of
  introducing something unappreciated but useful to almost everyone. Somehow
  or other, graphs and my own work will also be mentioned.
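
  A few one-line examples of the four senses in play (all official Stata;
  variable names are placeholders):

      generate relerr = abs(x - y) / max(abs(x), abs(y))  // functions in the strict sense
      generate byte lastobs = _n == _N                    // _variables
      local nv : word count price mpg weight              // an extended macro function
      egen median_x = median(x), by(group)                // an egen function
      display cond(`nv' > 1, "several variables", "one variable")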
  
   
Additional information
   UK11_Cox_functions.html
   UK11_Cox_functions.smcl
Panel time-series modeling: New tools for analyzing xt data
Markus Eberhardt
University of Oxford
  Stata already has an extensive range of built-in and user-written commands
  for analyzing 
xt (cross-sectional time-series) data.
  However, most of these commands do not take into account important features
  of the data relating to their time-series properties or cross-sectional
  dependence. This talk reviews the recent literature concerned with these
  features with reference to the types of data in which they arise. Most of
  the talk will be spent discussing and illustrating various Stata commands
  for analyzing these types of data, including several new user-written
  commands. The talk should be of general interest to users of 
xt data
  and of particular interest to researchers with panel datasets in which
  countries or regions are the unit of analysis and there is also a
  substantial time-series element. Over the past two decades, a literature
  dedicated to the analysis of macro panel data has concerned itself with some
  of the idiosyncrasies of this type of data, including variable
  nonstationarity and cointegration, as well as with the investigation of
  possible parameter heterogeneity across panel members and its implications
  for estimation and inference. Most recently, this literature has turned its
  attention to concerns over cross-sectional dependence, which can arise either
  in the form of unobservable global shocks that differ in their impact
  across countries (for example, the recent financial crisis) or as spillover effects
  (again, unobservable) between a subset of countries or regions.
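
  For instance, the nonstationarity concerns can be examined with official
  panel unit-root tests before turning to the user-written estimators
  discussed in the talk (variable names are placeholders):

      xtset country year
      xtunitroot ips y, trend lags(aic 4)   // Im-Pesaran-Shin test in levels
      generate dy = d.y
      xtunitroot ips dy, lags(aic 4)        // and in first differences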
  
   
Additional information
   UK11_eberhardt.pdf
Scientific organizers
Stephen Jenkins, London School of Economics
  
Roger Newson, Imperial College London
Logistics organizers
  Timberlake Consultants, the official distributor
  of Stata in the United Kingdom, Brazil, Ireland, Poland, Portugal, and Spain.