The 2016 London Stata Users Group meeting was September 8–9, but you can still interact with the user community even after the meeting and learn more about the presentations shared.

## Proceedings

### Friday, September 9

9:30–10:00 xtdpdqml: Quasi-maximum likelihood estimation of linear dynamic short-T panel-data models

Abstract: In this presentation, I discuss the new Stata command xtdpdqml, which implements the unconditional quasi-maximum likelihood estimators of Bhargava and Sargan (1983, Econometrica 51: 1635–1659) for linear dynamic panel models with random effects and of Hsiao, Pesaran, and Tahmiscioglu (2002, Journal of Econometrics 109: 107–150) for linear dynamic panel models with fixed effects when the number of cross-sections is large and the time dimension is fixed. The marginal distribution of the initial observations is modeled as a function of the observed variables to circumvent a short-T dynamic panel-data bias. Robust standard errors are available following the arguments of Hayakawa and Pesaran (2015, Journal of Econometrics 188: 111–134). xtdpdqml also supports standard postestimation commands, including suest, which can be used for a generalized Hausman test to discriminate between the dynamic random-effects and the dynamic fixed-effects model.

Additional information: kripfganz_uk16.pdf

Sebastian Kripfganz, University of Exeter Business School

10:00–10:30 Distribution regression made easy

Abstract: Incorporating covariates in (income or wage) distribution analysis typically involves estimating conditional distribution models, that is, models for the cumulative distribution of the outcome of interest conditionally on the value of a set of covariates. A simple strategy is to estimate a series of binary outcome regression models for $$F(z|x_i)= {\rm Pr}(y_i \le z |x_i)$$ for a grid of values for $$z$$ (Peracchi and Foresi, 1995, Journal of the American Statistical Association; Chernozhukov et al., 2013, Econometrica). This approach, now often referred to as "distribution regression", is attractive and easy to implement.
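
The strategy described above can be sketched as follows. This is not the presenter's code: it is written in Python only to keep the example self-contained, and the simulated data, the logit link, the threshold grid, and the function names (`fit_logit`, `distribution_regression`, `conditional_cdf`, `simulate_data`) are all assumptions made for illustration. It fits one binary model per threshold $$z$$ and reads off the estimated $$F(z|x)$$.

```python
import math
import random


def logistic(v):
    """Logistic CDF, with the index clamped to avoid overflow in exp()."""
    v = max(-30.0, min(30.0, v))
    return 1.0 / (1.0 + math.exp(-v))


def fit_logit(x, d, iterations=25):
    """Fit Pr(d = 1 | x) = logistic(a + b*x) by Newton-Raphson; return (a, b)."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, di in zip(x, d):
            p = logistic(a + b * xi)
            w = p * (1.0 - p)
            g0 += di - p            # score with respect to the intercept
            g1 += (di - p) * xi     # score with respect to the slope
            h00 += w                # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        if det < 1e-12:             # numerically singular: stop updating
            break
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def distribution_regression(x, y, thresholds):
    """Return {z: (a, b)}: one binary model for the indicator y <= z per threshold."""
    return {
        z: fit_logit(x, [1.0 if yi <= z else 0.0 for yi in y])
        for z in thresholds
    }


def conditional_cdf(fits, x_value):
    """Estimated F(z | x = x_value) at every threshold in the grid."""
    return {z: logistic(a + b * x_value) for z, (a, b) in fits.items()}


def simulate_data(n=500, seed=2016):
    """Hypothetical data: y = 1 + 2x + standard normal noise, x uniform on [0, 1]."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n)]
    y = [1.0 + 2.0 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y


if __name__ == "__main__":
    x, y = simulate_data()
    fits = distribution_regression(x, y, thresholds=[1.0, 2.0, 3.0])
    print(conditional_cdf(fits, 0.5))
```

Because each threshold is fitted separately, nothing forces the estimated $$F(z|x)$$ to be monotone in $$z$$; with a coarse, well-separated grid as in this sketch that is rarely an issue, but a finer grid may need a rearrangement step.
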
This talk illustrates how the Stata commands margins and suest can be useful for inference here and suggests various tips and tricks to speed up the process and solve potential computational issues. It also shows how to use conditional distribution model estimates to analyze various aspects of unconditional distributions.

Philippe Van Kerm, Luxembourg Institute of Socio-Economic Research

10:30–10:45 sdmxuse: Program to import statistical data within Stata using the SDMX standard

Abstract: SDMX, which stands for Statistical Data and Metadata eXchange, is a standard developed by seven international organizations (BIS, ECB, Eurostat, IMF, OECD, the United Nations, and the World Bank) to facilitate the exchange of statistical data (https://sdmx.org/). The package sdmxuse aims at helping Stata users to download SDMX data directly within their favorite software. The program builds and sends a query to the statistical agency (using RESTful web services), then imports and formats the downloaded dataset (in XML format). Some initiatives, notably the SDMX connector by Attilio Mattiocco at the Bank of Italy (https://github.com/amattioc/SDMX), have already been implemented to facilitate the use of SDMX data for external users, but they all rely on the Java programming language. Formatting the data directly within Stata has proved to be quicker for large datasets, but it also offers a simpler way for users to address potential bugs. The last argument is of particular importance for a standard that is evolving relatively fast. The presentation will include an explanation of the functioning of the sdmxuse program as well as an illustration of its usefulness in the context of macroeconomic forecasting. Since the seminal work of Stock and Watson (2002), factor models have become widely used to compute early estimates (now-casting) of macroeconomic series (for example, Gross Domestic Product). More recent works (for example, Angelini et al.
2011) have shown that regressions on factors extracted from a large panel of time series outperform traditional bridge equations. But this trend has increased the need for datasets with many time series (often more than 100) that are updated immediately after new releases are made available (that is, almost daily). The package sdmxuse should be of interest for users wanting to work on the development of such models.

References:

Angelini, E., G. Camba-Mendez, D. Giannone, L. Reichlin, and G. Rünstler. 2011. Short-term forecasts of euro area GDP growth. Econometrics Journal 14: 25–44.

Stock, J. H., and M. W. Watson. 2002. Forecasting using principal components from a large number of predictors. Journal of the American Statistical Association 97: 1167–1179.

Additional information: fontenay_uk16.pdf

Sébastien Fontenay, Institut de Recherches Économiques et Sociales, Université catholique de Louvain

11:15–12:15 Joint modeling of longitudinal and survival data

Abstract: Joint modeling of longitudinal and survival-time data has been gaining more and more attention in recent years. Many studies collect both longitudinal and survival-time data. Longitudinal, panel, or repeated-measures data record data measured repeatedly at different time points. Survival-time or event history data record times to an event of interest such as death or onset of a disease. The longitudinal and survival-time outcomes are often related and should thus be analyzed jointly. Three types of joint analysis may be considered: 1) evaluation of the effects of time-dependent covariates on the survival time; 2) adjustment for informative dropout in the analysis of longitudinal data; and 3) joint assessment of the effects of baseline covariates on the two types of outcomes. In this presentation, I will provide a brief introduction to the methodology and demonstrate how to perform these three types of joint analysis in Stata.
Additional information: marchenko_uk16.pdf

Yulia Marchenko, StataCorp

12:15–12:45 stpm2cr: A Stata module for direct likelihood inference on the cause-specific cumulative incidence function within the flexible parametric modeling framework

Abstract: Modeling within competing risks is increasing in prominence as researchers are becoming more interested in real-world probabilities of a patient's risk of dying from a disease while also being at risk of dying from other causes. Interest lies in the cause-specific cumulative incidence function (CIF), which can be calculated by (1) transforming on the cause-specific hazards (CSH) or (2) through its direct relationship with the subdistribution hazards (SDH). We expand on current competing risks methodology within the flexible parametric survival modeling framework and focus on approach (2), which is more useful when we look to questions on prognosis. These can be parameterized through direct likelihood inference on the cause-specific CIF (Jeong and Fine 2006), which offers a number of advantages over the more popular Fine and Gray (1999) modeling approach. Models have also been adapted for cure models using a similar approach described by Andersson et al. (2011) for flexible parametric relative survival models. An estimation command, stpm2cr, has been written in Stata that is used to model all cause-specific CIFs simultaneously. Using SEER data, we compare and contrast our approach with standard methods and show that many useful out-of-sample predictions can be made after fitting a flexible parametric SDH model, for example, CIF ratios and CSH. Alternative link functions may also be incorporated, such as the logit link, leading to proportional odds models, and models can be easily extended for time-dependent effects. We also show that an advantage of our approach is that it is less computationally intensive, which is important, particularly when analyzing larger datasets.

References:

Andersson, T. M-L., P. W. Dickman, S.
Eloranta, and P. C. Lambert. 2011. Estimating and modelling cure in population-based cancer studies within the framework of flexible parametric survival models. BMC Medical Research Methodology 11(1): 96. doi: 10.1186/1471-2288-11-96.

Fine, J. P., and R. J. Gray. 1999. A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association 94: 496–509.

Jeong, J-H., and J. P. Fine. 2006. Direct parametric inference for the cumulative incidence function. Applied Statistics 55: 187–200.

Additional information: islam_uk16.pdf

Sarwar Islam, University of Leicester; Paul C. Lambert, University of Leicester and Karolinska Institutet, Stockholm; Mark J. Rutherford, University of Leicester

1:45–3:00 Using simulation studies to evaluate statistical methods in Stata: A tutorial

Abstract: Simulation studies are an invaluable tool for statistical research, particularly for the evaluation of a new method or comparison of competing methods. Simulations are well used by methodologists but often conducted or reported poorly, and are underused by applied statisticians. It's easy to execute a simulation study in Stata, but it's at least as easy to do it wrong. We will describe a systematic approach to getting it right, visiting the following:

- Types of simulation study
- An approach to planning yours
- Setting seeds and storing states
- Saving estimates with simulate and postfile
- Preparing for failed runs and trapping errors
- The three types of dataset involved in simulations
- Analysis of simulation studies
- Presentation of results (including Monte Carlo error)

This tutorial will visit concepts, code, tips, tricks, and potholes, with the aim of giving the uninitiated the necessary understanding to start tackling simulation studies.
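
Several of the topics listed in this abstract (setting a seed, storing generator states, trapping failed runs, and reporting Monte Carlo error) can be illustrated with a small sketch. It is written in Python purely as a self-contained illustration and is not the presenters' Stata code, which uses simulate and postfile; the data-generating process, the estimator (a sample mean), and the names `one_replication`, `run_simulation`, and `summarise` are hypothetical.

```python
import math
import random
import statistics


def one_replication(rng, n_obs, true_mean):
    """Simulate one dataset; return the estimate and its model-based standard error."""
    data = [rng.gauss(true_mean, 1.0) for _ in range(n_obs)]
    estimate = statistics.fmean(data)
    std_err = statistics.stdev(data) / math.sqrt(n_obs)
    return estimate, std_err


def run_simulation(n_reps=1000, n_obs=50, true_mean=0.5, seed=20160909):
    """Run all replications from a single seed, keeping one row per replication."""
    rng = random.Random(seed)           # set the seed once, at the start
    rows = []
    for rep in range(1, n_reps + 1):
        state = rng.getstate()          # stored so this replication can be re-run alone
        try:
            estimate, std_err = one_replication(rng, n_obs, true_mean)
        except (ValueError, ZeroDivisionError, statistics.StatisticsError):
            estimate, std_err = None, None   # failed run: keep the row, flag as missing
        rows.append({"rep": rep, "state": state, "est": estimate, "se": std_err})
    return rows


def summarise(rows, true_mean):
    """Bias and empirical SE of the estimates, each with its Monte Carlo SE."""
    estimates = [r["est"] for r in rows if r["est"] is not None]
    n_ok = len(estimates)
    emp_se = statistics.stdev(estimates)
    return {
        "n_ok": n_ok,
        "bias": statistics.fmean(estimates) - true_mean,
        "bias_mcse": emp_se / math.sqrt(n_ok),
        "emp_se": emp_se,
        "emp_se_mcse": emp_se / math.sqrt(2 * (n_ok - 1)),
    }


if __name__ == "__main__":
    rows = run_simulation()
    print(summarise(rows, true_mean=0.5))
```

Storing the generator state with every row means that a single replication can be reproduced later by restoring that state, without re-running everything before it.
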
Additional information: morris_uk16.pdf

Tim Morris, MRC Clinical Trials Unit at UCL; Ian White, MRC Biostatistics Unit, Cambridge; Michael Crowther, University of Leicester

3:00–3:30 Reference-based multiple imputation for sensitivity analysis of clinical trials with missing data

Abstract: The statistical analysis of longitudinal randomized clinical trials is frequently complicated by the occurrence of protocol deviations that result in incomplete datasets for analysis. However one approaches analysis, an untestable assumption about the distribution of the unobserved postdeviation data must be made. In such circumstances, it is important to assess the robustness of trial results from primary analysis to different credible assumptions about the distribution of the unobserved data. Reference-based multiple-imputation procedures allow trialists to assess the impact of contextually relevant qualitative missing data assumptions (Carpenter, Roger, and Kenward 2013). For example, in a trial of an active versus placebo treatment, missing data for active patients can be imputed following the distribution of the data in the placebo arm. I present the mimix command, which implements the reference-based multiple-imputation procedures in Stata, enabling relevant accessible sensitivity analysis of trial datasets.

Reference:

Carpenter, J. R., J. H. Roger, and M. G. Kenward. 2013. Analysis of longitudinal trials with protocol deviation: a framework for relevant, accessible assumptions, and inference via multiple imputation. Journal of Biopharmaceutical Statistics 23(6): 1352–1371.

Additional information: cro_uk16.pdf

Suzie Cro, MRC Clinical Trials Unit at UCL and London School of Hygiene and Tropical Medicine

4:00–4:30 Parallel computing in Stata: Making the most out of your desktop

Abstract: Parallel computing has promised to deliver faster computing for everyone using off-the-shelf multicore computers.
Despite proprietary implementation of new routines in Stata/MP, the time required to conduct computationally intensive tasks such as bootstrapping, simulation, and multiple imputation hasn't dramatically improved. One strategy to speed up computationally intensive tasks is to use distributed high-performance computer clusters (HPC). Using HPCs to speed up computationally intensive tasks typically involves a divide-and-conquer approach. This simply divides repetitive tasks and distributes them across multiple processors and combines the results independently at the end of the process. The ability to access such clusters is limited; however, a similar system can be implemented on your desktop PC using the user-written command qsub. qsub provides a wrapper that writes, submits, and monitors jobs submitted to your desktop PC and that may dramatically improve the speed at which frequent computationally intensive tasks are completed.

Adrian Sayers, Musculoskeletal Research Unit, University of Bristol
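
The divide-and-conquer idea in this abstract can be sketched with a bootstrap whose replications are split into chunks, run by separate workers, and pooled at the end. This is a generic Python illustration, not the qsub command, whose internals the abstract does not describe; the function names, seeds, and data are hypothetical. A thread pool is used so that the sketch runs anywhere; for CPU-bound work in CPython, a process pool or separate batch jobs would be needed to see a real speed-up.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def bootstrap_chunk(data, n_reps, seed):
    """Run n_reps bootstrap replications of the sample mean with its own seeded generator."""
    rng = random.Random(seed)
    n = len(data)
    return [statistics.fmean(rng.choices(data, k=n)) for _ in range(n_reps)]


def parallel_bootstrap(data, total_reps=400, workers=4, base_seed=12345):
    """Divide the replications across workers, then combine the results."""
    # Divide: spread total_reps as evenly as possible over the workers.
    sizes = [
        total_reps // workers + (1 if i < total_reps % workers else 0)
        for i in range(workers)
    ]
    # Distribute: each chunk gets a distinct seed so the chunks do not repeat each other.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [
            pool.submit(bootstrap_chunk, data, size, base_seed + i)
            for i, size in enumerate(sizes)
        ]
        chunks = [f.result() for f in futures]   # collected in submission order
    # Combine: pool all replications and compute the bootstrap standard error.
    means = [m for chunk in chunks for m in chunk]
    return statistics.stdev(means), means


if __name__ == "__main__":
    sample = [float(i) for i in range(1, 101)]
    boot_se, _ = parallel_bootstrap(sample)
    print(boot_se)
```

Giving every chunk its own seed and collecting results in submission order keeps the pooled result reproducible regardless of which worker finishes first.
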

