2011 Nordic and Baltic Stata Users Group meeting

*Last updated: 1 December 2011*

Karolinska Institutet

CMB, Berzelius väg 21

Solna Campus

Stockholm, Sweden

Matteo Bottai

Unit of Biostatistics, Institute of Environmental Medicine,
Karolinska Institutet, Sweden

Multiple imputation is an increasingly popular approach for the analysis of
data with missing observations. It is implemented in Stata's
**mi** suite of commands. I present a new Stata command for
imputation of missing values based on prediction of conditional quantiles of
missing observations given the observed data. The command does not require
making distributional assumptions and can be applied to impute dependent,
bounded, censored, and count data.

**Additional information**

bottai_nordic11.pdf

Maarten L. Buis

Institut fuer Soziologie, Universitaet Tuebingen, Germany

In this presentation, I aim to introduce graphical tools for comparing the
distribution of a variable in your dataset with a theoretical probability
distribution, like the normal distribution or the Poisson distribution. The
presentation will consist of two parts. In the first part, I will consider
univariate distributions, with a particular emphasis on hanging and suspended
rootograms (**hangroot**). Looking at univariate distributions
is not very common in a lot of (sub-(sub-))disciplines, but there are
situations where this can be very useful: For example, if we have a count of
accidents and we want to know whether these are occurring randomly, then we
can compare this variable with a Poisson distribution. Another example would
be simulations, where it is often the case that parameters or test statistics
should follow a certain distribution when the model that is being checked is
working as expected.
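The idea behind a hanging rootogram can be sketched in a few lines. The following is a minimal illustration in Python, not the **hangroot** implementation: it assumes a Poisson reference distribution with the sample mean as its parameter, and the function names and the accident counts are hypothetical. Each bar hangs from the square root of the expected frequency and extends down by the square root of the observed frequency, so a bar that ends near zero indicates good agreement.

```python
import math
from collections import Counter

def poisson_pmf(k, mean):
    """Poisson probability of observing the count k."""
    return math.exp(-mean) * mean ** k / math.factorial(k)

def hanging_rootogram(counts):
    """Return one (k, top, bottom) row per count value k = 0..max(counts).

    top is sqrt(expected frequency) under a Poisson with the sample mean;
    bottom is top - sqrt(observed frequency), so a bar ending at 0 means
    the observed and expected frequencies agree.
    """
    n = len(counts)
    mean = sum(counts) / n
    observed = Counter(counts)
    rows = []
    for k in range(max(counts) + 1):
        top = math.sqrt(n * poisson_pmf(k, mean))
        bottom = top - math.sqrt(observed[k])
        rows.append((k, top, bottom))
    return rows

# Hypothetical accident counts, for illustration only.
accidents = [0, 1, 1, 2, 0, 3, 1, 0, 2, 1]
for k, top, bottom in hanging_rootogram(accidents):
    print(f"k={k}  top={top:.3f}  bottom={bottom:.3f}")
```

Working on the square-root scale makes deviations in the sparse tails easier to see, because the variability of a root count is roughly constant across count sizes.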

In the second part of the talk, I will focus on the more common situation
where models assume a certain distribution for the
explained/dependent/*y* variable and estimate how one or more
parameters, often the mean, change when one or more
explanatory/independent/*x* variables change. The challenge now is
that the dependent variable no longer follows the theoretical distribution,
but rather a mixture of these theoretical distributions. In the case of a
linear regression, we can circumvent this difficulty by looking at the
residuals, which should follow a normal distribution. However, this
circumvention does not generalize to other models. I will show how to
graphically compare the distribution of the dependent variable with the
theoretical mixture distribution. The focus will be on a trick to sample new
dependent variables under the assumption that the model is true. Graphing the
distribution of the actual dependent variable together with these sampled
variables will give an idea of whether deviations from the theoretical
distribution could have occurred by chance. This idea will be applied to
checking the distributional assumption in beta regression
(**betafit**) and to choosing between different parametric
survival models (**streg**).

**Additional information**

buis_nordic11.pdf

Michael J. Crowther

Department of Health Sciences,
University of Leicester, Leicester, United Kingdom

Paul C. Lambert

Department of Health Sciences, University of
Leicester, Leicester, United Kingdom and Department of Medical Epidemiology
and Biostatistics, Karolinska Institutet, Stockholm, Sweden

Simulation studies are essential for understanding and evaluating both
current and new statistical models. When simulating survival times, often an
exponential or Weibull distribution is assumed for the baseline hazard
function, but these distributions can be considered too simplistic and lack
biological plausibility in many situations. We will describe a new
user-written command, **survsim**, that allows the user to
simulate survival times from two-component mixture models, allowing much more
flexibility in the underlying hazard. Standard parametric models can also be
used, including the exponential, Weibull, and Gompertz models. Furthermore,
survival times can be simulated from the all-cause distribution of
cause-specific hazards for competing risks. A multinomial distribution is
used to create the event indicator, whereby the probability of experiencing
each event at a simulated time, *t*, is the cause-specific hazard
divided by the all-cause hazard evaluated at time *t*. Baseline
covariates and non-proportional hazards can be included in all scenarios.
Finally, we will discuss the complex extension of simulating joint
longitudinal and survival data.
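The competing-risks step described above can be sketched as follows. This is a minimal illustration in Python, not the **survsim** implementation: it assumes two Weibull cause-specific hazards with hypothetical parameters, draws the event time from the all-cause distribution by numerically inverting the cumulative hazard, and then draws the event type with probability equal to each cause-specific hazard divided by the all-cause hazard at that time.

```python
import math
import random

# Hypothetical Weibull parameters (lambda_k, gamma_k) for two competing causes.
CAUSES = [(0.10, 1.5), (0.05, 0.8)]

def cause_hazards(t):
    """Cause-specific hazards h_k(t) = lambda_k * gamma_k * t**(gamma_k - 1)."""
    return [lam * gam * t ** (gam - 1.0) for lam, gam in CAUSES]

def all_cause_cumhaz(t):
    """All-cause cumulative hazard H(t) = sum over k of lambda_k * t**gamma_k."""
    return sum(lam * t ** gam for lam, gam in CAUSES)

def invert_cumhaz(target):
    """Solve H(t) = target by bisection; H is increasing in t."""
    lo, hi = 0.0, 1.0
    while all_cause_cumhaz(hi) < target:
        hi *= 2.0
    for _ in range(200):
        mid = (lo + hi) / 2.0
        if all_cause_cumhaz(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

def event_probabilities(t):
    """Probability of each cause at time t: h_k(t) / all-cause hazard."""
    hazards = cause_hazards(t)
    total = sum(hazards)
    return [h / total for h in hazards]

def simulate_one(rng):
    """Draw one (time, cause) pair."""
    u = 1.0 - rng.random()            # uniform on (0, 1]
    t = invert_cumhaz(-math.log(u))   # solves S(t) = exp(-H(t)) = u
    cause = rng.choices([1, 2], weights=event_probabilities(t))[0]
    return t, cause

rng = random.Random(2011)
for t, cause in (simulate_one(rng) for _ in range(5)):
    print(f"t={t:.3f}  cause={cause}")
```

Bisection is used here only because the sum of two Weibull cumulative hazards with different shape parameters has no closed-form inverse; any root finder would do.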

**Additional information**

crowther_nordic11.pdf

Andrea Discacciati

Unit of Biostatistics and Nutritional Epidemiology,
Institute of Environmental Medicine, Karolinska Institutet, Sweden

The **stci** official Stata command indirectly estimates
quantiles of the survival time for different exposure levels from the
Kaplan–Meier estimates. However, **stci** does not take
into account possible confounding effects. Therefore, we introduce a new
Stata command, **stqkm**, that indirectly estimates quantiles of
the survival time from inverse probability weighted Kaplan–Meier
estimates. Confidence intervals for the quantile estimates are obtained
using the bootstrap method. We present a simulation study to assess the
performance of the **stqkm** command in the presence of
confounding, and we present a case study.
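The core computation can be sketched as follows. This is a minimal illustration in Python, not the **stqkm** implementation: it assumes the inverse probability weights have already been estimated elsewhere (for example, from a model for exposure given confounders), and the function name and data are hypothetical. The quantile is read off as the first event time at which the weighted Kaplan–Meier survival estimate falls to 1 − *p* or below.

```python
def weighted_km_quantile(times, events, weights, p):
    """Smallest event time t with weighted Kaplan-Meier S(t) <= 1 - p.

    events[i] is 1 for an observed event and 0 for censoring.
    Returns None when the survival curve never falls that low.
    """
    survival = 1.0
    event_times = sorted({ti for ti, di in zip(times, events) if di == 1})
    for t in event_times:
        # Weighted number still at risk just before t.
        at_risk = sum(w for ti, w in zip(times, weights) if ti >= t)
        # Weighted number of events exactly at t.
        failed = sum(w for ti, di, w in zip(times, events, weights)
                     if ti == t and di == 1)
        survival *= 1.0 - failed / at_risk
        if survival <= 1.0 - p:
            return t
    return None

# Hypothetical follow-up times, event indicators, and weights.
times = [2, 3, 3, 5, 8, 9]
events = [1, 1, 0, 1, 0, 1]
weights = [1.2, 0.8, 1.5, 1.0, 2.0, 0.9]
print(weighted_km_quantile(times, events, weights, 0.5))
```

With all weights equal to 1, this reduces to the ordinary Kaplan–Meier quantile. A bootstrap confidence interval would repeat the whole procedure, including estimation of the weights, on resampled data.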

**Additional information**

discacciati_nordic11.pdf

Christel Häggström

Umeå University, Sweden

Competing-risks analysis in epidemiology is of special importance in survival
analysis when studying the elderly and also when the exposure is related to
early death. In a cohort study, I investigated the association between
metabolic factors (obesity, hypertension, high glucose levels, etc.) and
prostate cancer (with mean age at diagnosis 70 years). Using these data, I
will present the analysis where I plotted cumulative incidence curves to
visualize the risk of prostate cancer in comparison with the competing risk,
all-cause mortality, for different levels of metabolic factors, using the
Stata commands **stcompet** and **stpepemori**. I
also used Fine and Gray regression (the **stcrreg** command) to
calculate subdistribution hazard ratios for both prostate cancer incidence
and all-cause mortality.

**Additional information**

haggstrom_nordic11.pdf

Peter Hedström

Institute for Futures Studies, Stockholm, Sweden

Thomas Grund

ETH, Zürich, Switzerland

Agent-based modeling (ABM) is an analytical tool that is becoming
increasingly important in the social sciences. The core idea behind ABM is
to use computational models to analyze the macro- or aggregate-level outcomes
that groups of agents, in interaction with one another, bring about. In this
presentation, we briefly discuss why ABM is important and show how Stata can
be used for such analyses. We also present a suite of programs. Some of these
commands are used for generating, visualizing, or measuring various
properties of the networks within which the agents are embedded, and others
are used for analyzing the collective outcomes that agents are likely to
bring about when embedded in such networks.

Nicola Orsini

Unit of Biostatistics and Nutritional Epidemiology,
Institute of Environmental Medicine, Karolinska Institutet, Sweden

I present an estimation command for Laplace regression to model conditional
quantiles of a response variable given a set of covariates. The
**laplace** command is similar to the official
**qreg** command except that it can account for censored data. I
illustrate its applicability and use through examples from health-related
fields.

**Additional information**

orsini_nordic11.pdf

Sally R. Hinchliffe, Michael J. Crowther, Alison
Donald, and Alex J. Sutton

Department of Health Sciences, University of
Leicester, Leicester, United Kingdom

In this presentation, we describe a suite of programs (**metasim**,
**metapow**, **metapowplot**) that enable the
user to estimate the probability that the conclusions of a meta-analysis will
change with the inclusion of one or more new studies, as described previously by
Sutton et al. (2007). Using the
**metasim** program, we take a simulation approach to estimating the effects in
future studies. The method assumes that the effect sizes of future
studies are consistent with those observed previously, as represented by
the current meta-analysis. The contexts of both two-arm randomized
controlled trials and studies of diagnostic test accuracy are considered for
a variety of outcome measures. Calculations are possible under both fixed-
and random-effect assumptions, and several approaches to inference, including
statistical significance and limits of clinical significance, are possible.
Calculations for specific sample sizes can be conducted (using
**metapow**), and plots, akin to traditional power curves,
indicating the probability that one or more new studies will change inferences for a
range of sample sizes can be produced (using **metapowplot**).
Finally, plots of the simulation results are overlaid on a previously
described macro, **extfunnel**, which can help to intuitively
explain the results of such calculations of sample size. We hope the macro
will be useful to trialists who want to assess the impact potential new
trials will have on the overall evidence base and meta-analysts who want to
assess the robustness of the current meta-analysis to the inclusion of
future data.

Reference:

Sutton, A. J., N. J. Cooper, D. R. Jones, P. C. Lambert, J. R. Thompson, and K. R. Abrams. 2007. Evidence-based sample size calculations based upon updated meta-analysis. *Statistics in Medicine* 27: 471–490.

**Additional information**

hinchcliffe_nordic11.pdf

Patrick Royston

MRC Clinical Trials Unit, United Kingdom

Quite a common task in Stata is to run some sequence of commands under the
control of a looping parameter and store the corresponding results in one
or more new variables. Over the years, I have written many such loops, some
of greater complexity than others. I finally became fed up with it and
decided to write a simple command to automate the repetitive parts. The
result is **looprun**, which I shall describe in this
presentation.

**Additional information**

royston_nordic11.ppt

Mark J. Rutherford, Paul C. Lambert, and John R. Thompson

Department of Health Sciences, University of
Leicester, Leicester, United Kingdom

Age–period–cohort models provide a useful method for modeling
cancer incidence and mortality rates. There is great interest in estimating
the rates of disease at given future time points so that plans can be made
for the provision of the required future services. In the setting of using
age–period–cohort models incorporating restricted cubic splines,
we propose a new technique for projecting incidence. The method is validated
via a comparison with existing methods in the setting of Finnish Cancer
Registry data. The reasons for the improvements seen in the newly proposed
method are twofold. First, improvements are seen because of the finer
splitting of the timescale to give a more continuous estimate of the
incidence rate. Second, the new method uses more-recent trends to dictate
the future projections than previously proposed methods. The output will be
produced via the user-written command **apcfit**. The
functionality of the command will be illustrated throughout the talk.

The talk will comprise an introduction to the use of restricted cubic splines for model fitting before describing their use for age–period–cohort models. A description of the new method for projecting cancer incidence will be given prior to showing the results of the application of the method to Finnish Cancer Registry data. The talk will conclude with a description of the potential problems and issues when making projections.

**Additional information**

rutherford_nordic11.pdf

Giola Santoni, Debora Rizzuto, and Laura Fratiglioni

Aging Research Center, Karolinska Institutet, Sweden

We want to quantify the protective effect of education on time to dementia
onset using longitudinal data from a population study. We consider dropout
due to death of the subject as a competing event of the outcome of interest.
We show an adaptation of the Laplace regression method to the case of
competing-risks analysis. The first 20% of highly educated people will
develop dementia 2.5 years (p<.01) later than those with a lower education
level. The effect on all-cause mortality is negligible. We show that the
results derived through Laplace regression are comparable with those derived
with the Stata command **stcrreg**.

**Additional information**

santoni_nordic11.pdf

Arvid Sjölander

Department of Medical Epidemiology and
Biostatistics, Karolinska Institutet, Sweden

Nicola Orsini

Units of Biostatistics and Nutritional Epidemiology,
Institute of Environmental Medicine, Karolinska Institutet, Sweden

The aim of epidemiological research is typically to estimate the association
between a particular exposure and a particular outcome, adjusted for a set of
additional covariates. This is commonly done by fitting a regression model
for the outcome, given exposure and covariates. If the regression model is
misspecified, then the resulting estimator may be inconsistent. Recently, a
new class of estimators has been developed, so-called “doubly
robust” (DR) estimators. These estimators use two regression models:
one for the outcome and one for the exposure. A DR estimator is consistent if
either model is correct, not necessarily both. Thus DR estimators give the
analyst two chances instead of only one to make valid inference. In this
presentation, we describe a new package for Stata that implements the most
common DR estimators.
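To make the "two chances" idea concrete, here is a minimal sketch in Python of one well-known DR estimator, the augmented inverse probability weighted (AIPW) estimator of the mean outcome under exposure. Whether the package described here implements this particular estimator is not stated in the abstract. The function name and data are hypothetical, and the fitted values from the exposure model (`propensity`) and the outcome model (`predicted`) are assumed to have been obtained elsewhere.

```python
def aipw_mean(outcomes, exposed, propensity, predicted):
    """AIPW estimate of the mean outcome had everyone been exposed.

    outcomes   : observed outcome y_i
    exposed    : exposure indicator a_i (1 exposed, 0 unexposed)
    propensity : fitted P(a_i = 1 | covariates) from the exposure model
    predicted  : fitted E(y_i | a_i = 1, covariates) from the outcome model
    """
    total = 0.0
    for y, a, e, m in zip(outcomes, exposed, propensity, predicted):
        # Inverse probability weighted outcome, augmented by a correction
        # term that has mean zero when the exposure model is correct.
        total += a * y / e - (a - e) / e * m
    return total / len(outcomes)

# Hypothetical data and fitted values, for illustration only.
y = [1.0, 2.0, 3.0, 4.0]
a = [1, 0, 1, 0]
e_hat = [0.5, 0.5, 0.5, 0.5]
m_hat = [2.0, 2.0, 2.0, 2.0]
print(aipw_mean(y, a, e_hat, m_hat))
```

The estimate stays consistent if either the exposure model or the outcome model is correctly specified, which is the property the abstract describes.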

**Additional information**

sjolander_nordic11.pdf

Yulia Marchenko

StataCorp LP

I present the new Stata 12 command, **mi impute chained**, to
perform multivariate imputation using chained equations (ICE), also known as
sequential regression imputation. ICE is a flexible imputation technique
for imputing various types of data. The variable-by-variable specification
of ICE allows you to impute variables of different types by choosing the
appropriate method for each variable from several univariate imputation
methods. Variables can have an arbitrary missing-data pattern. By
specifying a separate model for each variable, you can incorporate certain
important characteristics, such as ranges and restrictions within a subset,
specific to each variable. I also describe other new features in multiple
imputation in Stata 12.

**Additional information**

marchenko_nordic11.pdf

Vince Wiggins

StataCorp LP

We will discuss SEM (structural equation modeling), not from the perspective
of the models for which it is most often used—measurement models,
confirmatory factor analysis, and the like—but from the perspective of
how it can extend other estimators. From a wide range of choices, we will
focus on extensions of mixed models (random- and fixed-effects regression).
Extensions include conditional effects (not completely random), endogenous
covariates, and others.

**Additional information**

wiggins_nordic11.pdf

Peter Hedström, Metrika Consulting, Nuffield College and Oxford University

Nicola Orsini, Karolinska Institutet

Matteo Bottai, Karolinska Institutet

Metrika Consulting, the official distributor of Stata in the Nordic and Baltic regions, and the Karolinska Institutet.