2012 German Stata Users Group meeting: Abstracts
Handling interactions in Stata, especially with continuous predictors
Patrick Royston
University College London
Willi Sauerbrei
University of Freiburg
In an era in which doctors and patients aspire to personalized medicine,
detecting and modeling interactions between covariates or between
covariates and treatment is becoming increasingly important. In
observational studies, for example, in epidemiology, interactions are known
as effect modifiers; their presence can substantially change the
understanding of how a risk factor impacts the outcome. However, modeling
interactions in an appropriate and interpretable way is not straightforward.
In our talk, we consider three main topics. The first topic is the nuts and
bolts of factor variables and interactions in Stata. We outline how Stata's
parameterizations of interactions between factor variables work in regression
models. The second topic is modeling interactions in
observational studies that involve at least one continuous covariate, an
area that practitioners apparently find difficult. We introduce a new Stata
program, mfpigen, for detecting and modeling such interactions using
fractional polynomials, adjusting for confounders if necessary. The third
topic is modeling interactions between treatment and continuous covariates
in randomized controlled trials. We outline a Stata program, mfpi,
designed for this purpose. Key themes of our talk are the vital role played
by graphical displays of interactions and the importance of applying simple
plausibility checks.
Additional information
desug12_royston.pdf
Exploratory spatial data analysis using Stata
Maurizio Pisati
University of Milano–Bicocca
In this talk, I will present the basic principles of exploratory spatial
data analysis and their application using Stata. After a brief discussion of
the specific features of spatial data, I will show some freely available
user-written Stata commands (spmap, spgrid, spkde, spatwmat, spatgsa,
spatcorr, spatlsa) that help to carry out some exploratory analyses of
real-world spatial data.
Additional information
desug12_pisati.pdf
leebounds: Lee’s treatment effect bounds for samples with nonrandom sample selection
Harald Tauchmann
Rheinisch-Westfälisches Institut für Wirtschaftsforschung
Even if assignment of treatment is purely exogenous, estimating treatment
effects may suffer from severe bias if the available sample is subject to
nonrandom sample selection/attrition. Lee (2009) addresses this issue by
proposing an estimator for treatment effect bounds in the presence of
nonrandom sample selection. In this approach, the lower and upper bound,
respectively, correspond to extreme assumptions about the missing
information that are consistent with the observed data. As opposed to
conventional parametric approaches to correcting for sample selection bias,
such as the classical heckit estimator, Lee bounds rest on very few
assumptions, namely, random assignment of treatment and monotonicity. The
latter means that treatment affects selection in the same direction for
every individual. I introduce the new Stata command leebounds, which
implements Lee’s bounds estimator. The command allows for
several options, such as tightening bounds by the use of covariates,
confidence intervals for the treatment effect, and statistical inference
based on a weighted bootstrap. The command is applied to data gathered from
a randomized trial of the effect of financial incentives on weight loss
among obese individuals.
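The trimming idea behind the bounds can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the leebounds implementation: it uses no covariates, computes no confidence intervals, and the function name lee_bounds and its arguments are hypothetical. The group with the higher selection rate is trimmed from the top or the bottom of its observed outcomes until both groups have the same effective selection rate.

```python
from statistics import mean

def lee_bounds(outcome, treated, selected):
    """Return (lower, upper) trimming bounds on the treatment effect.

    Hypothetical helper: no covariates, no inference. Outcomes of
    unselected observations are ignored (they may be None).
    """
    rows = list(zip(outcome, treated, selected))
    y1 = sorted(y for y, d, s in rows if d and s)        # observed, treated
    y0 = sorted(y for y, d, s in rows if not d and s)    # observed, control
    n1 = sum(1 for d in treated if d)
    n0 = sum(1 for d in treated if not d)
    q1 = len(y1) / n1                                    # selection rate, treated
    q0 = len(y0) / n0                                    # selection rate, control
    if q1 >= q0:
        # Treated group is over-represented: trim k treated outcomes.
        k = round((q1 - q0) / q1 * len(y1))
        lower = mean(y1[:len(y1) - k]) - mean(y0)        # drop the k largest
        upper = mean(y1[k:]) - mean(y0)                  # drop the k smallest
    else:
        # Control group is over-represented: trim k control outcomes.
        k = round((q0 - q1) / q0 * len(y0))
        lower = mean(y1) - mean(y0[k:])                  # drop the k smallest
        upper = mean(y1) - mean(y0[:len(y0) - k])        # drop the k largest
    return lower, upper

# Toy data: all 4 treated units are observed, only 2 of 4 controls are.
outcome = [1, 2, 3, 4, 2, 3, None, None]
treated = [1, 1, 1, 1, 0, 0, 0, 0]
selected = [1, 1, 1, 1, 1, 1, 0, 0]
lower, upper = lee_bounds(outcome, treated, selected)
```

In the toy data, half of the observed treated outcomes are trimmed, so the two bounds compare the lowest two and the highest two treated outcomes with the control mean.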
Reference:
Lee, David S. 2009. Training, wages, and sample selection: Estimating sharp
bounds on treatment effects. Review of Economic Studies 76: 1071–1102.
Additional information
desug12_tauchmann.pdf
Comparing observed and theoretical distributions
Maarten Buis
University of Tübingen
In this talk, I aim to discuss tools to compare the observed
distribution of a variable with the theoretical distribution assumed by a
model. In particular, I will focus on the situation where a model assumes a
certain distribution for the explained/dependent/y variable and one or
more parameters of this distribution, often the mean, change when one or
more explanatory/independent/x variables change. The challenge is that the
dependent variable no longer follows the theoretical distribution, but
rather follows a mixture of these theoretical distributions. In the case of a
linear regression, we can circumvent this difficulty by looking at the
residuals, which should follow a normal distribution. However, this
circumvention does not generalize to other models. I will show the
margdistfit package, which graphically compares the distribution of the
dependent variable with the theoretical mixture distribution.
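To make the mixture idea concrete, the following Python sketch assumes a normal model whose mean varies across observations. The marginal distribution of the dependent variable is then the average of the observation-specific normal distributions, and it is this average that should be compared with the empirical distribution. The function names are hypothetical and the sketch is not taken from margdistfit.

```python
from math import erf, sqrt

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def mixture_cdf(y, means, sd):
    """Theoretical marginal CDF at y: the average of N(mean_i, sd) CDFs,
    one component per observation's predicted mean."""
    return sum(normal_cdf((y - m) / sd) for m in means) / len(means)

def empirical_cdf(y, sample):
    """Share of observed values less than or equal to y."""
    return sum(1 for v in sample if v <= y) / len(sample)

# Two observations with predicted means -1 and 1 and a common sd of 1:
# the marginal distribution is bimodal, not a single normal.
p_theoretical = mixture_cdf(0.0, [-1.0, 1.0], 1.0)
p_observed = empirical_cdf(2, [1, 2, 3, 4])
```

Plotting mixture_cdf against empirical_cdf over a grid of y values gives a simple graphical check of the assumed distribution.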
Additional information
desug12_buis.zip
A simple alternative to the linear probability model for binary choice models with endogenous regressors
Christopher F. Baum
Boston College and DIW Berlin
Yingying Dong
University of California, Irvine
Arthur Lewbel
Boston College
Tao Yang
Boston College
Dong and Lewbel have developed the theory of simple estimators for binary
choice models with endogenous or mismeasured regressors, depending on a
“special regressor” as defined by Lewbel (2000). These
estimators can be used with limited, censored, continuous, or discrete
endogenous regressors and have significant advantages over the linear
probability model. These estimators are numerically straightforward to
implement.
We present and demonstrate an improved version of a Stata routine that
provides both estimation and postestimation features, and we give a simple
example where the linear probability model fails to estimate any useful
quantity.
Reference:
Lewbel, A. 2000. Semiparametric qualitative response model estimation with
unknown heteroscedasticity and instrumental variables. Journal of
Econometrics 97: 145–177.
Additional information
desug12_baum.pdf
Robust regression in Stata
Ben Jann
University of Bern
Least-squares regression is a major workhorse in applied research. Yet its
estimates may be deemed nonrobust under various conditions. One example is
heavy-tailed error distributions, in which least-squares estimation may
lose its cutting edge with respect to efficiency. More importantly,
ordinary regression methods can produce biased results if the data are
contaminated by a set of observations stemming from an alternative process.
Various robust regression estimators have been proposed in the literature
to address these problems, but they do not seem to be employed much in
practical research. One reason for this underutilization may be a lack of
convenient software implementations, as is exemplified by a close-to-complete
absence of robust estimators from official Stata. In this talk, I will
therefore present a number of user-written commands geared toward robust
estimation of regression models.
Additional information
desug12_jann.pdf
Working in the margins to plot a clear course
Bill Rising
StataCorp LP
Visualizing the true effect of a predictor over a range of values can be
difficult for models that are not parameterized in their natural metric,
such as for logistic or (even more so) probit models. Interaction terms in
such models cause even more fogginess. In this talk, I show how both the
margins and the marginsplot commands can make for much clearer
explanations of effects for nonstatisticians and statisticians alike.
Additional information
desug12_rising.pdf
Can multilevel multiprocess models be estimated using Stata? A case for the cmp command
Tamás Bartus
Corvinus University of Budapest
Multilevel multiprocess models are routinely used to study parallel
processes of repeated demographic events, like births, union formation, and
union dissolution. Multilevel multiprocess models are simultaneous equations
for hazards including heterogeneity components, and the joint estimation of
hazard models allows researchers to control for the effects of unobserved
personality traits. Such models are routinely estimated using MLwiN and aML.
In this talk, I discuss the capabilities of Stata to estimate multilevel
multiprocess models. I focus on the application of the user-written cmp
command, developed by David Roodman (2007). The cmp command can estimate
recursive systems of multilevel (random-effects) equations with correlated
disturbances. I illustrate the application of the cmp command using
examples from demographic research.
Reference:
Roodman, D. 2007. cmp: Stata module to implement conditional (recursive)
mixed process estimator. Statistical Software Components S456882, Department
of Economics, Boston College.
http://ideas.repec.org/c/boc/bocode/s456882.html.
Additional information
desug12_bartus.pdf
Rescaling results of mixed nonlinear probability models to compare regression coefficients or variance components across hierarchically nested models
Dirk Enzmann
University of Hamburg
Ulrich Kohler
Social Science Research Center Berlin
Because of the scaling of the unobserved latent dependent variable in
logistic and probit multilevel models, the lowest level residual variance is
always π²/3 (logistic regression) or 1.0 (probit regression). As
a consequence, a change of regression coefficients and variance components
between hierarchically nested models cannot be interpreted unambiguously. To
overcome this issue, rescaling of the unobserved latent dependent variable of
nested models to the scale of the intercept-only model has been proposed
(Hox 2010). In this talk, we demonstrate the use of the program
meresc, which implements this procedure to rescale the results of
mixed nonlinear probability models fit with xtmelogit, xtlogit, or
xtprobit.
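The rescaling can be illustrated with a short Python sketch. It assumes one common formulation of the correction: the total latent variance of the intercept-only model (random-intercept variance plus the fixed residual variance) is divided by the total latent variance of the larger model (variance of the fixed-part linear predictor plus random-intercept variance plus residual variance). Coefficients are multiplied by the square root of this ratio and variance components by the ratio itself. All names are hypothetical, and the sketch is not the meresc code.

```python
from math import pi, sqrt

# Level-1 residual variance fixed by the logistic latent-variable scale.
LOGIT_RESIDUAL_VARIANCE = pi ** 2 / 3

def scale_correction(null_random_var, model_fixed_var, model_random_var,
                     residual_var=LOGIT_RESIDUAL_VARIANCE):
    """Factor that maps model estimates to the scale of the
    intercept-only model (assumed formulation, see the text above)."""
    total_null = null_random_var + residual_var
    total_model = model_fixed_var + model_random_var + residual_var
    return sqrt(total_null / total_model)

def rescale(coefficients, random_var, factor):
    """Multiply coefficients by the factor and variances by its square."""
    return [b * factor for b in coefficients], random_var * factor ** 2

# If covariates only shift variance from the random to the fixed part,
# the total latent variance is unchanged and no correction is needed.
factor_same = scale_correction(1.0, 0.5, 0.5)
# If the total latent variance grows, estimates are scaled down.
factor_down = scale_correction(1.0, 1.0, 1.0)
coefs, var_u = rescale([2.0], 1.0, 0.5)
```

For a probit model, residual_var would be set to 1.0 instead of π²/3.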
Reference:
Hox, J. J. 2010. Multilevel Analysis: Techniques and Applications. 2nd
ed. New York: Routledge.
Additional information
desug12_enzmann.pdf
Multilevel tools
Katja Möhring
University of Cologne
Alexander Schmidt
University of Cologne
The Stata package “multilevel tools” (mlt) includes a range of
ado-files for postestimation after multilevel models (xtmixed/xtmelogit).
Up to now, it contains three commands (more ado-files will be added in the
future):
- mltrsq gives the Snijders/Bosker R-squared and the
Bryk/Raudenbush R-squared values.
- mltcooksd gives the influence measures Cook’s D and
DFBETAs for the higher-level units in hierarchical mixed models.
- mltshowm presents how the model looks if those cases
detected as influential are excluded from the sample.
In our presentation, we will discuss the issue of influential cases in
multilevel modeling. We will use some research examples to stress the
importance of considering influential cases, particularly in multilevel
analysis. We will show how the influence measures for second-level units
are defined and how we calculate them.
Additional information
desug12_moehring.pdf
Modular programming in Stata
Daniel Schneider
University of Frankfurt/Main
Stata provides an easy and effective way of programming and distributing
user-written additions to Stata’s command universe. However, a Stata
programmer may face problems when trying to distribute an ado-file whose
code in turn depends on one or many other self-written or third-party
user-written routines. Distributing the ado-files as a package may not be
appropriate, or it may be cumbersome in terms of compilation and maintenance.
The user-written command copycode facilitates code production, code
certification, code maintenance, and code distribution in a context of
extensive ado-file programming with many interdependencies among
user-written files. Its main purpose is to assemble ado-files for
distribution that do not depend on other user-written files. It does so
by copying the relevant code into one file. The programmer’s burden of
keeping track of all first-order and higher dependencies is reduced to the
compilation of a list of first-order dependencies, which is given to
copycode as an input. copycode will then assemble a ready-to-distribute,
self-contained ado-file that contains unique first-order and higher Stata
subroutines and Mata code as private functions.
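The step from first-order to higher-order dependencies amounts to a transitive closure over the dependency lists. The Python sketch below shows that idea only; it is an assumption about the general technique, not a description of how copycode works internally, and the routine names in the example are made up.

```python
def resolve_dependencies(root, first_order):
    """Return every routine that root depends on, directly or indirectly.

    first_order maps a routine name to the list of routines it calls
    directly. Each dependency appears once, in order of first visit.
    """
    resolved = []
    stack = list(reversed(first_order.get(root, [])))
    while stack:
        name = stack.pop()
        if name == root or name in resolved:
            continue  # already handled; this also guards against cycles
        resolved.append(name)
        stack.extend(reversed(first_order.get(name, [])))
    return resolved

# Hypothetical routines: mycmd calls parse and plot, and both call util.
first_order = {
    "mycmd": ["parse", "plot"],
    "parse": ["util"],
    "plot": ["util"],
    "util": [],
}
needed = resolve_dependencies("mycmd", first_order)
```

Here util is reached through both parse and plot but is listed once, which mirrors the requirement that the assembled file contain each subroutine only once.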
Additional information
desug12_schneider.pdf