Last updated: 4 June 2012

2012 German Stata Users Group meeting

Friday, 1 June 2012

WZB Social Science Research Center
Reichpietschufer 50
D-10785 Berlin


Handling interactions in Stata, especially with continuous predictors

Patrick Royston
University College London
Willi Sauerbrei
University of Freiburg
In an era in which doctors and patients aspire to personalized medicine, detecting and modeling interactions between covariates or between covariates and treatment is becoming increasingly important. In observational studies, for example, in epidemiology, interactions are known as effect modifiers; their presence can substantially change the understanding of how a risk factor impacts the outcome. However, modeling interactions in an appropriate and interpretable way is not straightforward.

In our talk, we consider three main topics. The first topic is the nuts and bolts of factor variables and interactions in Stata. We outline how Stata's parameterizations of interactions between factor variables work in regression models. The second topic is modeling interactions in observational studies that involve at least one continuous covariate, an area that practitioners apparently find difficult. We introduce a new Stata program, mfpigen, for detecting and modeling such interactions using fractional polynomials, adjusting for confounders if necessary. The third topic is modeling interactions between treatment and continuous covariates in randomized controlled trials. We outline a Stata program, mfpi, designed for this purpose. Key themes of our talk are the vital role played by graphical displays of interactions and the importance of applying simple plausibility checks.

Exploratory spatial data analysis using Stata

Maurizio Pisati
University of Milano–Bicocca
In this talk, I will present the basic principles of exploratory spatial data analysis and their application using Stata. After a brief discussion of the specific features of spatial data, I will show some freely available user-written Stata commands (spmap, spgrid, spkde, spatwmat, spatgsa, spatcorr, spatlsa) that help carry out exploratory analyses of real-world spatial data.

leebounds: Lee’s treatment effect bounds for samples with nonrandom sample selection

Harald Tauchmann
Rheinisch-Westfälisches Institut für Wirtschaftsforschung
Even if assignment of treatment is purely exogenous, estimates of treatment effects may suffer from severe bias if the available sample is subject to nonrandom sample selection or attrition. Lee (2009) addresses this issue by proposing an estimator for treatment-effect bounds in the presence of nonrandom sample selection. In this approach, the lower and upper bounds correspond to extreme assumptions about the missing information that are consistent with the observed data. In contrast to conventional parametric approaches to correcting for sample-selection bias, such as the classical heckit estimator, Lee bounds rest on very few assumptions, namely, random assignment of treatment and monotonicity. The latter means that treatment affects selection in the same direction for every individual. I introduce the new Stata command leebounds, which implements Lee’s bounds estimator in Stata. The command offers several options, such as tightening the bounds by using covariates, computing confidence intervals for the treatment effect, and basing statistical inference on a weighted bootstrap. The command is applied to data gathered from a randomized trial of the effect of financial incentives on weight loss among obese individuals.
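
As a rough illustration of the trimming idea behind the bounds, the following sketch follows Lee (2009) and assumes that monotonicity holds with a higher share of observed outcomes in the treatment group. Here q_t is the share of units in group t whose outcome is observed (S = 1), and y_q denotes the q-th quantile of Y among observed treated units. This notation is ours and need not match the output or options of leebounds; if selection runs the other way, the control group is trimmed instead.

```latex
% Assumption: q_1 >= q_0, so the observed treated group is trimmed by the share p
\begin{align*}
p &= \frac{q_1 - q_0}{q_1}, \qquad q_t = \Pr(S = 1 \mid T = t) \\
\Delta^{\mathrm{LB}} &= \mathrm{E}[\,Y \mid T=1, S=1, Y \le y_{1-p}\,] - \mathrm{E}[\,Y \mid T=0, S=1\,] \\
\Delta^{\mathrm{UB}} &= \mathrm{E}[\,Y \mid T=1, S=1, Y \ge y_{p}\,] - \mathrm{E}[\,Y \mid T=0, S=1\,]
\end{align*}
```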

Lee, David S. 2009. Training, wages, and sample selection: Estimating sharp bounds on treatment effects. Review of Economic Studies 76: 1071–1102.

Comparing observed and theoretical distributions

Maarten Buis
University of Tübingen
In this talk, I aim to discuss tools to compare the observed distribution of a variable with the theoretical distribution assumed by a model. In particular, I will focus on the situation where a model assumes a certain distribution for the explained/dependent/y variable and one or more parameters of this distribution, often the mean, change when one or more explanatory/independent/x variables change. The challenge is that the dependent variable no longer follows the theoretical distribution, but rather follows a mixture of these theoretical distributions. In the case of a linear regression, we can circumvent this difficulty by looking at the residuals, which should follow a normal distribution. However, this circumvention does not generalize to other models. I will show the margdistfit package, which graphically compares the distribution of the dependent variable with the theoretical mixture distribution.
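
One way to write the mixture mentioned above: if the fitted model implies a conditional density f(y; θ̂_i) for each of the N observations, the distribution the model predicts for the dependent variable as a whole is the average of these densities. The notation is ours, offered as a sketch; it is not necessarily how margdistfit computes or labels its results.

```latex
% Model-implied marginal distribution of y: an equally weighted mixture over observations
\hat{f}_Y(y) = \frac{1}{N} \sum_{i=1}^{N} f\bigl(y;\, \hat{\theta}_i\bigr),
\qquad \text{e.g. } f\bigl(y;\, \hat{\theta}_i\bigr) = \frac{1}{\hat{\sigma}}\,
\phi\!\left(\frac{y - x_i'\hat{\beta}}{\hat{\sigma}}\right) \text{ for linear regression}
```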

A simple alternative to the linear probability model for binary choice models with endogenous regressors

Christopher F. Baum
Boston College and DIW Berlin
Yingying Dong
University of California, Irvine
Arthur Lewbel
Boston College
Tao Yang
Boston College
Dong and Lewbel have developed the theory of simple estimators for binary choice models with endogenous or mismeasured regressors, depending on a “special regressor” as defined by Lewbel (2000). These estimators can be used with limited, censored, continuous, or discrete endogenous regressors and have significant advantages over the linear probability model. These estimators are numerically straightforward to implement.

We present and demonstrate an improved version of a Stata routine that provides both estimation and postestimation features, and we give a simple example where the linear probability model fails to estimate any useful quantity.

Lewbel, A. 2000. Semiparametric qualitative response model estimation with unknown heteroscedasticity and instrumental variables. Journal of Econometrics 97: 145–177.

Robust regression in Stata

Ben Jann
University of Bern
Least-squares regression is a major workhorse in applied research. Yet its estimates may be nonrobust under various conditions. One example is heavy-tailed error distributions, under which least-squares estimation may lose its edge in efficiency. More importantly, ordinary regression methods can produce biased results if the data are contaminated by a set of observations stemming from an alternative process. Various robust regression estimators have been proposed in the literature to address these problems, but they do not seem to be employed much in practical research. One reason for this underutilization may be a lack of convenient software implementations, as exemplified by the near-complete absence of robust estimators from official Stata. In this talk, I will therefore present a number of user-written commands geared toward robust estimation of regression models.

Working in the margins to plot a clear course

Bill Rising
StataCorp LP
Visualizing the true effect of a predictor over a range of values can be difficult for models that are not parameterized in their natural metric, such as logistic or (even more so) probit models. Interaction terms in such models add even more fog. In this talk, I show how the margins and marginsplot commands can make for much clearer explanations of effects for nonstatisticians and statisticians alike.

Can multilevel multiprocess models be estimated using Stata? A case for the cmp command

Tamás Bartus
Corvinus University of Budapest
Multilevel multiprocess models are routinely used to study parallel processes of repeated demographic events, such as births, union formation, and union dissolution. Multilevel multiprocess models are systems of simultaneous hazard equations that include heterogeneity components, and the joint estimation of the hazard models allows researchers to control for the effects of unobserved personality traits. Such models are commonly estimated using MLwiN and aML. In this talk, I discuss Stata's capabilities for estimating multilevel multiprocess models, focusing on the user-written cmp command developed by David Roodman (2007). The cmp command can estimate recursive systems of multilevel (random-effects) equations with correlated disturbances. I illustrate the application of the cmp command using examples from demographic research.

Roodman, D. 2007. cmp: Stata module to implement conditional (recursive) mixed process estimator. Statistical Software Components S456882, Department of Economics, Boston College. http://ideas.repec.org/c/boc/bocode/s456882.html.

Rescaling results of mixed nonlinear probability models to compare regression coefficients or variance components across hierarchically nested models

Dirk Enzmann
University of Hamburg
Ulrich Kohler
Social Science Research Center Berlin
Because of the scaling of the unobserved latent dependent variable in logistic and probit multilevel models, the lowest-level residual variance is always fixed at π²/3 (logistic regression) or 1 (probit regression). As a consequence, changes in regression coefficients and variance components between hierarchically nested models cannot be interpreted unambiguously. To overcome this issue, rescaling the unobserved latent dependent variable of nested models to the scale of the intercept-only model has been proposed (Hox 2010). In this talk, we demonstrate the program meresc, which implements this procedure to rescale the results of mixed nonlinear probability models fitted with xtmelogit, xtlogit, or xtprobit.
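
As we read the procedure described in Hox (2010), the rescaling can be sketched as follows. For model m, σ²_F is the variance of the fixed-part linear predictor across observations, σ²_u0 is the higher-level intercept variance, and σ_0 is the total latent standard deviation of the intercept-only model (where σ²_F = 0). Whether meresc computes exactly these quantities is an assumption here.

```latex
% Total latent variance of model m, scale factor relative to the intercept-only model,
% and the rescaled coefficients and variance components
\begin{align*}
\sigma^2_m &= \sigma^2_F + \sigma^2_{u0} + \sigma^2_R,
  \qquad \sigma^2_R = \pi^2/3 \text{ (logit) or } 1 \text{ (probit)} \\
s_m &= \sigma_0 / \sigma_m \\
\beta^{*} &= s_m\,\beta, \qquad \sigma^{2*}_{u0} = s_m^2\,\sigma^2_{u0}
\end{align*}
```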

Hox, J. J. 2010. Multilevel Analysis: Techniques and Applications. 2nd ed. New York: Routledge.

Multilevel tools

Katja Möhring
University of Cologne
Alexander Schmidt
University of Cologne
The Stata package “multilevel tools” (mlt) includes a range of ado-files for postestimation after multilevel models (xtmixed/xtmelogit). So far, it contains three commands, and more ado-files will be added in the future:
  1. mltrsq gives the Snijders/Bosker R-squared and the Bryk/Raudenbush R-squared values.
  2. mltcooksd gives the influence measures Cook’s D and DFBETAs for the higher-level units in hierarchical mixed models.
  3. mltshowm shows how the model changes when the cases detected as influential are excluded from the sample.
In our presentation, we will discuss the issue of influential cases in multilevel modeling. We will use some research examples to stress the importance of considering influential cases, particularly in multilevel analysis. We will show how the influence measures for second-level units are defined and how we calculate them.

Modular programming in Stata

Daniel Schneider
University of Frankfurt/Main
Stata provides an easy and effective way of programming and distributing user-written additions to Stata’s command universe. However, a Stata programmer may face problems when trying to distribute an ado-file whose code in turn depends on one or many other self-written or third-party user-written routines. Distributing the ado-files as a package may not be appropriate, or it may be cumbersome in terms of compilation and maintenance.

The user-written command copycode facilitates code production, code certification, code maintenance, and code distribution in a context of extensive ado-file programming with many interdependencies among user-written files. Its main purpose is to assemble ado-files for distribution that are nondependent on other user-written files. It does so by copying the relevant code into one file. The programmer’s burden of keeping track of all first-order and higher dependencies is reduced to the compilation of a list of first-order dependencies, which is given to copycode as an input. copycode will then assemble a ready-to-distribute, nondependent ado-file that contains unique first-order and higher Stata subroutines and Mata code as private functions.

Scientific organizers

Johannes Giesecke, University of Mannheim
[email protected]

Ulrich Kohler, WZB Social Science Research Center, Berlin
[email protected]

Logistics organizers

The conference is sponsored and organized by Dittrich and Partner (http://www.dpc.de), the distributor of Stata in several countries, including Germany, Austria, and Hungary.