
Last updated: 16 June 2014

2014 German Stata Users Group meeting

Friday, 13 June 2014


University of Hamburg
Van-Melle-Park 6 ("Philosophenturm")
Hörsaal E (ground floor)
Hamburg, Germany


Regression analysis of censored data using pseudo-observations

Erik T. Parner
Aarhus University, Denmark
In a series of papers, a method based on pseudo-values has been proposed for direct regression modeling of the survival function, the restricted mean, and the cumulative incidence function in competing risks with right-censored data. Once the pseudo-values have been computed, one can fit the models using standard generalized estimating equation software. In this talk, I will present three Stata procedures to compute these pseudo-observations and give examples of applications. I will also present guidelines for the number of variables in the regression analyses and discuss future updates of the procedures.
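The workflow described above can be sketched as follows. This is a hypothetical example: the dataset and variable names are made up, and the stpsurv syntax shown (one of the pseudo-observation commands published by Parner and Andersen in the Stata Journal) is an assumption, not a transcript from the talk.

```stata
* Declare the survival data (hypothetical variables time and died)
stset time, failure(died)

* Generate pseudo-observations for the survival function at t = 5;
* the generate() option name is an assumption
stpsurv, at(5) generate(pseudo)

* Fit a GEE-type model on the pseudo-observations; the cloglog link
* puts the coefficients on the log cumulative hazard scale
glm pseudo age i.treat, link(cloglog) vce(robust)
```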


Using Stata for sequence analysis

Brendan Halpin
University of Limerick, Ireland

Sequence analysis (SA) is a very different way of looking at categorical longitudinal data, such as life-course or labor-market histories (or any ordered categorical data, for that matter). Instead of focusing on transition rates (for example, via hazard rate, Markov, or panel models), SA takes individual time series and compares them as a whole. It has significant advantages at a descriptive and exploratory level and may help detect patterns that conventional methods overlook. As availability of longitudinal data increases, this becomes a significant advantage.

SA hinges on defining measures of similarity between sequences, typically to generate data-driven classifications, for example by cluster analysis. Most SA uses the optimal matching distance, but other measures are used. There is some controversy about the applicability of SA algorithms to social science data and about their parameterization. Comparison of different methods and parameterizations helps clarify the issues.

For a long time, TDA was the only package available to social scientists for SA, but in recent years, both Stata and R have gained relevant functionality; in Stata's case, it is provided by the SQ and SADI packages.

In this talk, I will discuss the current state of the SADI package. SADI differs from SQ in that its core routines are implemented as a plugin, which makes it significantly faster: many of the distance measures are computationally intensive, and typically, _N*(_N-1)/2 pairwise comparisons must be made. SADI also provides additional distance measures, including the dynamic Hamming distance, the time-warp edit distance, and a version of Elzinga's number-of-matching-subsequences measure. It includes tools for inspecting and graphing sequence data and for comparing distance measures and the resulting cluster analyses.

I will also briefly discuss the advantages and disadvantages of using plugins rather than Mata and comment about cross-compiling plugins under Linux.


Some examples using gsem to handle endogeneity in nonlinear models

David Drukker
StataCorp LP
Unobserved components can be used to parameterize problems of endogeneity in many nonlinear models for cross-sectional and panel data. This talk provides some examples and uses gsem to estimate the parameters.
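One common parameterization can be sketched as follows; the variable names are hypothetical, and this is only one of several ways to set up such a model in gsem.

```stata
* Linear outcome y with a binary endogenous treatment t and an
* excluded instrument z.  The latent variable L appears in both
* equations and carries the unobserved component that induces the
* endogeneity; fixing its loading in the treatment equation and
* its variance identifies the remaining loading.
gsem (y <- x 1.t L) (t <- x z L@1, probit), var(L@1)
```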


Managing Stata-related files with dirtools

Ulrich Kohler
University of Potsdam, Germany
This presentation will illustrate several uses of the programs in the dirtools package. dirtools is a collection of programs designed to deal with native Stata files (that is, .dta, .do, .ado, .mata, or .gph) and some of the more frequently generated file formats (.eps, .pdf, and .tex). The programs provide easy access to typical tasks involving these file types. For example, the user can describe or load datasets, compile Mata files, translate .gph files to .eps or .pdf files, and compile TeX files. In addition, the package provides an easy way to change the working directory and to define bookmarks for frequently used directories.


Splitting spells in very large or daily datasets

Klaudia Erhardt
DIW Berlin, Germany
Ralf Künster
WZB Berlin, Germany
We have written two Stata programs (do-files) dealing with spell data. The first program, splitspells, splits spell data in such a way that every split spell within the same case is either parallel or unique with respect to the time period covered by the spells. This could be done by splitting each spell into single time-unit splits, a method that is neither recommended nor even feasible for large datasets or daily data because it produces the maximum number of additional records. In contrast, splitspells does the job by producing the smallest possible number of additional records and is a useful tool in the process of transforming multiple spell-type datasets into “unidimensional” sequence data. The second program, combispells, is used with episode data that are already split. Using existing spell-type variables and user-defined short labels, combispells produces a (labeled) numeric variable and a string variable that show the spell-type combinations occurring within parallel splits of a case. Thus it provides a user-friendly and easy-to-handle tool for editing, revising, and exploring spell data.


Modeling interactions in count-data regression: Principles and implementation in Stata

Heinz Leitgöb
University of Linz, Austria

During the past decades, count-data models (in particular, Poisson and negative-binomial-based regression models) have gained relevance in empirical social research. While identifying and interpreting main effects is relatively straightforward for this class of models, the integration of interactions between predictors proves to be complex. As a consequence of the exponential mean function implemented in count-data models (which restricts the possible range of the conditional expected count to nonnegative values), the coefficient of the product-term variable (generated by multiplying the predictors constituting the interaction) does not, in contrast to the linear model, fully represent the underlying interaction effect. Further, the interaction effect is allowed to vary between individuals and can be divided into two components: a model-inherent interaction effect and a product-term-induced interaction effect.

We will derive the total interaction effect for the Poisson and negative binomial models by following a method developed by Ai and Norton (2003) for binary logit and probit models. Further, we will decompose the model-inherent and the product-term-induced interaction effect, discuss their substantive meaning, and provide delta-method standard errors for the respective effects. Finally, we will provide an approach for the estimation and graphical representation of these effects in Stata.


Ai, C., and E. C. Norton. 2003. Interaction terms in logit and probit models. Economics Letters 80: 123–129.
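In Stata, the kind of total interaction effect discussed above can be approximated as a cross-difference of predicted counts; the following sketch uses made-up variable names and is not necessarily the authors' implementation.

```stata
* Poisson model with two binary predictors a and b and a control x
poisson y i.a##i.b x

* Cross-difference of the predicted counts: the discrete analogue
* of the cross-partial derivative, reported with delta-method
* standard errors
margins r.a#r.b
```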


Estimating average treatment effects from observational data using teffects

David Drukker
StataCorp LP
After reviewing the potential-outcome framework for estimating treatment effects from observational data, this talk discusses how to estimate the average treatment effect and the average treatment effect on the treated using the regression-adjustment estimator, the inverse-probability-weighted estimator, two doubly robust estimators, and two matching estimators implemented in teffects.
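The estimators mentioned above can be illustrated with the cattaneo2 teaching dataset that ships with Stata; the covariate choices below are arbitrary.

```stata
webuse cattaneo2, clear

* ATE of maternal smoking on birthweight, four ways:
teffects ra   (bweight mage prenatal1) (mbsmoke)             // regression adjustment
teffects ipw  (bweight) (mbsmoke mage fbaby)                 // inverse-probability weighting
teffects aipw (bweight mage prenatal1) (mbsmoke mage fbaby)  // doubly robust
teffects psmatch (bweight) (mbsmoke mage fbaby)              // propensity-score matching

* The atet option requests the average treatment effect on the treated
teffects ra (bweight mage prenatal1) (mbsmoke), atet
```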


Reproducible research in Stata

Bill Rising
StataCorp LP

Writing a document that contains statistical results in its narrative, including inline results, can take too much effort. Typically, users have a separate series of do-files whose results must then be pulled into the document. This is a very high-maintenance way to work because updates to the data, changes to the do-files, updates to the statistical software, and, especially, updates to inline results all require work and careful checking of results.

Reproducible research greatly lessens document-maintenance chores by putting code and results directly into the document. Because only one document is used, it remains consistent and is easily maintained.

In this presentation, I will show you how to put Stata code directly into a LaTeX or HTML document and run it through a preprocessor to create the final document containing the results. While this is useful for creating self-contained documents, it is especially valuable for periodic reports, class notes, solution sets, and other documents that are reused over a long period of time.


A new command for plotting regression coefficients and other estimates

Ben Jann
University of Bern, Switzerland
Graphical display of regression results has become increasingly popular in presentations and the scientific literature because, in many cases, graphs are much easier to read than tables. In Stata, such plots can be produced by the marginsplot command. However, while marginsplot is very versatile and flexible, it has two major limitations: it can process only results left behind by margins, and it can handle only one set of results at a time. In this presentation, I will introduce a new command called coefplot, which overcomes these limitations. It plots results from any estimation command and combines results from several models into a single graph. The default behavior of coefplot is to plot markers for coefficients and horizontal spikes for confidence intervals. However, coefplot can also produce various other types of graphs. The capabilities of coefplot are illustrated using a series of examples.
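A minimal example of the default behavior described above, using Stata's auto dataset (coefplot is available from the SSC archive):

```stata
ssc install coefplot
sysuse auto, clear

regress price mpg weight
estimates store m1
regress price mpg weight foreign
estimates store m2

* Markers for coefficients, spikes for confidence intervals,
* both models combined in one graph, intercept omitted
coefplot m1 m2, drop(_cons) xline(0)
```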


Scientific organizers

Johannes Giesecke, University of Bamberg
[email protected]

Ulrich Kohler, University of Potsdam
[email protected]

Dirk Enzmann, University of Hamburg
[email protected]

Kai-Uwe Schnapp, University of Hamburg
[email protected]

Logistics organizers

The conference is sponsored and organized by Dittrich & Partner Consulting GmbH (dpc-software.de), the distributor of Stata in several countries, including Germany, the Netherlands, Austria, the Czech Republic, and Hungary.