
Last updated: 8 June 2013

2013 German Stata Users Group meeting

Friday, 7 June 2013

Sanssouci Palace in Potsdam

University of Potsdam


Creating complex tables for publication

John Luke Gallup
Portland State University
Complex statistical tables often must be built up piece by piece from the results of multiple Stata commands. I show the capabilities of frmttable and outreg for creating complex tables, and even fully formatted statistical appendices, for Word and TeX documents. Formatting these tables precisely from within Stata has the same benefits as writing do-files for statistical commands: the tables are reproducible and can be regenerated when the data change, saving the user time.


An expanded framework for mixed process modeling in Stata

David Roodman
Center for Global Development
Roodman (Stata Journal, 2011) introduced the program cmp for using maximum likelihood to fit multiequation combinations of Gaussian-based models such as tobit, probit, ordered probit, multinomial probit, interval censoring, and continuous linear. This presentation describes substantial extensions to the framework and software: factor variable support; the rank-ordered probit model; the ability to specify precensoring truncation in most model types; hierarchical random effects and coefficients that are potentially correlated across equations; the ability to include the unobserved linear variables behind endogenous variables—not just their observed, censored manifestations—on the right side of other equations and, when so doing, the allowance for simultaneity in the system of equations. Contrary to the title of Roodman (2011), models no longer need be recursive or fully observed.


Provide, Enrich, and Make Accessible: Using Stata’s Capabilities for Disseminating NEPS Scientific Use Data

Daniel Bela
National Educational Panel Study (NEPS), Data Center, University of Bamberg
The National Educational Panel Study (NEPS) is emerging as one of Germany's major publishers of scientific use data for educational research. Disseminating data from six panel cohorts makes not only structured data editing but also documentation and user support a major challenge. To accomplish this task, the NEPS Data Center has implemented a sophisticated metadata system. The system not only allows structured documentation of the metadata of survey instruments and data files but also allows one to enrich the scientific use files with further information, which significantly eases access for data analysis. As a result, NEPS provides bilingual (German and English) dataset files and lets users instantly see, for instance, the exact wording of the question that produced a given variable without leaving the dataset. To achieve this, structured metadata are attached to the data using Stata's characteristics. To make handling the additional metadata even easier, the NEPS Data Center provides data users with NEPStools, a package of user-written programs. The presentation will introduce the NEPS data preparation workflow, focusing on the metadata system and its role in enriching the scientific use data with Stata's capabilities. Afterward, NEPStools will be introduced.


newspell—Easy Management of Complex Spell Data

Hannes Neiss
German Institute for Economic Research
Biographical data gathered in surveys is often stored in spell format, which allows spell states to overlap. This gives researchers useful information but leaves them with a complex data structure that is not easy to handle. I present my work on the ado-package newspell, which includes several subprograms for managing complex spell data. Spell states can be merged, reducing the overall number of spells, and newspell lets the user fill gaps with information from the spells before and after the gap, according to a user-defined preference. The two most important features of newspell, however, are the following. First, it can rank spells and cut off overlaps according to the rank order, a necessary step before performing, for example, sequence analysis on spell data. Second, it can combine overlapping spells into new categories of spells, generating entirely new states. This is useful for cleaning data, for analyzing the simultaneity of states, or for combining two spell datasets that contain information on different kinds of states (for example, labor market and marital status). newspell is useful for users who are unfamiliar with complex spell data and have little experience in Stata programming for data management; for experienced users, it saves a lot of time and coding work.
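
The rank-and-cut step can be illustrated outside Stata. The following is a minimal Python sketch of the idea, not newspell's implementation: spells are hypothetical (start, end, state) tuples in integer months with inclusive ends, and `rank` is a hypothetical dictionary in which a lower number means higher priority. Each month is assigned to the best-ranked active state, and consecutive months with the same state are collapsed back into spells; gaps are simply left empty.

```python
def resolve_overlaps(spells, rank):
    """Cut overlapping spells by rank order (illustrative sketch only).

    spells: list of (start, end, state) tuples with integer, inclusive months.
    rank:   dict mapping state -> priority; a lower number wins an overlap.
    Returns non-overlapping (start, end, state) tuples; gaps are left empty.
    """
    if not spells:
        return []
    first = min(start for start, _, _ in spells)
    last = max(end for _, end, _ in spells)

    # Assign each month to the best-ranked state that is active in it.
    monthly = []
    for month in range(first, last + 1):
        active = [state for start, end, state in spells if start <= month <= end]
        monthly.append(min(active, key=lambda st: rank[st]) if active else None)

    # Collapse runs of consecutive months with the same state into spells.
    result = []
    for month, state in enumerate(monthly, start=first):
        if state is None:
            continue
        if result and result[-1][2] == state and result[-1][1] == month - 1:
            result[-1] = (result[-1][0], month, state)
        else:
            result.append((month, month, state))
    return result


# Hypothetical ranking: employment outranks education, which outranks unemployment.
rank = {"employment": 1, "education": 2, "unemployment": 3}
spells = [(1, 12, "education"), (10, 20, "employment"), (21, 24, "unemployment")]
print(resolve_overlaps(spells, rank))
# [(1, 9, 'education'), (10, 20, 'employment'), (21, 24, 'unemployment')]
```

A month grid keeps the logic easy to follow but grows with the length of the observation window; splitting the timeline only at spell boundaries would scale better for long windows.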


Instrumental variables estimation using heteroskedasticity-based instruments

Christopher F. Baum
Boston College
Arthur Lewbel
Boston College
Mark E. Schaffer
Heriot–Watt University, Edinburgh
Oleksandr Talavera
University of Sheffield
In a 2012 article in the Journal of Business and Economic Statistics, Arthur Lewbel presented a method for identifying and estimating "mismeasured and endogenous regressor models" by exploiting heteroskedasticity. These models include linear regression models customarily estimated with instrumental variables (IV) or IV-GMM techniques. Under suitable conditions, Lewbel's method can provide instruments where no conventional instruments are available, or it can augment standard instruments so that tests of overidentifying restrictions become possible in what would otherwise be an exactly identified model. In this talk, I discuss the rationale for Lewbel's methodology and illustrate its implementation in ivreg2h, a variant of Baum, Schaffer, and Stillman's ivreg2 routine.
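
To make the idea concrete, here is a minimal pure-Python sketch of the core construction, not ivreg2h itself. It assumes one endogenous regressor y2 and one exogenous regressor z (plus a constant), with z also serving as the variable used to build the generated instrument (z - mean(z)) * e2hat, where e2hat are first-stage residuals. The data-generating process in the hypothetical `simulate` function is chosen so that, as I understand Lewbel's key conditions, cov(z, e1*e2) = 0 while cov(z, e2^2) is nonzero: a shared factor makes y2 endogenous, and only the first-stage error is heteroskedastic in z.

```python
import math
import random
import statistics


def residualize(v, z):
    """Residuals from an OLS regression of v on a constant and z."""
    zbar, vbar = statistics.fmean(z), statistics.fmean(v)
    slope = (sum((zi - zbar) * (vi - vbar) for zi, vi in zip(z, v))
             / sum((zi - zbar) ** 2 for zi in z))
    return [vi - vbar - slope * (zi - zbar) for zi, vi in zip(z, v)]


def lewbel_iv(y1, y2, z):
    """Just-identified IV estimate of gamma in y1 = b0 + b1*z + gamma*y2 + e1,
    using the generated instrument (z - mean(z)) * e2hat, where e2hat are the
    residuals from regressing y2 on a constant and z."""
    zbar = statistics.fmean(z)
    e2hat = residualize(y2, z)
    g = [(zi - zbar) * ei for zi, ei in zip(z, e2hat)]
    # Frisch-Waugh-Lovell: partial the exogenous regressors (1, z) out of the
    # instrument and y1; e2hat is already y2 with (1, z) partialled out.
    g_t, y1_t = residualize(g, z), residualize(y1, z)
    return (sum(a * b for a, b in zip(g_t, y1_t))
            / sum(a * b for a, b in zip(g_t, e2hat)))


def ols_gamma(y1, y2, z):
    """OLS estimate of gamma in the same equation, for comparison."""
    y1_t, y2_t = residualize(y1, z), residualize(y2, z)
    return sum(a * b for a, b in zip(y2_t, y1_t)) / sum(b * b for b in y2_t)


def simulate(n, gamma=1.0, seed=42):
    """Hypothetical DGP: a common factor u makes y2 endogenous, and only the
    first-stage error e2 is heteroskedastic in z."""
    rng = random.Random(seed)
    y1, y2, z = [], [], []
    for _ in range(n):
        zi, u = rng.gauss(0, 1), rng.gauss(0, 1)
        e2 = u + math.exp(0.5 * zi) * rng.gauss(0, 1)
        e1 = u + rng.gauss(0, 1)
        y2i = 0.5 + 1.0 * zi + e2
        z.append(zi)
        y2.append(y2i)
        y1.append(1.0 + 0.5 * zi + gamma * y2i + e1)
    return y1, y2, z


y1, y2, z = simulate(20_000)
print("OLS:", round(ols_gamma(y1, y2, z), 3))
print("Lewbel IV:", round(lewbel_iv(y1, y2, z), 3))
```

In this design OLS should be biased upward, while the IV estimate based only on the generated instrument should land near the true gamma of 1. With additional external instruments, the generated ones would make the model overidentified, which is what enables the overidentification tests mentioned in the abstract.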


Using simulation to inspect the performance of a test, in particular tests of the parallel regressions assumption in ordered logit and probit models

Maarten L. Buis
Social Science Research Center (WZB)
Richard Williams
University of Notre Dame
In this talk, we will show how to use simulations in Stata to explore to what extent and under what circumstances a test is problematic. We will illustrate this for a set of tests of the parallel regression assumption in ordered logit and probit models: the Brant, likelihood-ratio, Wald, score, and Wolfe-Gould tests. A common impression is that these tests tend to be anti-conservative; that is, they reject a true null hypothesis too often. We will use simulations to try to quantify when and to what extent this is the case, and we will also use them to create a more robust bootstrap variant of the tests. The purpose of this talk is twofold. First, we want to explore the performance of these tests; for this purpose, we will present a new program, oparallel, that implements all of the tests and their bootstrap variants. Second, we want to give more general advice on how to use Stata to run simulations when one has doubts about a particular test; for this purpose, we will present the simpplot command, which helps interpret the p-values returned by such a simulation.
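
The general recipe (generate data for which the null hypothesis is true, run the test many times, and check how often it rejects) can be sketched in a few lines. This Python example is only an illustration using a deliberately flawed toy test, not the parallel-regression tests or oparallel: it tests whether a mean is zero by plugging the sample standard deviation into a normal reference distribution instead of a t distribution. All function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def naive_z_pvalue(sample):
    """Two-sided p-value for H0: mean = 0 that plugs the sample SD into a
    normal (rather than t) reference distribution."""
    n = len(sample)
    mean = math.fsum(sample) / n
    sd = math.sqrt(math.fsum((x - mean) ** 2 for x in sample) / (n - 1))
    stat = mean / (sd / math.sqrt(n))
    return 2.0 * (1.0 - STD_NORMAL.cdf(abs(stat)))


def simulate_pvalues(n_obs, n_reps, seed=1):
    """Generate data for which the null is true and collect the p-values."""
    rng = random.Random(seed)
    return [naive_z_pvalue([rng.gauss(0.0, 1.0) for _ in range(n_obs)])
            for _ in range(n_reps)]


def rejection_rate(pvalues, alpha=0.05):
    """Share of replications that reject a true null; ideally close to alpha."""
    return sum(p < alpha for p in pvalues) / len(pvalues)


for n_obs in (10, 100):
    pvalues = simulate_pvalues(n_obs, n_reps=4000)
    print(f"n = {n_obs:3d}: rejection rate at alpha = 0.05 is {rejection_rate(pvalues):.3f}")
```

For a well-behaved test, p-values under a true null are uniformly distributed, so the rejection rate should be close to alpha at every level; a plot of the p-values against the uniform distribution is the graphical version of this check that simpplot provides. In this toy case, the rejection rate at n = 10 should clearly exceed 0.05 (the test is anti-conservative), while at n = 100 it should be close to 0.05.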


Fitting Complex Mixed Logit Models with Particular Focus on Labor Supply Estimation

Max Löffler
Institute for the Study of Labor (IZA)
When one estimates discrete choice models, the mixed logit approach is commonly superior to simple conditional logit setups. Mixed logit models not only allow the researcher to implement complex random components but also overcome the restrictive assumption of independence of irrelevant alternatives (IIA). Despite these theoretical advantages, estimating mixed logit models becomes cumbersome as model complexity increases, so applied work often relies on rather simple empirical specifications to reduce the computational burden. I introduce the user-written command lslogit, which fits complex mixed logit models using maximum simulated likelihood. Because lslogit is a d2 ml evaluator written in Mata, estimation is relatively efficient compared with other routines. It allows the researcher to specify complicated structures of unobserved heterogeneity and to choose from a set of frequently used functional forms for the direct utility function, such as Box-Cox transformations, which are difficult to estimate in the context of logit models. lslogit focuses particularly on the estimation of labor supply models in the discrete choice context and therefore facilitates several computationally demanding but standard tasks in this research area. However, the command can be used in many other applications of mixed logit models as well.
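
The building block of maximum simulated likelihood for mixed logit is the simulated choice probability: draw the random coefficients many times, compute ordinary logit probabilities for each draw, and average. The Python sketch below illustrates this under simplifying assumptions that are not taken from lslogit: a single attribute, a single normally distributed coefficient, and pseudo-random rather than quasi-random draws. The log of the simulated probability of each chosen alternative would then be summed and maximized over the mean and standard deviation.

```python
import math
import random


def logit_probs(utilities):
    """Conditional logit choice probabilities, computed in a numerically stable way."""
    m = max(utilities)
    expu = [math.exp(u - m) for u in utilities]
    total = sum(expu)
    return [e / total for e in expu]


def simulated_mixed_logit_probs(x, beta_mean, beta_sd, n_draws=500, seed=0):
    """Simulated choice probabilities with one normally distributed coefficient.

    x: value of a single (hypothetical) attribute for each alternative.
    For each draw r, beta_r = beta_mean + beta_sd * z_r with z_r ~ N(0, 1);
    the logit probabilities are averaged over the draws.
    """
    rng = random.Random(seed)
    probs = [0.0] * len(x)
    for _ in range(n_draws):
        beta_r = beta_mean + beta_sd * rng.gauss(0.0, 1.0)
        for j, p in enumerate(logit_probs([beta_r * xj for xj in x])):
            probs[j] += p / n_draws
    return probs


# Hypothetical choice set with three alternatives and one attribute.
x = [0.0, 0.5, 1.0]
print(simulated_mixed_logit_probs(x, beta_mean=0.8, beta_sd=1.5))
```

Setting `beta_sd` to zero collapses the model to a conditional logit, which is a convenient sanity check; a nonzero standard deviation is what lets the substitution patterns depart from IIA.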


Simulated Multivariate Random Effects Probit Models for Unbalanced Panels

Alexander Plum
Otto-von-Guericke University Magdeburg
This paper develops a method for implementing a simulated multivariate random-effects probit model for unbalanced panels and illustrates it using artificial data. Halton draws generated by mdraws are used to simulate multivariate normal probabilities with mvnp(). The estimator can easily be adjusted (for example, to allow for autocorrelated errors). The advantages of this simulated estimator are high accuracy and shorter computation time compared with existing commands such as redpace.
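
The two ingredients named here, Halton draws and simulated multivariate normal probabilities, can be sketched together. The Python example below generates Halton draws by the radical-inverse method and plugs them into a GHK-style simulator for a bivariate normal probability. That GHK is the method behind mvnp() is my understanding rather than something the abstract states, and the burn-in of the first few Halton points is an assumption that mirrors common practice. The bivariate orthant probability has a closed form, which makes it a convenient check.

```python
import math
from statistics import NormalDist

PHI = NormalDist()


def halton(index, base):
    """The index-th element (index >= 1) of the Halton sequence for a prime base."""
    result, f = 0.0, 1.0
    while index > 0:
        f /= base
        result += f * (index % base)
        index //= base
    return result


def ghk_bivariate(a1, a2, rho, n_draws=1000, base=2, burn=10):
    """GHK-style simulation of P(Z1 < a1, Z2 < a2) for a standard bivariate
    normal with correlation rho (|rho| < 1), using Halton draws."""
    s = math.sqrt(1.0 - rho ** 2)      # Cholesky factor: Z2 = rho*e1 + s*e2
    p1 = PHI.cdf(a1)                    # P(Z1 < a1), computed exactly
    total = 0.0
    for i in range(burn + 1, burn + n_draws + 1):
        u = halton(i, base)
        e1 = PHI.inv_cdf(u * p1)        # e1 drawn from N(0, 1) truncated to e1 < a1
        total += p1 * PHI.cdf((a2 - rho * e1) / s)
    return total / n_draws


exact = 0.25 + math.asin(0.5) / (2 * math.pi)   # closed form for a1 = a2 = 0
print(ghk_bivariate(0.0, 0.0, 0.5), exact)
```

In a random-effects probit, probabilities like these enter each individual's likelihood contribution; higher dimensions repeat the truncated-draw step once per equation, using one Halton base per dimension.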


xsmle—A Command to Estimate Spatial Panel Models in Stata

Federico Belotti
University of Rome "Tor Vergata"
Gordon Hughes
University of Edinburgh
Andrea Piano Mortari
University of Rome "Tor Vergata"
Econometricians have begun to devote more attention to spatial interactions in applied econometric studies. The new command we present, xsmle, fits fixed- and random-effects spatial models for balanced panel data for a wide range of specifications: the spatial autoregressive model, the spatial error model, the spatial Durbin model, the spatial autoregressive model with autoregressive disturbances, and the generalized spatial random-effects model, with or without a dynamic component. Different weighting matrices may be specified for different components of the models, and both Stata matrices and spmat objects are allowed. Furthermore, xsmle calculates direct, indirect, and total effects according to LeSage (2008), implements the Lee and Yu (2010) data transformation for fixed-effects models, and may be used with the mi prefix when the panel is unbalanced.
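
The direct, indirect, and total effects have a compact definition in the spatial autoregressive case. For y = rho*W*y + X*beta + e, the matrix of partial derivatives of y with respect to one regressor is S = (I - rho*W)^(-1) * beta. The average direct effect is the mean of the diagonal of S, the average total effect is the mean row sum, and the indirect (spillover) effect is their difference. The Python sketch below computes these summaries for a hypothetical ring-shaped, row-standardized weighting matrix. It covers only the SAR case (a spatial Durbin model adds a W*theta term inside S) and uses a dense inverse that is suitable only for small examples; it is not xsmle's implementation.

```python
def invert(matrix):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pv = aug[col][col]
        aug[col] = [v / pv for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def sar_effects(W, rho, beta):
    """Average direct, indirect, and total effects of one regressor in the SAR
    model y = rho*W*y + X*beta + e, based on S = (I - rho*W)^(-1) * beta."""
    n = len(W)
    A = [[float(i == j) - rho * W[i][j] for j in range(n)] for i in range(n)]
    S = [[beta * v for v in row] for row in invert(A)]
    direct = sum(S[i][i] for i in range(n)) / n
    total = sum(sum(row) for row in S) / n
    return direct, total - direct, total


def ring_weights(n):
    """Hypothetical row-standardized W (n >= 3): each unit's two ring neighbors get 0.5."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        W[i][(i - 1) % n] = 0.5
        W[i][(i + 1) % n] = 0.5
    return W


direct, indirect, total = sar_effects(ring_weights(6), rho=0.4, beta=2.0)
print(direct, indirect, total)
```

With a row-standardized W, the average total effect equals beta / (1 - rho), which offers a quick check on the computation; the direct effect exceeds beta because of feedback effects that pass through neighbors and return to the originating unit.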


Estimating the dose-response function through the GLM approach

Barbara Guardabascio
Italian National Institute of Statistics, Rome
Marco Ventura
Italian National Institute of Statistics, Rome
How effective are policy programs with continuous treatment exposure? Answering this question essentially amounts to estimating a dose-response function as proposed in Hirano and Imbens (2004). Whenever doses are not randomly assigned, a dose-response function can be estimated using the generalized propensity score (GPS). Since its formulation, the GPS has been used repeatedly in observational studies, and ad hoc programs have been provided for Stata users (doseresponse and gpscore; Bia and Mattei 2008). However, many applied studies note that the treatment variable may not be normally distributed. In this case, the existing Stata programs cannot be used because they allow only a normal density for the treatment variable. In this paper, we overcome this problem. Building on Bia and Mattei's (2008) programs, we provide doseresponse2 and gpscore2, which accommodate different distribution functions of the treatment variable. This is accomplished by applying the generalized linear models estimator in the first step instead of maximum likelihood. The result is a versatile tool capable of handling many practical situations. It is worth highlighting that, among the many alternatives, our programs allow the GPS estimator to be used consistently when the treatment variable is fractional (the flogit case of Papke and Wooldridge [1998]), a case of particular interest for economists.


Predictive Margins and Marginal Effects in Stata

Ben Jann
University of Bern
Tables of estimated regression coefficients, usually accompanied by additional information such as standard errors, t statistics, p-values, confidence intervals, or significance stars, have long been the preferred way of communicating results from statistical models. In recent years, however, the limits of this form of exposition have been increasingly recognized. For example, interpreting regression tables can be very challenging in the presence of complications such as interaction effects, categorical variables, or nonlinear functional forms. Furthermore, while these issues might still be manageable in linear regression, the interpretational difficulties can be overwhelming in nonlinear models (for example, logistic regression). To interpret these models sensibly, one must often compute additional results such as marginal effects, predictive margins, or contrasts. Moreover, smart graphical displays of results can be very valuable in making complex relations accessible. A number of helpful commands geared toward these tasks have recently been introduced in Stata, making elaborate interpretation and communication of regression results possible without much extra effort. Examples are margins, contrast, and marginsplot. In my talk, I will discuss the capabilities of these commands and present a range of examples illustrating their use.
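
What predictive margins and average marginal effects compute can be shown directly from a set of logit coefficients. The Python sketch below uses hypothetical coefficients for a binary variable d and a continuous variable x (not estimates from any real model). A predictive margin sets d to a given value for every observation, keeps x as observed, and averages the predicted probabilities; the average marginal effect of x averages b_x * p * (1 - p) over the sample. This is roughly what `margins d` and `margins, dydx(x)` report after a logit with `i.d` and `x`, except that margins also reports delta-method standard errors, which this sketch omits.

```python
import math


def invlogit(v):
    """Logistic CDF: Pr(y = 1) given the linear index v."""
    return 1.0 / (1.0 + math.exp(-v))


def linear_index(b, d, x):
    """Linear predictor of a hypothetical logit model with regressors d and x."""
    return b["_cons"] + b["d"] * d + b["x"] * x


def predictive_margin(b, data, d_value):
    """Average predicted Pr(y = 1) after setting d to d_value for every
    observation, keeping x at its observed values."""
    return sum(invlogit(linear_index(b, d_value, row["x"])) for row in data) / len(data)


def ame_x(b, data):
    """Average marginal effect of continuous x: mean of b_x * p * (1 - p)."""
    total = 0.0
    for row in data:
        p = invlogit(linear_index(b, row["d"], row["x"]))
        total += b["x"] * p * (1.0 - p)
    return total / len(data)


# Hypothetical coefficients and data.
b = {"_cons": -1.0, "d": 0.8, "x": 0.5}
data = [{"d": 0, "x": -1.0}, {"d": 1, "x": 0.0}, {"d": 0, "x": 1.5}, {"d": 1, "x": 2.0}]

m0, m1 = predictive_margin(b, data, 0), predictive_margin(b, data, 1)
print("margins for d = 0, 1:", round(m0, 4), round(m1, 4), "contrast:", round(m1 - m0, 4))
print("AME of x:", round(ame_x(b, data), 4))
```

The contrast of the two predictive margins expresses the effect of d on the probability scale, which is usually easier to communicate than the log-odds coefficient itself.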


Scientific organizers

Johannes Giesecke, University of Bamberg

Ulrich Kohler, University of Potsdam

Logistics organizers

The conference is sponsored and organized by Dittrich & Partner Consulting GmbH (http://www.dpc.de), the distributor of Stata in several countries, including Germany, the Netherlands, Austria, the Czech Republic, and Hungary.