Last updated: 19 July 2011

2011 German Stata Users Group meeting

Friday, 1 July 2011


Aula (auditorium) der Universität Bamberg
(Dominikanerbau), Dominikanerstr. 2a, 96049 Bamberg


Structural equation modeling using gllamm, confa, and gmm

Stas Kolenikov
University of Missouri–Columbia
In this talk, I introduce the main ideas of structural equation models (SEMs) with latent variables and the Stata tools that can be used for such models. The two approaches most often used in applied work are numerical integration over the latent variables and covariance structure modeling. The first approach is implemented in Stata via gllamm, which was developed by Sophia Rabe-Hesketh. The second approach is currently implemented in confa for confirmatory factor analysis models. Also, the introduction of the generalized method of moments (GMM) estimation and testing framework in Stata 11 makes it possible to estimate SEMs by using moderately complex parameter and matrix manipulations. I provide working examples with some popular datasets (Holzinger–Swineford factor analysis model and Bollen’s industrialization and political democracy model).

Evaluating one-way and two-way cluster–robust covariance matrix estimates

Mark E. Schaffer
Heriot-Watt University–Edinburgh
Although cluster–robust standard errors are now recognized as essential in a panel-data context, official Stata only supports clusters that are nested within panels. This rules out defining clusters in the time dimension and modeling contemporaneous dependence of panel units’ error processes. We build upon recent analytical developments that define two-way (and conceptually, n-way) clustering, and upon the 2010 implementation of two-way clustering in the widely used ivreg2 and xtivreg2 packages. Using Monte Carlo techniques, we present examples of the utility of one-way and two-way clustering, compare them with alternative approaches to modeling error dependence, and consider tests for clustering of errors.
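The idea behind two-way clustering can be illustrated with a minimal sketch. It assumes the commonly used formulation in which the two-way variance is the sum of the two one-way cluster-robust variances minus the variance clustered on the intersection of the two dimensions. It is written in plain Python for a single-regressor model without an intercept and without small-sample corrections; it is not the ivreg2 or xtivreg2 implementation, and the toy panel and all function names are hypothetical.

```python
from collections import defaultdict


def cluster_meat(x, resid, groups):
    """Sum over clusters of the squared within-cluster sum of x_i * e_i."""
    scores = defaultdict(float)
    for xi, ei, g in zip(x, resid, groups):
        scores[g] += xi * ei
    return sum(s * s for s in scores.values())


def twoway_cluster_var(x, y, dim1, dim2):
    """Slope and two-way cluster-robust variance for y = b*x + e (no intercept).

    Assumed formulation: V(dim1) + V(dim2) - V(dim1 and dim2 intersected).
    No finite-sample adjustment is applied, and the result is not
    guaranteed to be positive in small samples.
    """
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - b * xi for xi, yi in zip(x, y)]
    both = list(zip(dim1, dim2))  # intersection clusters
    meat = (cluster_meat(x, resid, dim1)
            + cluster_meat(x, resid, dim2)
            - cluster_meat(x, resid, both))
    return b, meat / sxx ** 2


# Hypothetical toy panel: 3 firms observed in 2 years.
firm = [1, 1, 2, 2, 3, 3]
year = [2009, 2010, 2009, 2010, 2009, 2010]
x = [1.0, 2.0, 1.5, 2.5, 0.5, 3.0]
y = [1.2, 1.9, 1.4, 2.9, 0.3, 3.1]
b, v = twoway_cluster_var(x, y, firm, year)
```

In this toy panel every firm-year cell holds one observation, so the intersection term reduces to the usual heteroskedasticity-robust term.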

Implementation of a multinomial logit model with fixed effects

Klaus Pforr
Mannheim Center for European Social Research (MZES)
Fixed-effects models have become increasingly popular in the field of sociology. The possibility of controlling for unobserved heterogeneity makes these models a prime tool for causal analysis.

As of today, fixed-effects models have been derived and implemented in many statistical software packages for continuous, dichotomous, and count-data dependent variables. For many other important and popular statistical models, however, only population-average estimators are available; models for multinomial categorical dependent variables are one example. Chamberlain (1980) derived such a fixed-effects model in a seminal paper, but it has not yet been implemented in any statistical software package. Possible applications would be analyses of effects on employment status with special consideration of part-time or irregular employment and analyses of the effects on voting behavior that implicitly control for long-term party identification rather than having to measure it directly.

In this presentation, I show a first version of an ado-file that closes this gap. The implementation draws on the native Stata multinomial logit and conditional logit model implementations. The actual ml evaluator uses Mata functions to implement the conditional likelihood function. To show the numerical stability and computational speed of the implementation, I compare its results with those of the built-in clogit and present some basic results with simulated data.

Plagiarism in student papers and cheating on exams: Results from surveys using special techniques for sensitive questions

Ben Jann
University of Bern
Eliciting truthful answers to sensitive questions is an age-old problem in survey research. Respondents tend to underreport socially undesired or illegal behaviors, while overreporting socially desirable ones. To combat such response bias, various techniques have been developed that are geared toward providing the respondent greater anonymity and minimizing the respondent’s feelings of jeopardy. Examples of such techniques are the randomized response technique, the item count technique, and the crosswise model. I will present results from several surveys, conducted among university students, that employ such techniques to measure the prevalence of plagiarism and cheating on exams. User-written Stata programs for analyzing data from such techniques are also presented.
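How such designs recover a prevalence estimate from deliberately noisy answers can be sketched briefly. The two functions below are textbook moment estimators for a forced-response variant of the randomized response technique and for the crosswise model. They are an illustration under stated assumptions, not the user-written Stata programs mentioned in the talk, and the function and parameter names are hypothetical.

```python
def forced_response_prevalence(share_yes, p_truth, p_forced_yes):
    """Prevalence estimate under an assumed forced-response design.

    With probability p_truth the respondent answers truthfully, with
    probability p_forced_yes the respondent must say "yes", and otherwise
    must say "no". Then P(yes) = p_truth * prevalence + p_forced_yes.
    """
    if not 0 < p_truth <= 1:
        raise ValueError("p_truth must lie in (0, 1]")
    return (share_yes - p_forced_yes) / p_truth


def crosswise_prevalence(share_same, p_known):
    """Prevalence estimate under the crosswise model.

    share_same is the share of respondents reporting that their answers to
    the sensitive item and to an unrelated item with known prevalence
    p_known are identical (both yes or both no). Then
    P(same) = prevalence * p_known + (1 - prevalence) * (1 - p_known).
    """
    if abs(2 * p_known - 1) < 1e-12:
        raise ValueError("p_known must differ from 0.5")
    return (share_same + p_known - 1) / (2 * p_known - 1)
```

Both estimators can fall outside the unit interval in small samples, because the observed share is only a noisy estimate of the design probability.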

orderalpha: Nonparametric order-α efficiency analysis for Stata

Harald Tauchmann
Despite their frequent use in applied work, nonparametric approaches to efficiency analysis, namely data envelopment analysis (DEA) and free disposal hull (FDH), have a bad reputation among econometricians. This is mainly because DEA and FDH are deterministic approaches that are highly sensitive to outliers and measurement errors. Recently, however, so-called partial frontier approaches have been developed, namely order-m (Cazals, Florens, and Simar, 2002, Journal of Econometrics 106: 1–25) and order-α (Aragon, Daouia, and Thomas-Agnan, 2005, Econometric Theory 21: 358–389). They generalize FDH by allowing super-efficient observations to be located beyond the estimated production-possibility frontier. Although these methods are also purely nonparametric, they are substantially less sensitive to outliers because the partial frontier envelops only a subsample of observations. I present the new Stata command orderalpha, which implements order-α efficiency analysis in Stata. The command offers several options, such as statistical inference based on a subsampling bootstrap. In addition, I present the accompanying Stata command oaoutlier, an exploratory tool that employs orderalpha to detect potential outliers in data meant for subsequent efficiency analysis using DEA.
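The relation between FDH and a partial frontier can be illustrated with a minimal input-oriented sketch. It assumes one common empirical convention: the reference set for a unit consists of all units (including itself) that produce at least as much of every output, FDH takes the minimum input ratio over that set, and order-α takes the smallest ratio whose empirical share exceeds 1 − α. This is not the orderalpha code, quantile conventions differ across implementations, and the data and function names are hypothetical.

```python
import math


def input_ratios(i, inputs, outputs):
    """Sorted ratios max_k x_jk / x_ik over units j that weakly dominate
    unit i in every output (unit i itself is always included)."""
    ratios = []
    for xj, yj in zip(inputs, outputs):
        if all(a >= b for a, b in zip(yj, outputs[i])):
            ratios.append(max(a / b for a, b in zip(xj, inputs[i])))
    return sorted(ratios)


def fdh_efficiency(i, inputs, outputs):
    """Input-oriented FDH score: the smallest ratio, never above 1."""
    return input_ratios(i, inputs, outputs)[0]


def order_alpha_efficiency(i, inputs, outputs, alpha):
    """Assumed order-alpha convention: the smallest ratio r such that the
    share of ratios <= r exceeds 1 - alpha. alpha = 1 reproduces FDH."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    ratios = input_ratios(i, inputs, outputs)
    k = min(math.floor((1 - alpha) * len(ratios)), len(ratios) - 1)
    return ratios[k]


# Hypothetical data: four units, one input and one output each.
inputs = [(2.0,), (4.0,), (3.0,), (6.0,)]
outputs = [(2.0,), (2.0,), (3.0,), (3.0,)]

fdh_scores = [fdh_efficiency(i, inputs, outputs) for i in range(4)]
oa_scores = [order_alpha_efficiency(i, inputs, outputs, 0.5) for i in range(4)]
```

In this toy example the first unit is FDH-efficient with a score of 1, while its order-α score at α = 0.5 exceeds 1, which is what classifies it as super-efficient relative to the partial frontier.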

Investigating the effects of factor variables

Jeff Pitblado
Stata has a rich set of operators for specifying factor variables in linear and nonlinear regression models. I will show how to test for the effects of factor variables in these models. I will also show how to compare and contrast these effects using linear combinations of the model coefficients.

Correlation metric

Kristian B. Karlson
Danish National Center for Social Research and the Center for Research in Compulsory Schooling
The logit model is a widely used regression technique in social research. However, the use and interpretation of coefficients from logit models have proven contentious. Problems arise because the mean and the variance of discrete variables cannot be separated. Logit coefficients are identified relative to an arbitrary scale, which makes the coefficients difficult both to interpret and to compare across groups or samples.

Do differences in coefficients reflect true differences or differences in scales? This cross-sample comparison problem raises concerns for comparative research. To address it, we suggest a new correlation metric, derived from logit models, that gives a new interpretation to the estimates of logit models (log odds-ratios). The metric leads the way to a reorientation of the use of logit models because it helps to clarify what logit coefficients are and how and when they can (or cannot) be used in comparative research. The metric recovers the correlation between a predictor variable x and a continuous latent outcome variable y* assumed to underlie a binary observed outcome y. This metric is invariant to differences in the marginal distributions of x and y* across groups or samples, making it suitable for situations met in real applications in comparative research. Our derivations also extend to the probit and to ordered and multinomial models. The new metric is implemented in the Stata command nlcorr.
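A rough sketch may help show why a correlation with the latent outcome is free of the arbitrary scale. It assumes the textbook single-predictor latent-variable model y* = a + b·x + e, where e has variance π²/3 for the logit and 1 for the probit, so that corr(x, y*) = b·sd(x) / sqrt(b²·var(x) + var(e)). Whether nlcorr computes the metric in exactly this way is an assumption; the function name is hypothetical.

```python
import math


def latent_correlation(b, sd_x, link="logit"):
    """Correlation between x and the latent outcome y* = a + b*x + e.

    Assumes a single predictor and the conventional error variance of the
    chosen link: pi^2 / 3 for the logit, 1 for the probit.
    """
    if link == "logit":
        resid_var = math.pi ** 2 / 3
    elif link == "probit":
        resid_var = 1.0
    else:
        raise ValueError("link must be 'logit' or 'probit'")
    return b * sd_x / math.sqrt(b * b * sd_x * sd_x + resid_var)
```

Rescaling x changes the coefficient and the standard deviation in opposite directions, so the correlation is unchanged: latent_correlation(0.8, 2.0) and latent_correlation(1.6, 1.0) return the same value.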

Comparing coefficients between nested nonlinear probability models

Ulrich Kohler
In a series of recent articles, Karlson, Holm, and Breen have developed a method for comparing estimated coefficients of nested nonlinear probability models. The KHB method is a general decomposition method that is unaffected by the rescaling or attenuation bias that arises in cross-model comparisons in nonlinear models. It recovers the degree to which a control variable Z mediates or explains the relationship between X and a latent outcome variable Y* underlying the nonlinear probability model. It also decomposes effects of both discrete and continuous variables, applies to average partial effects, and provides analytically derived statistical tests. The method can be extended to other models in the generalized linear model family. This presentation describes this method and the user-written program khb that implements the method.

SOEPlong: How to restructure complex longitudinal survey data (an application for the German Socio-Economic Panel study)

Arno Simons
Innovation in Governance Research Group, Technische Universität Berlin
Katja Möhring
Research Training Group SOCLIFE, University of Cologne
Peter Krause
German Socio-Economic Panel Study (SOEP), DIW
The social and behavioral sciences currently show an increasing demand for complex longitudinal household survey data for national and cross-national analyses. The state of the art (for national as well as international comparative data collections) provides two types of solutions: either the full presentation of all original wave-specific variables over time or the creation of fixed variables according to common time-consistent standards. The first type of solution leaves it to the researcher to choose how to reconcile differing categories over time and is thus rather time-consuming. The second type of solution is very easy to use; however, it does not provide the user with information on possibly necessary annual extensions or modifications for specific years. In both cases, the researcher has no further information on potential changes of variables over time. This paper addresses how complex representative longitudinal data can be disseminated for analyses in the social and behavioral sciences such that the time needed for data preparation is reduced to a minimum while information on consistency and changes of variables over time remains fully available. If we want to monitor changes in living conditions by permanent, regular observations using panel surveys, adaptations in variables turn out to be the rule rather than the exception. Therefore, our solution for restructuring longitudinal data accommodates ongoing adaptations in variables, which reflect measures adapted to new social conditions, new theoretical backgrounds, or improved conceptual measures.

Using Stata, we provide a conceptual and technical solution for restructuring the full set of SOEP variables with complete documentation of all adaptations over time. Our Stata programs generate two output files: one containing the restructured data and another containing the full documentation on the consistency of the variables over all waves. SOEPlong was first released in 2010 as a beta version, together with the usual data dissemination on DVD of the full set of SOEP variables for 26 waves of data. While the paper specifically addresses the German Socio-Economic Panel (SOEP) study, our general approach to dealing with complex household panel data might well be applied to other national and cross-national longitudinal household surveys.

Scientific organizers

Johannes Giesecke, University of Bamberg

Ulrich Kohler, WZB

Logistics organizers

The conference is sponsored and organized by Dittrich and Partner (http://www.dpc.de), the distributor of Stata in several countries, including Germany, Austria, and Hungary.