
Last updated: 30 June 2008

2008 German Stata Users Group meeting

Friday, 27 June 2008


WZB Berlin (Wissenschaftszentrum Berlin für Sozialforschung)
Reichpietschufer 50
D-10785 Berlin-Tiergarten
Germany

Proceedings


Using instrumental variables techniques in economics and finance

Christopher F. Baum
Boston College Department of Economics and DIW Berlin
I will discuss the usefulness of instrumental variables (IV) techniques in addressing research questions in economics and finance. IV methods provide workable solutions to problems of endogeneity, measurement error and proxy variables, but they are easily misused. I will present a wide array of diagnostic techniques that should be employed to validate the use of IV in a particular context. I will also discuss the advantages of employing the Generalized Method of Moments form of IV (IV-GMM) and the Continuously Updated Estimator (GMM-CUE), and I will display some newly developed code that efficiently employs Stata's Mata programming language to implement the GMM-CUE.
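Here is a minimal Python sketch of the simplest IV case, which makes the endogeneity problem concrete. It has one endogenous regressor `x`, one instrument `z`, and the estimator cov(z, y)/cov(z, x). The data are hypothetical and are constructed so that `z` is uncorrelated with the error `u` while `x` is correlated with it. This is only the just-identified special case. It is not the IV-GMM or GMM-CUE estimator discussed in the talk, and apart from a crude relevance check it performs none of the diagnostics.

```python
def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Population covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def ols_slope(x, y):
    """OLS slope of y on x (with intercept)."""
    return cov(x, y) / cov(x, x)


def iv_slope(z, x, y):
    """Just-identified IV slope: cov(z, y) / cov(z, x)."""
    c_zx = cov(z, x)
    if c_zx == 0:
        # Crude relevance check: an instrument uncorrelated with x is useless.
        raise ValueError("instrument is uncorrelated with x")
    return cov(z, y) / c_zx


# Hypothetical data: u is orthogonal to z but enters x, so x is endogenous.
# The true slope used to generate y is 2.
z = [-2, -1, 0, 1, 2]
u = [2, -1, -2, -1, 2]
x = [zi + ui for zi, ui in zip(z, u)]
y = [2 * xi + ui for xi, ui in zip(x, u)]

beta_ols = ols_slope(x, y)   # biased upward because cov(x, u) > 0
beta_iv = iv_slope(z, x, y)  # uses only variation in x driven by z
print(f"OLS: {beta_ols:.3f}  IV: {beta_iv:.3f}")
```

With these numbers OLS overstates the slope because `x` is positively correlated with `u`. The IV estimate recovers the value of 2 used to generate `y`.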

Additional information
Baum.DESUG8621.beamer.pdf

Ordinal regression models: Problems, solutions, and problems with the solutions

Richard Williams
Notre Dame Department of Sociology
Ordered logit/probit models are among the most popular ordinal regression techniques. However, these models often have serious problems. The proportional odds/parallel lines assumptions made by these methods are often violated. Further, because of the way these models are identified, they share many of the limitations encountered when analyzing standardized coefficients in OLS regression; e.g., interaction terms and cross-population comparisons of effects can be highly misleading. This paper shows how generalized ordered logit/probit models (estimated via gologit2) and heterogeneous choice/location scale models (estimated via oglm) can often address these concerns in ways that are more parsimonious and easier to interpret than other suggested alternatives. At the same time, the paper cautions that these methods sometimes raise their own concerns, which researchers need to be aware of and know how to deal with. First, misspecified models can create worse problems than the ones these methods were designed to solve. Second, estimates are sometimes implausible, suggesting that the data are being spread too thin and/or that yet another method is needed. Third, multiple and very different interpretations of the same results are often possible and plausible. I will present guidelines for identifying and dealing with each of these problems.
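The predicted category probabilities show both the proportional odds assumption and one way that estimates from a generalized model can become implausible. The Python sketch below assumes the cumulative form P(Y > j) = logistic(α_j + xβ_j). Setting every β_j equal gives the ordered logit. Letting β_j vary by cutpoint relaxes the parallel-lines assumption. The intercepts, slopes, and the value x = 3 are hypothetical, and the code is an illustration, not gologit2's implementation.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def category_probs(x, alphas, betas):
    """Category probabilities for an ordinal outcome with M categories.

    Cumulative probabilities are P(Y > j) = logistic(alphas[j] + x * betas[j])
    for the M - 1 cutpoints. Equal betas give the proportional-odds case;
    cutpoint-specific betas give a generalized ordered logit.
    """
    cum = [logistic(a + x * b) for a, b in zip(alphas, betas)]
    upper_p = [1.0] + cum   # P(Y > j - 1), with P(Y > 0) = 1
    lower_p = cum + [0.0]   # P(Y > j), with P(Y > M) = 0
    return [up - lo for up, lo in zip(upper_p, lower_p)]


alphas = [1.0, -1.0]                          # hypothetical intercepts
po = category_probs(3.0, alphas, [0.5, 0.5])  # parallel lines
go = category_probs(3.0, alphas, [0.2, 1.5])  # slopes differ by cutpoint
print([round(p, 3) for p in po])
print([round(p, 3) for p in go])
```

In the generalized case the two cumulative curves cross. At x = 3, P(Y > 2) therefore exceeds P(Y > 1), and the middle category receives a negative predicted probability. This is one concrete form that an implausible estimate can take.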

Additional information
GSUG2008-Handout.pdf
GSUG2008.pdf

Charts for comparing results between many categories

Ulrich Kohler
WZB
Charts are useful tools for comparing a statistic across groups defined by a categorical variable with many categories. A number of postings on Statalist show that Stata’s standard implementation of these graphs with graph dot and graph bar often limits users in their ambitions to design such graphs. In most cases, however, users’ design wishes can be satisfied by resorting to the low-level command graph twoway. This tutorial talk demonstrates the construction of such charts with graph twoway. We will start by reconstructing a simple bar chart with graph twoway and then move on to a number of extensions that graph twoway makes possible. I will illustrate some tricks with stored results and local macros, as well as a number of useful user-written programs.

Additional information
kohler.zip

Graph editing

Vince Wiggins
StataCorp
We will take a quick tour of the Graph Editor, covering the basic concepts: adding text, lines, and markers; changing the defaults for added objects; changing properties; working quickly by combining the contextual toolbars with the more complete object dialogs; and using the object browser effectively. Leveraging these concepts, we will discuss how and when to use the grid editor and techniques for combined and by-graphs. Finally, we will look at some tricks and features that are not apparent at first blush.

Relative distribution methods in Stata

Ben Jann
ETH Zürich
The concept of the relative density seems like a fruitful nonparametric approach to studying distributional differences between groups (Handcock and Morris 1999), yet it appears that the technique has gone more or less unnoticed in applied social science research. A scarcity of canned software might be one of the reasons the method is underutilized. Therefore, I present a new Stata command called reldist to plot the relative density, decompose distributional differences into location and shape effects, and compute relative distribution summary measures. The command is illustrated by an application comparing earnings by sex.
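The relative density can be approximated in two steps. First, locate each comparison-group value within the reference-group distribution to get its relative rank. Then estimate the density of those ranks on [0, 1]. A flat density equal to 1 means the two distributions are identical. The Python sketch below uses a crude histogram and a mid-rank convention for ties, both of which are assumptions made for brevity. It is not how reldist estimates the relative density, and it omits the location/shape decomposition.

```python
from bisect import bisect_left, bisect_right


def relative_ranks(reference, comparison):
    """Relative rank of each comparison value in the reference sample,
    using mid-ranks for ties: (#less + 0.5 * #equal) / n."""
    ref = sorted(reference)
    n = len(ref)
    ranks = []
    for y in comparison:
        less = bisect_left(ref, y)
        equal = bisect_right(ref, y) - less
        ranks.append((less + 0.5 * equal) / n)
    return ranks


def relative_density(reference, comparison, bins=5):
    """Histogram estimate of the relative density on [0, 1].

    Each bin value is (share of comparison cases in the bin) / (bin width).
    """
    counts = [0] * bins
    for r in relative_ranks(reference, comparison):
        counts[min(int(r * bins), bins - 1)] += 1  # r == 1.0 goes in last bin
    m = len(comparison)
    return [c / m * bins for c in counts]


same = relative_density(list(range(1, 11)), list(range(1, 11)))
shifted = relative_density(list(range(1, 11)), list(range(11, 21)))
print(same)
print(shifted)
```

Comparing a sample with itself gives a flat density of 1 in every bin. A comparison group lying entirely above the reference puts all its mass in the top bin.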

Reference:
Handcock, M. S., and M. Morris. 1999. Relative Distribution Methods in the Social Sciences. New York: Springer.

Additional information
jann_reldist_berlin08.pdf

Direct and indirect effects in a logit model

Maarten Buis
Vrije Universiteit, Amsterdam
In this presentation, I discuss a method by Erikson et al. (2005) for decomposing a total effect in a logit model into direct and indirect effects, and I propose a generalization of this method. Consider an example where social class has an indirect effect on attending college through academic performance in high school. The indirect effect is obtained by comparing the proportion of lower-class students who attend college with the counterfactual proportion of lower-class students who would attend if they had the performance distribution of the higher-class students. This captures the association between class and attending college that is due to differences in performance, i.e., the indirect effect. The direct effect of class is obtained by comparing the proportion of higher-class students who attend college with the counterfactual proportion of lower-class students who would attend if they had the same performance distribution as the higher-class students. Because the performance distribution is held constant in this comparison, the difference is the direct effect. If these comparisons are carried out in the form of log odds ratios, then the total effect equals the sum of the direct and indirect effects. In its original form, this method assumes that the variable through which the indirect effect occurs is normally distributed. Here the method is generalized by allowing this variable to have any distribution, which has the added advantage of simplifying the method.
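The counterfactual comparisons in this abstract can be written out directly. The Python sketch below assumes a logit model for attending college with hypothetical coefficients. It treats performance as a discrete variable with three levels, loosely in the spirit of allowing any distribution. Each counterfactual proportion is obtained by averaging predicted probabilities over the other class's performance distribution. The sketch illustrates the logic of the decomposition only and is not the author's implementation.

```python
import math


def logistic(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


def attend_prop(cls, perf_dist, b0=-2.0, b_class=0.5, b_perf=1.0):
    """Proportion attending college for class `cls` (0 = lower, 1 = higher)
    when performance levels 0, 1, 2 occur with weights `perf_dist`.
    The coefficients are hypothetical, not estimates from real data."""
    return sum(w * logistic(b0 + b_class * cls + b_perf * m)
               for m, w in enumerate(perf_dist))


# Hypothetical performance distributions over levels 0, 1, 2
dist_lower = [0.5, 0.3, 0.2]
dist_higher = [0.2, 0.3, 0.5]

p_ll = attend_prop(0, dist_lower)   # observed: lower class
p_hh = attend_prop(1, dist_higher)  # observed: higher class
p_lh = attend_prop(0, dist_higher)  # counterfactual: lower class with
                                    # the higher-class performance distribution

total = logit(p_hh) - logit(p_ll)     # total effect (log odds ratio)
indirect = logit(p_lh) - logit(p_ll)  # change due to performance distribution
direct = logit(p_hh) - logit(p_lh)    # class effect, performance held fixed
print(f"total={total:.4f} direct={direct:.4f} indirect={indirect:.4f}")
```

The direct and indirect effects share the counterfactual term logit(p_lh), so they telescope and their sum equals the total effect exactly. This is why carrying out the comparisons on the log-odds scale makes the decomposition additive.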

Reference:
Erikson, R., J. H. Goldthorpe, M. Jackson, M. Yaish, and D. R. Cox. 2005. On class differentials in educational attainment. Proceedings of the National Academy of Sciences 102(27): 9730–9733.

Additional information
Buis.pdf

Multiple imputation using ICE: A simulation study on a binary response

Jochen Hardt
Mathematical Statistics, Chalmers University, Göteborg, Sweden; Masters Programme, Bernstein Center for Computational Neuroscience, Berlin
Background: Various methods for multiple imputation of missing values are available in statistical software. They have been shown to work well when small proportions of missing values are imputed. However, some researchers have started to impute large proportions of missing values.
Method: We performed a simulation using ICE on datasets of 50/100/200/400 cases and 4/11/25 variables. A varying proportion of the data (3–63%) was randomly set to missing and subsequently replaced by multiple imputation.

Results: (1) We show when and how the algorithm breaks down as the number of cases decreases and the number of variables in the model increases. (2) Some unexpected results are demonstrated, e.g., flawed coefficients. (3) Compared with a second program that performs multiple imputation by chained equations, “mice” in “R”, the Stata program “ice” yields slightly more precise estimates, although the two programs offer similar features.
Conclusion: Multiple imputation by chained equations is a useful tool for imputing small to moderate proportions of missing values. Replacing larger proportions, however, can be problematic.

Additional information
Hardt_missing5.ppt

Using Stata for a memory-saving fixed-effects estimation of the three-way error-components model

Thomas Cornelissen
Leibniz Universität Hannover
Researchers trying to estimate tens or hundreds of thousands of fixed effects for two or more groups (workers and firms; pupils, teachers, and schools; etc.) in datasets with large numbers of observations are often limited by the amount of computer memory available. Such a model is commonly estimated by sweeping out one of the effects with the fixed-effects transformation (time-demeaning) and including the remaining effects as dummy variables. If K is the number of fixed effects to be included as dummy variables and N is the number of observations, then the design matrix is of dimension N x K (neglecting any remaining right-hand-side regressors). The time-demeaned dummies have to be stored as “double” variables, consuming 8 bytes per cell in Stata. For example, with 2 million observations (N) and 10 thousand fixed effects (K), the memory requirement would be 160 gigabytes (2,000,000 x 10,000 x 8 bytes). This paper describes how the memory requirement can be reduced to storing only a K x K matrix, which in the given example brings it below 1 gigabyte (10,000 x 10,000 x 8 bytes = 800 megabytes). The paper also describes the Stata program felsdvreg.ado, which implements the method in Mata. Besides implementing the memory-saving estimation method, the program checks the identification of the effects and provides useful summary statistics.
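The memory figures in this abstract come from multiplying out the matrix dimensions at 8 bytes per cell. The Python sketch below reproduces that arithmetic. As a generic illustration of why only a K x K matrix needs to be held in memory, it also accumulates the cross-product X'X one observation at a time. The accumulation routine and its toy input are assumptions for illustration. felsdvreg works in Mata, and its actual algorithm is not reproduced here.

```python
BYTES_PER_CELL = 8  # 8-byte storage per cell, as in the abstract's calculation


def dense_design_bytes(n_obs, n_effects):
    """Memory for an N x K matrix of 8-byte values."""
    return n_obs * n_effects * BYTES_PER_CELL


def cross_product_bytes(n_effects):
    """Memory for a K x K matrix of 8-byte values."""
    return n_effects * n_effects * BYTES_PER_CELL


def accumulate_xtx(rows, k):
    """Build X'X (K x K) row by row, so the N x K matrix X is never stored.

    Illustrative only: `rows` is any iterable yielding length-k rows.
    """
    xtx = [[0.0] * k for _ in range(k)]
    for row in rows:
        for i in range(k):
            if row[i] == 0:
                continue  # zero entries contribute nothing to row i
            for j in range(k):
                xtx[i][j] += row[i] * row[j]
    return xtx


n, k = 2_000_000, 10_000
dense = dense_design_bytes(n, k)
compact = cross_product_bytes(k)
print(dense / 1e9, "GB for N x K")
print(compact / 1e9, "GB for K x K")

toy = accumulate_xtx(iter([[1, 0], [0, 1], [1, 1]]), 2)
print(toy)
```

For N = 2 million and K = 10,000 this gives 160 GB for the full design matrix versus 0.8 GB for the K x K cross-product.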

Additional information
Cornelissen_2008_German_Stata_Meeting.pdf

Scientific organizers

Johannes Giesecke, University of Mannheim

Ulrich Kohler, WZB

Logistics organizers

The logistics were organized by Dittrich and Partner (http://www.dpc.de), the distributor of Stata in several countries, including Germany, the Netherlands, Austria, the Czech Republic, and Hungary.