Christopher F. Baum

Boston College Department of Economics and DIW Berlin

I will discuss the usefulness of instrumental variables (IV) techniques in
addressing research questions in economics and finance. IV methods provide
workable solutions to problems of endogeneity, measurement error and proxy
variables, but they are easily misused. I will present a wide array of
diagnostic techniques that should be employed to validate the use of IV in a
particular context. I will also discuss the advantages of employing the
Generalized Method of Moments form of IV (IV-GMM) and the Continuously
Updated Estimator (GMM-CUE), and I will display some newly developed code that
efficiently employs Stata's Mata programming language to implement the
GMM-CUE.

**Additional information**

Baum.DESUG8621.beamer.pdf

Richard Williams

Notre Dame Department of Sociology

Ordered logit/probit models are among the most popular ordinal regression
techniques. However, these models often have serious problems. The
proportional odds/parallel lines assumptions made by these methods are often
violated. Further, because of the way these models are identified, they have
many of the same limitations as are encountered when analyzing standardized
coefficients in OLS regression, e.g., interaction terms and cross-population
comparisons of effects can be highly misleading. This paper shows how
generalized ordered logit/probit models (estimated via **gologit2**) and
heterogeneous choice/location scale models (estimated via **oglm**) can often
address these concerns in ways that are more parsimonious and easier to
interpret than is the case with other suggested alternatives. At the same
time, the paper cautions that these methods sometimes raise their own
concerns that researchers need to be aware of and know how to deal with.
First, misspecified models can create worse problems than the ones these
methods were designed to solve. Second, estimates are sometimes implausible,
suggesting that the data are being spread too thin and/or yet another method
is needed. Third, multiple and very different interpretations of the same
results are often possible and plausible. I will present guidelines for
identifying and dealing with each of these problems.

**Additional information**

GSUG2008-Handout.pdf

GSUG2008.pdf

Ulrich Kohler

WZB

Charts are useful tools for comparing a statistic between groups defined by
a categorical variable with many different categories. A number of postings
on Statalist suggest that Stata’s standard implementation
of these graphs with **graph dot** and **graph bar** often limits
users in their ambition to design such graphs. In most cases, however,
users’ design wishes can be satisfied by reverting to the low-level command
**graph twoway**. This tutorial talk demonstrates the construction of
charts with **graph twoway**. We will start by reconstructing a simple
bar chart with **graph twoway** and then move to a number of extensions
that are possible when using **graph twoway**. I will illustrate some
trickery with stored results and local macros, as well as a number of useful
user-written programs.

**Additional information**

kohler.zip

Vince Wiggins

StataCorp

We will take a quick tour of the Graph Editor, covering the basic concepts:
adding text, lines, and markers; changing the defaults for added objects;
changing properties; working quickly by combining the contextual toolbars
with the more complete object dialogs; and using the object browser
effectively. Leveraging these concepts, we will discuss how and when to use
the grid editor and techniques for combined and by-graphs. Finally, we will
look at some tricks and features that are not apparent at first blush.

Ben Jann

ETH Zürich

The concept of the relative density seems like a fruitful nonparametric
approach to studying distributional differences between groups (Handcock and
Morris 1999), yet it appears that the technique has gone more or less
unnoticed in applied social science research. A scarcity of canned software
might be one of the reasons the method is underutilized. Therefore, I
present a new Stata command called **reldist** to plot the relative density,
decompose distributional differences into location and shape effects, and
compute relative distribution summary measures. The command is illustrated
by an application comparing earnings by sex.

**Reference:**

- Handcock, M. S., and M. Morris. 1999.
*Relative Distribution Methods in the Social Sciences.* New York: Springer.

**Additional information**

jann_reldist_berlin08.pdf

Maarten Buis

Vrije Universiteit, Amsterdam

In this presentation, I discuss a method by Erikson et al. (2005) for
decomposing a total effect in a logit model into direct and indirect effects,
and I propose a generalization of this method. Consider an example where
social class has an indirect effect on attending college through academic
performance in high school. The indirect effect is obtained by comparing the
proportion of lower-class students that attend college with the
counterfactual proportion of lower-class students if they had the
distribution of performance of the higher-class students. This captures the
association between class and attending college because of differences in
performance, i.e., the indirect effect. The direct effect of class is
obtained by comparing the proportion of higher-class students that attend
college with the counterfactual proportion of lower-class students if they
had the same distribution of performance as the higher-class students. This way, the
variable performance is kept constant, and this results in the direct effect.
If these comparisons are carried out in the form of log odds ratios, then the
total effect will equal the sum of the direct and indirect effects. In its
original form, this method assumes that the variable through which the
indirect effect occurs is normally distributed. In this article, the method
is generalized by allowing this variable to have any distribution, which has
the added advantage of simplifying the method.
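
The comparisons described above can be sketched numerically. The following is a minimal illustration in Python, not the author's implementation: it assumes a discrete performance variable with three levels, and every number in it (the performance distributions and the attendance probabilities) is hypothetical and chosen only to show that the log odds ratios add up.

```python
import math

# Hypothetical inputs (not from the talk): three performance levels.
# Share of each class at each performance level.
perf_dist = {
    "lower": [0.5, 0.3, 0.2],
    "higher": [0.2, 0.3, 0.5],
}
# Probability of attending college given class and performance level.
p_attend = {
    "lower": [0.10, 0.30, 0.60],
    "higher": [0.20, 0.50, 0.80],
}


def proportion(dist_class, prob_class):
    """Proportion attending when the performance distribution of
    dist_class is combined with the attendance probabilities of prob_class."""
    return sum(w * p for w, p in zip(perf_dist[dist_class], p_attend[prob_class]))


def log_odds(p):
    return math.log(p / (1 - p))


p_low = proportion("lower", "lower")              # factual, lower class
p_high = proportion("higher", "higher")           # factual, higher class
p_counterfactual = proportion("higher", "lower")  # lower class with the
                                                  # higher-class performance

# Log odds ratios for the comparisons described in the abstract.
indirect = log_odds(p_counterfactual) - log_odds(p_low)
direct = log_odds(p_high) - log_odds(p_counterfactual)
total = log_odds(p_high) - log_odds(p_low)

print(f"indirect={indirect:.3f} direct={direct:.3f} total={total:.3f}")
```

Because the counterfactual term cancels when the two log odds ratios are summed, the total effect equals the direct plus the indirect effect whatever distribution the performance variable has.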

**Reference:**

- Erikson, R., J. H. Goldthorpe, M. Jackson, M. Yaish, and D. R. Cox. 2005.
On class differentials in educational attainment.
*Proceedings of the National Academy of Sciences* 102(27): 9730–9733.

**Additional information**

Buis.pdf

Jochen Hardt

Mathematical Statistics, Chalmers University, Göteborg, Sweden;
Masters Programme, Bernstein Center for Computational Neuroscience, Berlin

Background: Various methods for multiple imputation of missing values are
available in statistical software. They have been shown to work well when
small proportions of missing values are imputed. However, some researchers
have started to impute large proportions of missing values.

Method: We performed a simulation using “ice” on datasets of 50/100/200/400 cases and 4/11/25 variables. A varying proportion of the data (3–63%) was randomly set to missing and subsequently replaced by multiple imputation.

Results: (1) We show when and how the algorithm breaks down as the number of cases decreases and the number of variables in the model increases. (2) We demonstrate some unexpected results, e.g., flawed coefficients. (3) Compared with “mice” in R, the other program that performs multiple imputation by chained equations, the Stata program “ice” yields slightly more precise estimates with similar program features.

Conclusion: Imputation by chained equations is a useful tool for imputing small to moderate proportions of missing values. Replacing larger proportions, however, can be problematic.

**Additional information**

Hardt_missing5.ppt

Thomas Cornelissen

Leibniz Universität Hannover

Researchers trying to estimate tens or hundreds of thousands of fixed
effects for two or more groups (workers and firms; pupils, teachers and
schools; etc.) in datasets with high numbers of observations are often
limited by the size of computer memory available. Such a model is
commonly estimated by sweeping out one of the effects by the fixed-effects
transformation (time-demeaning) and by including the remaining effects as
dummy variables. If K is the number of fixed effects to be included as
dummy variables, and N is the number of observations, then the design matrix
is of dimension N x K (neglecting any remaining right-hand-side regressors).
The time-demeaned dummies have to be stored as “double” variables
consuming 8 bytes per cell in Stata. For example, with 2 million
observations (N) and 10 thousand fixed effects (K), the memory requirement
would be 160 gigabytes. This paper describes how the memory requirement can
be reduced to store only a K x K matrix, which in the given example reduces
the memory requirement to below 1 gigabyte. The paper also describes the
Stata program felsdvreg.ado, which implements the method in Mata. Besides
implementing the memory-saving estimation method, the program also takes
care of checking the identification of the effects and provides useful
summary statistics.
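
The memory figures quoted above can be checked with a few lines of arithmetic. The sketch below is in Python rather than Stata and is only a back-of-the-envelope check: the variable and function names are illustrative, and it assumes decimal gigabytes (10^9 bytes) and 8 bytes per cell, as stated in the abstract.

```python
# Back-of-the-envelope check of the memory figures in the abstract.
BYTES_PER_CELL = 8   # 8 bytes per cell, as stated in the abstract
N = 2_000_000        # number of observations
K = 10_000           # number of fixed effects included as dummies


def gigabytes(n_rows, n_cols, bytes_per_cell=BYTES_PER_CELL):
    """Memory needed for an n_rows x n_cols matrix, in decimal gigabytes."""
    return n_rows * n_cols * bytes_per_cell / 1e9


full_design_gb = gigabytes(N, K)    # N x K matrix of time-demeaned dummies
cross_product_gb = gigabytes(K, K)  # K x K matrix used by the reduced method

print(f"N x K design matrix: {full_design_gb:.1f} GB")
print(f"K x K matrix:        {cross_product_gb:.1f} GB")
```

The N x K requirement grows with the number of observations, whereas the K x K requirement depends only on the number of fixed effects, which is why the saving is largest in datasets with many observations.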

**Additional information**

Cornelissen_2008_German_Stata_Meeting.pdf
