2010 German Stata Users Group meeting: Abstracts
Biometrical modeling of twin and family data in Stata
University of California–Berkeley
Data on twins or on other types of family structures (for example, nuclear families,
siblings, cousins) can be used to estimate the proportion of variability
in observed traits (or phenotypes) that is due to genes. The
models are essentially multivariate regression models with residual
covariance structures dictated by Mendelian genetics. Usually, specialized
software for structural equation modeling is used. However, the required
covariance structures can also be produced using mixed models and by specifying
an appropriate design matrix for the random part of the model. Stata’s
mixed-model estimation command can then be used to estimate the models. For binary
phenotypes, such as diabetes, the appropriate probit models can also be estimated.
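To make the idea of a genetically dictated covariance structure concrete, here is a minimal Python sketch of the classic ACE decomposition (additive genetic variance A, shared environment C, unique environment E). The ACE parameterization, the function names, and the numbers are illustrative assumptions, not taken from the talk, and the sketch uses simple moment matching rather than mixed-model estimation.

```python
def ace_implied_cov(a, c, e, relatedness):
    """Implied 2x2 covariance matrix for a twin pair under an ACE model.

    relatedness is the expected share of additive genetic variance shared:
    1.0 for monozygotic (MZ) twins, 0.5 for dizygotic (DZ) twins.
    """
    var = a + c + e
    cov = relatedness * a + c
    return [[var, cov], [cov, var]]


def ace_from_moments(var, cov_mz, cov_dz):
    """Solve the ACE moment equations for (A, C, E).

    cov_mz = A + C and cov_dz = 0.5 * A + C, so A = 2 * (cov_mz - cov_dz).
    """
    a = 2.0 * (cov_mz - cov_dz)
    c = cov_mz - a
    e = var - cov_mz
    return a, c, e


# Hypothetical variance components, used only to illustrate the round trip.
mz = ace_implied_cov(0.5, 0.2, 0.3, relatedness=1.0)
dz = ace_implied_cov(0.5, 0.2, 0.3, relatedness=0.5)
a, c, e = ace_from_moments(mz[0][0], mz[0][1], dz[0][1])
heritability = a / (a + c + e)
```

The ratio A / (A + C + E) is the proportion of trait variability attributed to genes under these assumptions.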
An introduction to matching methods for causal inference and
their implementation in Stata
Institute for Fiscal Studies
Matching, especially in its propensity-score flavors, has become an extremely
popular evaluation method. Matching is, in fact, the best-available method for
selecting a matched (or reweighted) comparison group that looks
like the treatment group of interest.
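As a rough illustration of how a matched comparison group is selected, the following Python sketch performs one-to-one nearest-neighbor matching with replacement on a propensity score and averages the matched outcome differences over the treated units. The propensity scores are assumed to be already estimated, the data are hypothetical, and this is only one of many possible matching estimators; it is not a description of how psmatch2 works internally.

```python
def att_nearest_neighbor(scores, treated, outcomes):
    """ATT via one-to-one nearest-neighbor matching with replacement.

    scores: estimated propensity scores (assumed to be given)
    treated: 1 for treated units, 0 for comparison units
    outcomes: observed outcomes
    """
    controls = [i for i, t in enumerate(treated) if t == 0]
    diffs = []
    for i, t in enumerate(treated):
        if t != 1:
            continue
        # Closest comparison unit on the propensity score.
        j = min(controls, key=lambda k: abs(scores[k] - scores[i]))
        diffs.append(outcomes[i] - outcomes[j])
    return sum(diffs) / len(diffs)


# Hypothetical toy data: two treated units, three comparison units.
scores = [0.80, 0.30, 0.75, 0.35, 0.10]
treated = [1, 1, 0, 0, 0]
outcomes = [10.0, 6.0, 7.0, 5.0, 2.0]
att = att_nearest_neighbor(scores, treated, outcomes)
```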
In this talk, I will introduce matching methods within the general problem of
causal inference, highlight their strengths and weaknesses, and offer a brief
overview of different matching estimators. Using psmatch2, I will
then step through a practical example in Stata that is based on real data.
I will then show how to implement
some of these estimators, as well as highlight a number of
Heterogeneous treatment-effect analysis
Methods for causal inference and the estimation of treatment effects have
received much attention in recent years. Most of the methodological and
applied work focuses on the identification of so-called average treatment
effects, possibly restricted to the treated or the untreated.
However, treatment effects may vary (hence the averaging), and it can be
interesting to analyze the patterns of effect heterogeneity. In this talk, I
will present a new command called hte that is used for heterogeneous
treatment-effect analysis in Stata. hte first constructs balanced
propensity-score strata and, within each stratum, estimates the average
treatment effect. hte then tests for a linear trend in effects across
the strata. The stratum-specific treatment effects and the estimated linear
trend are displayed in a two-way graph. hte results from joint
work with Jennie E. Brand (UCLA) and Yu Xie (University of Michigan).
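The two steps described above (stratum-specific effects, then a trend across strata) can be sketched in Python as follows. This is a simplified illustration, not the hte implementation: the strata are assumed to be already constructed and balanced, the trend is fit by unweighted least squares, no significance test is performed, and the data are hypothetical.

```python
def stratum_effects(strata, treated, outcomes):
    """Mean treated-minus-control outcome difference within each stratum."""
    effects = {}
    for s in sorted(set(strata)):
        y1 = [y for g, t, y in zip(strata, treated, outcomes) if g == s and t == 1]
        y0 = [y for g, t, y in zip(strata, treated, outcomes) if g == s and t == 0]
        effects[s] = sum(y1) / len(y1) - sum(y0) / len(y0)
    return effects


def linear_trend(effects):
    """Unweighted least-squares slope of the effects on the stratum number."""
    xs = list(effects)
    ys = [effects[s] for s in xs]
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return sxy / sxx


# Hypothetical data: three strata, each with two treated and two control units.
strata = [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3]
treated = [1, 1, 0, 0, 1, 1, 0, 0, 1, 1, 0, 0]
outcomes = [5.0, 7.0, 2.0, 4.0, 6.0, 6.0, 3.0, 5.0, 5.0, 5.0, 4.0, 4.0]
effects = stratum_effects(strata, treated, outcomes)
slope = linear_trend(effects)
```

In this toy example, the effect shrinks by one unit per stratum, so the fitted slope is negative.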
Estimation of linear fixed-effects models with individual-specific slopes in Stata
Mannheim Center for European Social Research (MZES)
Fixed-effects regression is considered a powerful method for estimating causal
effects with survey data. However, in the linear model, the conventional
technique of time-demeaning does not yield consistent estimates of the
parameters when unobserved heterogeneity is not time-constant. Jeffrey M. Wooldridge
(2002, Econometric Analysis of Cross Section and Panel Data [MIT Press], 317–322)
derived a general model for the situation where unobserved and observed
characteristics of individuals interact to produce the outcome. The
fixed-effects model with individual constants and slopes (FEIS) is a remedy
for coefficients that are biased due to, for example, maturation or learning where
unobserved traits affect individual growth curves differently for treated
and untreated units.
The Stata xtfeis command implements the FEIS estimator in Mata, allowing for
individual constants and (potentially many) slopes. If no slope variables
are specified, the model collapses to the conventional model estimated by
xtreg, fe, which accounts for individual constants only. xtfeis reports standard
errors that are robust to serial correlation or heteroskedasticity of
unknown form. Estimates of the slope parameters are available optionally.
The command requires panel data with at least J + 1 observations per unit,
where J is the number of individual-specific slope variables (usually, but
not necessarily, also including the individual-specific constant). I will
present results for the effect of marriage on male wages based on real data
(GSOEP and NLSY) to demonstrate the practical relevance of the method.
I will use simulation results to assess robustness of the estimator to
autocorrelation, measurement error, and misspecification of functional form.
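The core idea of FEIS can be sketched in Python for the simplest case of one individual-specific slope variable (time) plus an individual constant: each unit's outcome and regressor are detrended separately, and the detrended series are then pooled. This is a minimal illustration under those assumptions, with hypothetical noise-free data and a single regressor; it is not the xtfeis implementation and computes no standard errors.

```python
def residualize(values, times):
    """Residuals from a per-unit regression of values on a constant and time."""
    n = len(values)
    t_bar = sum(times) / n
    v_bar = sum(values) / n
    stt = sum((t - t_bar) ** 2 for t in times)
    slope = sum((t - t_bar) * (v - v_bar) for t, v in zip(times, values)) / stt
    return [v - v_bar - slope * (t - t_bar) for t, v in zip(times, values)]


def feis_slope(panel):
    """Pooled OLS of detrended y on detrended x across all units.

    panel maps a unit id to a (times, x, y) tuple of equal-length lists.
    """
    num = den = 0.0
    for times, x, y in panel.values():
        rx = residualize(x, times)
        ry = residualize(y, times)
        num += sum(a * b for a, b in zip(rx, ry))
        den += sum(a * a for a in rx)
    return num / den


def make_unit(const, growth, x, beta=2.0):
    """Hypothetical noise-free data: y = const + growth * t + beta * x."""
    times = list(range(len(x)))
    y = [const + growth * t + beta * xi for t, xi in zip(times, x)]
    return times, x, y


# Two units with different constants and different growth rates.
panel = {
    "a": make_unit(1.0, 0.5, [0.0, 0.0, 1.0, 1.0]),
    "b": make_unit(-3.0, 2.0, [0.0, 1.0, 1.0, 1.0]),
}
beta_hat = feis_slope(panel)
```

Each unit here has four observations, which satisfies the J + 1 requirement for J = 2 (constant plus one slope). Because the unit-specific constants and growth rates are removed before pooling, the sketch recovers the common coefficient of 2.0.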
Generalized method of moments estimators in Stata
Stata 11 has a new command, gmm, for estimating parameters by the
generalized method of moments (GMM). gmm can estimate the parameters
of linear and nonlinear models for cross-sectional, panel, and time-series
data. In this presentation, I provide an introduction to GMM and to the
University of Tübingen
In this talk, I will discuss some techniques available in Stata for
analyzing dependent variables that are proportions. I will discuss four
commands, including fmlogit. The
first two deal with situations where we want to explain only one
proportion, while the latter two deal with situations where we have for
each observation multiple proportions that must add up to one. I will focus
on how to interpret the results of these models and on the
relative strengths and weaknesses of these models.
User-written Stata program: agrm
University of Mannheim
In the context of his research on perceptual agreement, Cees van der Eijk
(2001, Quality & Quantity 35: 325–341)
indicates that empirical measures that resort to the standard deviation of
the response distribution capture not only consensus but also skewness.
Thus they are inappropriate measures of agreement. His alternative measure of
agreement, A, circumvents this problem and yields unbiased
figures for all kinds of ordered rating scales. It first decomposes the
frequency distribution into constituent layers, that is, row vectors for which
consensus can be unambiguously defined. It then computes the weighted
average degree of agreement. Given the lack of a corresponding ado-file, the
agrm command allows you to directly calculate van der Eijk’s index of
agreement, A, in Stata. Aside from a broad range of basic programming
features such as low-level parsing and specifying additional program options,
agrm also entails more advanced techniques such as handling empty categories
and handling numerical missing values. Moreover, it highlights the potential
of nested loops and local macros in the context of multiple permutations.
Finally, agrm is especially suited for showing how Stata’s matrix language,
Mata, provides a powerful environment for handling vectors and matrices.
Yet another program to create publication-quality tables
Institute of Sociology and Social Policy, Corvinus University
Stata users have developed several programs to create
publication-quality documents containing regression results (outreg,
outreg2, outtex, estout), tables of statistics
(tabout), and contents of matrices (outtable). So far, less
effort has been made to enable the easy publication of other kinds of tables,
such as those displaying the definitions of variables and summary statistics.
Although the sophisticated estout package can create tables other than
regression results, the underlying mechanism of posting results as if they
were estimation results has limitations, and removing these limitations
should involve additional programming.
The user-written command publish (working title) is intended for users with
limited knowledge in programming.
It creates publication-quality documents (HTML, MS Word, or LaTeX) that may
consist of tables displaying the following elements: definitions of variables, codebooks, summary
statistics, one-way and two-way frequencies, various
statistics, or estimation results. Users can create large tables where results are separately
shown for various subsamples or for several cross-tabulations with a common
dependent variable. Users can combine different sorts of elementary tables.
Users can also publish matrices or parts of the data in
memory and create empty tables into which results from other
tables can be pasted. Controlling the layout of the table and the
column titles and supercolumn titles is also easily done using a small
number of common options.
RDS—a Stata program for respondent-driven sampling
DIW and RAND Corporation
Respondent-driven sampling (RDS) is a sampling technique typically employed
for hard-to-reach populations (for example, homeless people, people with AIDS, immigrants).
Briefly, initial seed respondents recruit additional respondents from their
network of friends. The recruiting process repeats iteratively, thereby
forming long referral chains. It is crucial to obtain estimates of
respondents’ network sizes (for example, the number of friends with the
characteristic of interest). RDS shares some similarities with snowball
sampling, but the theoretical foundation for inference using RDS samples is
much stronger. We will give a brief overview of this technique and
introduce a new user-written Stata command for RDS.
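To show why network sizes matter for inference, here is a minimal Python sketch of an inverse-degree-weighted proportion in the style of the Volz-Heckathorn (RDS-II) estimator, one commonly used RDS estimator. Whether the new Stata command implements this particular estimator is not stated in the abstract; the function name and the data are hypothetical.

```python
def rds_ii_proportion(has_trait, degrees):
    """Inverse-degree-weighted proportion (Volz-Heckathorn, RDS-II style).

    has_trait: 1 if the respondent has the trait, else 0
    degrees: self-reported network sizes (must be positive)
    """
    weights = [1.0 / d for d in degrees]
    weighted_trait = sum(w for w, h in zip(weights, has_trait) if h == 1)
    return weighted_trait / sum(weights)


# Hypothetical sample of four respondents.
has_trait = [1, 1, 0, 0]
degrees = [10, 10, 5, 5]
estimate = rds_ii_proportion(has_trait, degrees)
naive = sum(has_trait) / len(has_trait)
```

Respondents with large networks are more likely to be recruited, so they receive smaller weights. In this toy sample, the weighted estimate (one third) falls below the naive sample proportion (one half) because the respondents with the trait report larger networks.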
Report to the users
Bill Gould, president of StataCorp and head of development, talks about Stata.