David Drukker

StataCorp

Stata 11 has a new command, **gmm**, for estimating parameters by the
generalized method of moments (GMM). **gmm** can estimate the
parameters of linear and nonlinear models for cross-sectional, panel, and
time-series data. In this presentation, I provide an introduction to GMM and to the
**gmm** command.
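
As a rough illustration of the idea behind GMM (not the **gmm** command itself, and not taken from the presentation), the following Python sketch computes a two-step GMM estimate for a linear model with one endogenous regressor and two instruments. The simulated data, the function names, and the true coefficient of 2 are all assumptions made for the example.

```python
import random

def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def linear_gmm(y, x, z, w):
    """Minimize g(b)' W g(b), where g(b) is the mean of z_i * (y_i - x_i * b).

    x is a single regressor, z holds two instruments per observation, and
    w is a symmetric 2x2 weight matrix. With a scalar b the minimizer has
    the closed form (szx' W szy) / (szx' W szx).
    """
    n = len(y)
    szx = [sum(zi[k] * xi for zi, xi in zip(z, x)) / n for k in range(2)]
    szy = [sum(zi[k] * yi for zi, yi in zip(z, y)) / n for k in range(2)]
    w_szx = [w[k][0] * szx[0] + w[k][1] * szx[1] for k in range(2)]
    numerator = w_szx[0] * szy[0] + w_szx[1] * szy[1]
    denominator = w_szx[0] * szx[0] + w_szx[1] * szx[1]
    return numerator / denominator

def two_step_gmm(y, x, z):
    """Step 1 uses an identity weight matrix; step 2 reweights by the
    inverse of the estimated covariance of the moment conditions."""
    n = len(y)
    b1 = linear_gmm(y, x, z, [[1.0, 0.0], [0.0, 1.0]])
    resid = [yi - b1 * xi for yi, xi in zip(y, x)]
    s = [[sum(e * e * zi[j] * zi[k] for e, zi in zip(resid, z)) / n
          for k in range(2)] for j in range(2)]
    return linear_gmm(y, x, z, invert_2x2(s))

# Simulated data (hypothetical): x is endogenous because it shares the
# shock u with the error term of y; z1 and z2 are valid instruments.
rng = random.Random(0)
y, x, z = [], [], []
for _ in range(2000):
    z1, z2, u, v = (rng.gauss(0, 1) for _ in range(4))
    xi = z1 + 0.5 * z2 + u
    x.append(xi)
    y.append(2.0 * xi + 0.8 * u + 0.6 * v)
    z.append((z1, z2))

b_gmm = two_step_gmm(y, x, z)
b_ols = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
print("two-step GMM:", b_gmm, "OLS:", b_ols)
```

In this setup, least squares is biased upward because the regressor is correlated with the error, whereas the GMM estimate should land close to the assumed true value.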

**Additional information**

dc09_drukker_gmm.pdf


David Roodman

Center for Global Development

At the heart of many econometric models is a linear function and a normal
error. Examples include the classical small-sample linear regression model
and the probit, ordered probit, multinomial probit, tobit, interval
regression, and truncated distribution regression models. Because the normal
distribution has a natural multidimensional generalization, such models can
be combined into multiequation systems in which the errors share a
multivariate normal distribution. The literature has historically focused on
multistage procedures for estimating mixed models, which are more efficient
computationally, if less so statistically, than maximum likelihood (ML). But
faster computers and simulated likelihood methods such as the Geweke,
Hajivassiliou, and Keane (GHK) algorithm for estimating higher-dimensional
cumulative normal distributions have made direct ML estimation practical. ML
also facilitates a generalization to switching, selection, and other models
in which the number and types of equations vary by observation. The Stata
module **cmp** fits seemingly unrelated regressions models of this
broad family. Its estimator is also consistent for recursive systems in
which all endogenous variables appear on the right-hand sides as observed.
If all the equations are structural, then estimation is full-information
maximum likelihood. If only the final stage or stages are structural, then it is
limited-information maximum likelihood. **cmp** can mimic a dozen
built-in Stata commands and several user-written ones. It is also
appropriate for a panoply of models previously hard to estimate.
Heteroskedasticity, however, can render it inconsistent. In this
presentation, I explain the theory and implementation of **cmp** and of a
related Mata function, **ghk2()**, that implements the GHK algorithm.

**Additional information**

dc09_roodman.ppt


David Drukker

StataCorp

Stata 11 has new commands, **sspace** and **dvech**, for estimating the
parameters of state-space models and diagonal-vech multivariate GARCH models,
respectively. In this presentation, I provide an introduction to state-space models,
diagonal-vech multivariate GARCH models, the implemented estimators, and the
new Stata commands.

**Additional information**

dc09_drukker_mvts.pdf


Jeff Pitblado

StataCorp

In this presentation, I cover how to use Stata for survey data analysis assuming a fixed
population. We will begin by reviewing the sampling methods used to collect
survey data, and how they affect the estimation of totals, ratios, and
regression coefficients. We will then cover the three variance estimators
implemented in Stata's survey estimation commands. Strata with a single
sampling unit, certainty sampling units, subpopulation estimation, and
poststratification will also be covered in some detail.

**Additional information**

dc09_pitblado_svy.pdf


Rick Valliant

University of Maryland

Diagnostics for linear regression models are included as options in Stata
and many other statistical packages and are now readily available to
analysts. However, these tools are generally aimed at ordinary or weighted
least-squares regression and do not account for stratification, clustering,
and survey weights that are features of datasets collected in complex
sample surveys. The ordinary least-squares diagnostics can mislead users
because the variances of model parameter estimates will usually be estimated
incorrectly by the standard procedures. The variance or standard-error
estimates are an intimate part of many diagnostics. In this presentation, I
summarize research that has been done to extend some of the existing
diagnostics to complex survey data. Among the linear regression techniques
I cover are leverages, DFBETAS, DFFITS, the forward search method for
identifying influential points, and collinearity diagnostics, like variance
inflation factors and variance decompositions.

**Additional information**

dc09_valliant.pdf


Brady West

University of Michigan

In this presentation, I provide an overview of important considerations that
analysts of large public-use survey datasets must keep in mind when
attempting to make inferences for finite subpopulations of research
interest. I will discuss several examples of possible subpopulation analysis approaches
that analysts could take using the Stata **svy:** commands, and I will
emphasize the implications of each approach for making inferences.
Participants will have time for a question-and-answer session
building upon the examples.

**Additional information**

dc09_west.ppt


Christopher F. Baum

Boston College

I will discuss how econometric estimators may be efficiently programmed in Mata.
The prevalence of matrix-based analytical derivations of estimation
techniques and the computational improvements available from just-in-time
compilation combine to make Mata the tool of choice for econometric
implementation. I will give two examples: computing the seemingly unrelated
regression estimator for an unbalanced panel, a multivariate linear
approach, and computing the continuously updated GMM estimator (GMM-CUE) for
a linear instrumental-variables model. The GMM-CUE estimator makes use of
Mata’s optimize suite of functions. Both illustrate the power and
effectiveness of a Mata-based approach.

**Additional information**

dc09_baum.pdf


Paulo Guimaraes

University of South Carolina

In this presentation, I describe an alternative iterative approach for the
estimation of linear regression models with high-dimensional fixed effects,
such as those fit to large employer–employee datasets. This approach is computationally
intensive but imposes minimum memory requirements. I also show that the
approach can be extended to nonlinear models and potentially to more than
two high-dimensional fixed effects. Note: The presentation is based on a
paper that is currently under review at the *Stata Journal*.
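
One way such an iterative scheme can work is sketched below in Python. This is an assumption about the general technique (alternating least-squares updates of the slope and of each set of fixed effects), not the presenter's code; the function names and the small worker and firm dataset are hypothetical. Only group means are stored, so no large dummy-variable matrix is ever built, which is where the memory saving comes from.

```python
from collections import defaultdict

def group_means(values, groups):
    """Return a dict mapping each group id to the mean of its values."""
    total = defaultdict(float)
    count = defaultdict(int)
    for v, g in zip(values, groups):
        total[g] += v
        count[g] += 1
    return {g: total[g] / count[g] for g in total}

def fit_two_way_fe(y, x, worker, firm, tol=1e-12, max_iter=10000):
    """Fit y = b*x + worker effect + firm effect by cycling through
    exact least-squares updates of b, the worker effects, and the firm
    effects until b stops changing."""
    n = len(y)
    a = {w: 0.0 for w in worker}   # worker effects, started at zero
    d = {f: 0.0 for f in firm}     # firm effects, started at zero
    sxx = sum(xi * xi for xi in x)
    b = 0.0
    for _ in range(max_iter):
        # Slope: regress y net of both sets of effects on x.
        b_new = sum(x[i] * (y[i] - a[worker[i]] - d[firm[i]])
                    for i in range(n)) / sxx
        # Each set of effects: group means of the remaining residual.
        a = group_means([y[i] - b_new * x[i] - d[firm[i]] for i in range(n)], worker)
        d = group_means([y[i] - b_new * x[i] - a[worker[i]] for i in range(n)], firm)
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    return b, a, d

# Hypothetical noise-free data: 6 workers observed at each of 4 firms,
# generated with a true slope of 1.5.
worker, firm, x, y = [], [], [], []
for i in range(6):
    for j in range(4):
        worker.append(i)
        firm.append(j)
        x.append(float(i * j))
        y.append(1.5 * i * j + 0.5 * i - 1.0 * j)

b_hat, worker_fe, firm_fe = fit_two_way_fe(y, x, worker, firm)
print("estimated slope:", b_hat)
```

The trade-off matches the one described above: each pass is cheap in memory, but many passes may be needed before the slope settles.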

**Additional information**

dc09_guimaraes.pdf

dc09_guimaraes_examples.zip


Choonjoo Lee

Ji Yong-bae

Korea National Defense University

In this presentation, we present a procedure and an illustrative application of a
user-written Data Envelopment Analysis (DEA) program in Stata. DEA is a
linear programming method for assessing the efficiency and productivity of
units and a popular managerial tool for measuring performance of organizations.
It has been used widely for assessing the efficiency of public and
private sectors, such as banks, airlines, hospitals, universities, defense
firms, and manufacturers. The DEA program in Stata will allow DEA users to
easily access the Stata system and to conduct not only the standard
optimization procedure but also more extended managerial analysis. We will
also discuss a Mata implementation, an extension of the DEA program code
developed in the Stata programming language, for cases where data
capacity matters, as well as the returns-to-scale options in
DEA. To date, official Stata offers no DEA command, although a stochastic
frontier analysis (SFA) model is available. The user-written DEA approach
suggests some possible future extensions of Stata programming in DEA.

**Additional information**

dc09_lee_ji.pdf

dc09_lee_ji.ppt

dc09_lee.zip


Hoa Nguyen

Michigan State University

Minh Nguyen

American University

In this presentation, we introduce the command **frcount** for estimating
the fractional response model with an endogenous count variable. The
endogeneity of the right-hand-side count variable is controlled for under
the presence of unobserved heterogeneity. We briefly discuss the model,
estimation method, and implementation of the **frcount** command in
Stata. More importantly, we provide useful summary statistics of parameter
estimates, adjusted standard errors, and average partial effects, which can
be compared across nonlinear models.

**Additional information**

dc09_nguyen.pdf


Mei-Ling Ting Lee

University of Maryland

In this presentation, I introduce a new Stata command called **threg**. The
command estimates regression coefficients of a threshold regression model
based on the first hitting time of a boundary by the sample path of a Wiener
diffusion process. The regression methodology is well suited to applications
involving survival and time-to-event data. This new command uses the MLE
routine in Stata for calculating regression coefficient estimates,
asymptotic standard errors, and *p*-values. An initialization option is
also allowed, as in the conventional MLE routine. The **threg** command
can be carried out with either calendar or analytical time scales.
Hazard ratios at selected time points for specified
scenarios (based on given categories or value settings of covariates) can
also be calculated by this command. Furthermore, curves of estimated hazard
functions, survival functions, and probability distribution functions of the
first hitting time can be plotted. Function curves corresponding to
different scenarios can be overlaid in the same plot for a comparative
analysis to give added research insights.

**Additional information**

dc09_lee_intro.pdf


Austin Nichols

Urban Institute

In this presentation, I provide a brief overview of quasiexperimental
methods of estimating causal impacts using Stata: panel data, matching and
reweighting, instrumental variables, and regression discontinuity designs,
emphasizing practical considerations. I pay particular attention to the
regression discontinuity method, which is the least widely known but the
most well regarded of the quasiexperimental methods in those circumstances
where it is appropriate.

**Additional information**

dc09_nichols.pdf


Jeff Pitblado

StataCorp

In this presentation, I cover how to use the new factor variables features in Stata 11.
Stata’s new factor variables notation allows you to identify categorical
covariates as factor variables, provides a convenient notation for specifying
indicator variables without having to generate them, and allows interactions of
factor variables with other factor variables or continuous covariates.

We will also cover the new **margins** postestimation command.
**margins** is a powerful yet easy-to-use command for computing expected
marginal means, predictive margins, adjusted predictions, average marginal
effects, and conditional marginal effects. Standard errors in **margins**
can be estimated conditionally on the observed/specified covariate values or
unconditionally via linearization.

**Additional information**

dc09_pitblado_fv.zip


Nicholas J. Cox

Durham University (UK)

The display of data or of results often entails the preparation of a variety
of table-like graphs showing both text labels and numeric values. I will
present basic techniques, tips, and tricks using both official Stata and
various user-written commands. The main message is that whenever **graph
bar**, **graph dot**, or **graph box** commands fail to give what
you want, then you can knit your own customized displays using **twoway**
as a general framework.

**Additional information**

dc09_cox.zip


Bill Rising

StataCorp

There are many different ways to work in Stata, depending on your preferences: you
can work using the menus, dialog boxes, the Command window, or the Do-file
Editor. Stata 11 adds to this list with its new Variables Manager and much-improved
Data Editor, both of which provide tools that make tasks such as managing
value labels or entering and editing dates much easier. I will show off these
new features and explain how they can be used to produce do-files for
reproducibility through the use of command logs and the improved Do-file
Editor.

**Additional information**

dc09_rising.pdf


Michael Lokshin

The World Bank

I will present and discuss the development of the large software project
ADePT, which combines the computation kernel of Stata with a user interface
written in C#. ADePT is a software platform for applied economic analysis.
It is used widely in the World Bank and in many research institutions
around the world to produce a standardized set of tables and graphs in
different areas of applied economic analysis. Currently, ADePT includes
modules on poverty, labor market, inequality, gender, education, social
protection, and health.

I will demonstrate various stages of the project development, discuss the software routines (both Stata and C#) developed for interaction between ADePT and Stata, and demonstrate various tools we developed in Stata and C#. Many of these routines are currently available for Stata users.

**Additional information**

dc09_lokshin.ppt


Masahiko Aida

Greenberg Quinlan Rosner

In U.S. political campaigns, the use of voter propensity scores (predicted
attributes such as partisanship or turnout likelihood) has become quite popular
in recent years. Such applications, often called microtargeting, range
from survey sampling to voter contacts via direct mail, phone, or canvassing.
To create such models, analysts first recode the original dataset in
statistical software and then create statistical models by using data mining
tools. Once the models have been validated against validation data,
analysts need to append the propensity scores to a database of millions of
voters (such databases typically contain information from voter files, census
data, and consumer data). While database software offers a strong capacity to
store and manipulate a large volume of data, carrying out basic data
transformation such as recoding or creating an index by PCA is not easy using
database software. I will demonstrate an example of using Stata as a
front-end tool to connect to database software, calculate propensity scores
using a C++ plug-in, and return the propensity scores to the database. This
approach combines the strengths of three different platforms: the flexibility of
Stata as a general statistical package, the speed of C++ to conduct complex
calculations, and the capacity of database software to manipulate gigabytes of
data with relative ease.

**Additional information**

dc09_aida.ppt

dc09_aida.pdf


Chuck Huber

Texas A&M Health Science Center School of Rural Public Health

Genetic association studies often explore the relationship between
diseases and collections of contiguous genetic markers located on the same
chromosome (known as haplotypes). Haplotypes are usually not observed directly
but are inferred statistically using a variety of algorithms. One of the
most popular haplotype inference programs is PHASE (Stephens and Scheet 2005;
Stephens, Smith, and Donnelly 2001), and one of the most popular programs for
examining characteristics of the resulting haplotypes is HaploView (Barrett
et al. 2005). I will present a set of Stata commands for exporting genotype
data from Stata into PHASE, importing the resulting haplotypes back into
Stata for association analysis, and exporting the haplotype data from Stata
into HaploView for further exploration.

**References**

- Barrett, J. C., B. Fry, J. Maller, and M. J. Daly. 2005. Haploview: Analysis and visualization of LD and haplotype maps. *Bioinformatics* 21: 263–265.
- Stephens, M., and P. Scheet. 2005. Accounting for decay of linkage disequilibrium in haplotype inference and missing-data imputation. *American Journal of Human Genetics* 76: 449–462.
- Stephens, M., N. J. Smith, and P. Donnelly. 2001. A new statistical method for haplotype reconstruction from population data. *American Journal of Human Genetics* 68: 978–989.

**Additional information**

dc09_huber.ppt

Ben Dwamena

University of Michigan

Meta-analysis of diagnostic accuracy studies may be performed to provide a
summary measure of diagnostic accuracy based on a collection of studies and
their reported empirical or estimated smooth ROC curves. Statistical
methodology for meta-analysis of diagnostic accuracy studies has largely
been focused on the most common type of studies—those reporting estimates
of test sensitivity and specificity. To meta-analyze studies with results in
more than two categories, one approach is to dichotomize results by grouping
them into two categories and then apply one of those methods. However,
it is more efficient to take all thresholds into account. Existing methods
require the same number and set of categories/thresholds, are
computationally intensive adaptations of the binary methods, or are only
implementable using Bayesian inference. In this presentation, I present a
robust and flexible parametric algorithm that is invariant to the
number and set of categories and is implementable with standard statistical
software such as Stata, SPSS, or SAS. The method consists of 1) estimation
of study-specific ROC and location-scale parameters by heteroskedastic
ordinal (probit or logit) regression; 2) estimation of correlated or
uncorrelated mean location and scale from study-specific estimates with
linear mixed modeling by ML, REML, or method of moments; and 3) estimation
of summary ROC (bilogistic versus binormal) and ROC functionals with mean
location and scale estimates from step 2. The method is illustrated with two
datasets (one with studies reporting the same set of categories and the other
with disparately categorized outcomes). Steps 1 and 2 are performed with
**oglm** (authored by Richard Williams) and **mvmeta** (authored by Ian
White), respectively. The proposed meta-analytical algorithm may be
implemented in Stata by using the **midacat** module.

**Additional information**

dc09_dwamena.pdf


Stas Kolenikov

University of Missouri

Statisticians routinely use Monte Carlo methods to simulate random data and
run new estimation procedures on those simulated data. How about simulating
data for students to use in their homework? Each student gets a unique copy
of a dataset, which serves at least two purposes. First, each student has to
interact with the software and interpret their own answers. Second,
verbatim copying of answers is not meaningful. Because the random-number
generator seeds are fixed, we can also generate the answer keys and match
students’ answers to those keys. I will present a system that
automatically manages all the student grading tasks with the Stata package
**aisa**. Finally, I will discuss applications in the classroom and
students’ reactions to the system.
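
The seeding idea can be sketched in a few lines of Python. This is a hypothetical illustration, not the **aisa** package: the function names, the seed formula, the simulated variable, and the tolerance are all assumptions. Each student ID maps to a fixed seed, so the instructor can regenerate any student's dataset and its answer key on demand.

```python
import random
import statistics

def make_homework(student_id, course_seed=2009, n=50):
    """Generate one student's dataset. The seed depends only on the
    course seed and the student ID, so the data are reproducible."""
    rng = random.Random(course_seed * 100003 + student_id)
    return [round(rng.gauss(10, 2), 2) for _ in range(n)]

def answer_key(student_id):
    """Recompute the correct answers from the regenerated dataset."""
    data = make_homework(student_id)
    return {"mean": statistics.mean(data), "sd": statistics.stdev(data)}

def grade(student_id, submitted, tol=0.01):
    """Mark each submitted answer as right or wrong against the key."""
    key = answer_key(student_id)
    return {q: abs(submitted[q] - key[q]) <= tol for q in key}

# Example: student 17 submits a correct mean but a wrong standard deviation.
key_17 = answer_key(17)
result_17 = grade(17, {"mean": key_17["mean"], "sd": key_17["sd"] + 1.0})
print(result_17)
```

Because nothing is stored per student apart from the ID, the answer keys never need to be saved alongside the distributed datasets.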

**Additional information**

dc09_kolenikov.pdf

dc09_kolenikov_examples.zip


Martin Weiss

University of Tuebingen (Germany)

I have researched the economics of interactions on Statalist, based on the
full population of exchanges from 1 January to 30 April 2009. I will examine
both the “demand side” (the questions asked on the list) and the
“supply side” (the answers provided). I pay particular attention
to the role of unsatisfied demand (“orphans”), i.e., questions
that never attract a reply.

**Additional information**

dc09_weiss.pdf


Sergiy Radyakin

The World Bank

Stata provides a fairly extensive set of graphs. However, sometimes users
need to implement custom graphs, which are not yet available. In some cases,
it is possible to “tweak” a standard graph so that it results in
the desired image; in other cases, it is not possible. Stata uses a complex
system of objects implemented as classes and heavily relies on inheritance,
polymorphism, and overriding to implement its graphics. While standard class
programming is well described in the Stata manuals, the particulars of the
design and implementation of the Stata graphics features are not documented
by developers and thus are not easily accessible. In this presentation, I
will briefly discuss the overall idea of how Stata graphics works and
review some examples of custom graphics commands and their implementations.
This part of the discussion will be most useful for skilled Stata programmers
who want to know what is happening “under the hood” and,
perhaps, optimize their graphic commands to improve performance or add
features. Then we will look at the new command **matrixplot**, whose sample
images generated considerable interest on Statalist.
**matrixplot** can be used to produce contour plots and heatmap-like plots
and is particularly useful when working with climate data as well as when
displaying raster images for digital image processing.

**Additional information**

dc09_radyakin.pdf

dc09_radyakin.zip
