2008 Summer North American Stata Users Group meeting

Maarten Buis

Department of Social Research Methodology, Vrije Universiteit Amsterdam

Many of us, at some point, have received a comment from a member of the audience, a
reviewer, or an advisor who thinks the technique used is bad/biased/evil and
who knows of some new fancy method that solves the problem. In those cases,
you often want to know two things: 1) How big is the problem? and 2) Does that new
fancy method actually work? In this talk, I will demonstrate how to
answer these questions using the **simulate** command in Stata. I will illustrate
using the following two examples: First, say we have a dependent
variable that is collected not as a continuous variable but as a series of
ranges, e.g., wage measured in categories ($0–5/hour, $6–10/hour,
etc.). How bad is it to assign each category its middle value and treat it
as a continuous variable? How much better is **intreg** at dealing with
this problem? Second, various approaches are proposed if we have missing
data. The default in Stata (and most other packages) is to ignore all
observations with missing data. Official Stata also contains the
**impute** command, and there is the user-written **ice** command by
Patrick Royston. This raises the question of which method is the best.
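
The first example can be sketched as a small Monte Carlo experiment. The block below is a Python illustration, not the Stata **simulate** code from the talk. The data-generating process (a true slope of 2, normal errors, brackets 5 units wide), the sample size, and all function names are assumptions made for illustration. It compares the average OLS slope from the exact outcome with the slope obtained after each outcome is replaced by the midpoint of its bracket; it does not cover **intreg**.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope of a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def midpoint(y, width=5.0):
    """Replace y by the middle of the width-wide bracket it falls in."""
    return (y // width) * width + width / 2


def one_replication(rng, n=500, slope=2.0):
    """Draw one sample; return slopes from the exact and the midpoint outcome."""
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [15 + slope * xi + rng.gauss(0, 3) for xi in x]  # assumed model
    y_mid = [midpoint(yi) for yi in y]
    return ols_slope(x, y), ols_slope(x, y_mid)


def simulate(reps=200, seed=12345):
    """Average both slope estimates over many replications."""
    rng = random.Random(seed)
    results = [one_replication(rng) for _ in range(reps)]
    mean_exact = statistics.fmean(r[0] for r in results)
    mean_mid = statistics.fmean(r[1] for r in results)
    return mean_exact, mean_mid


if __name__ == "__main__":
    exact, mid = simulate()
    print(f"true slope 2.0, exact outcome {exact:.3f}, midpoint outcome {mid:.3f}")
```

The design mirrors the two questions in the abstract: one function generates and analyzes a single dataset, and a loop repeats it so that the average estimate can be compared with the known true value.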

**Additional information**

buis_MLBsimulate.zip


Austin Nichols

Urban Institute

I will present a brief introduction to fitting generalized
method-of-moments models in Stata, using the **optimize()** function in
Mata, with applications to nonlinear instrumental-variables models.

**Additional information**

nichols_gmm.pdf


Hau Chyi

WISE, Xiamen University, China

Orgul Ozturk

Moore School of Business, University of South Carolina

This research examines the effects of mothers’ welfare and work
decisions on their children’s attainments by using two types of
estimation methods in Stata: 1) an instrumental-variables (IV) approach and
2) a nonlinear simultaneous-equation estimation. The estimator employs
sibling comparisons in a random-effects framework and an IV approach to
address the unobserved heterogeneity that may influence mothers’ work
and welfare decisions. We use the popular Stata command **ivreg2** to
estimate the coefficients. Because the production function of a child’s
ability can be written as a nonlinear function of a mother’s
decisions, we can also use the **nlsur** command to simultaneously
estimate the production function as well as the (first-stage) IV
projections. We focus on children who were born to single mothers with 12
or fewer years of schooling. The IVs in this study are welfare use during
childhood and a mother’s expected years of work. The identification
comes from the variation in mothers’ different economic incentives
that arises from the AFDC benefit structures across the United States. The
estimates imply that, relative to no welfare participation, participating
in welfare for one to three years provides up to a 5-percentage-point gain
in a child’s Peabody Individual Achievement Test (PIAT) scores. The
negative effect of childhood welfare participation on adult earnings found
by others is not significant if one accounts for mothers’ work
decisions. At the estimated values of the model parameters, a
mother’s number of years of work contributes between $3,000 and
$7,000 (in 1996 dollars) to her child’s labor income but has no
significant effect on the child’s PIAT scores. Finally, the
number of years of schooling for the children is relatively unresponsive to
their mother’s work and welfare participation choices.

**Additional information**

chyi_est_afdc_short.pdf


Ben Dwamena

University of Michigan Radiology and VA Nuclear Medicine Service, Ann Arbor, Michigan

I have previously demonstrated Stata implementation of bivariate
random-effects meta-analysis of the sensitivity and specificity of a single
binary diagnostic test by means of the midas module (Dwamena NASUG 2007;
Dwamena WCSUG 2007). In this presentation, I extend the work to
paired-comparison studies of two binary diagnostic tests. Using a dataset of
studies comparing the accuracy of positron emission tomography (PET) and
x-ray computed tomography (CT) for staging lung cancer, I compare the
fit (deviance), complexity (BIC, AIC), and test performance estimates
(sensitivity, specificity, diagnostic odds ratios, and likelihood ratios) of
four multivariate models: 1) bivariate binomial mixed models with test type as
a fixed-effect covariate; 2) bivariate binomial mixed models with test type as
a random-effect covariate; 3) independent test-specific bivariate binomial
mixed models; and 4) correlated test-specific bivariate binomial mixed
models. I perform estimation with the Stata-native procedure **xtmelogit**
using both the default adaptive quadrature method and its Laplacian
approximation (nip=1). I then compare results with those from the
user-written **gllamm** command (written by Sophia Rabe-Hesketh, Andrew Pickles,
and Anders Skrondal).

**Additional information**

dwamena_snasug2008.pdf


Carl Nelson

Agricultural and Consumer Economics, University of Illinois at Urbana–Champaign

The quaids ado-files written by Brian Poi provide a good template for
constructing alternative ado-files for maximum likelihood estimation of
demand systems. I describe how I used the template to construct ado-files to
estimate a five-commodity almost-ideal demand system with demographic
scaling. The system is applied to USDA national food consumption survey
data. The estimation is used as an exercise in a PhD-level microtheory
course that aims to connect the empirical implications of theory with
econometric estimation. I report on how maximum likelihood estimation of
demand systems contributes to student learning of both consumer theory and
nonlinear estimation. I include a discussion of how Mata is used to recover
coefficients from maximum likelihood estimation to perform postestimation
processing like calculation of elasticities.

**Additional information**

nelson_snasug08.pdf


Paul Rathouz

Department of Health Studies, University of Chicago

I propose a new class of generalized linear models. As with the existing
models, these new models are specified via a linear predictor and a link
function for the mean of response Y as a function of predictors X. However,
here, the “baseline” distribution of Y when the linear predictor is zero is
left unspecified and is estimated from the data. The response distribution when
the linear predictor differs from zero is then generated via exponential
tilting of the baseline distribution, yielding a response model that is a
member of the natural exponential family, with corresponding canonical link
and variance functions. The resulting model has a level of flexibility
similar to that of the proportional odds model. Maximum likelihood estimators
are developed for response distributions with finite support, and the new
model is studied and illustrated through simulations and example analyses
from aging and psychiatry research.
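
The exponential-tilting construction can be written out as follows. The notation ($f_0$ for the baseline density or mass function, $\theta$ for the tilt parameter, $b$ for the normalizing function, $g$ for the link) is assumed here for illustration and is not taken from the talk.

$$
f(y \mid x) = f_0(y)\,\exp\{\theta y - b(\theta)\},
\qquad
b(\theta) = \log \int \exp(\theta y)\, f_0(y)\, dy,
$$

where $\theta$ is determined by $x$ through the mean constraint

$$
b'(\theta) = E(Y \mid x) = g^{-1}(x^\top \beta).
$$

Setting $\theta = 0$ returns $f_0$ itself. When the response has finite support, the integral becomes a sum over the support points, and the masses $f_0(y)$ at those points are parameters estimated alongside $\beta$.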

**Additional information**

rathouz_sug_2008.pdf


Tom Mustillo

Assistant Professor of Political Science, Indiana University–Purdue University Indianapolis

Sarah Mustillo

Associate Professor of Sociology, Purdue University

Stata can be used as a companion to relational database programs to compute
and serve up statistical and nonstandard functions for public use.
This session builds upon presentations at previous North American Stata Users Group
meetings, “Translating Data between MySQL and Stata” (2004),
“Working with ODBC Data Sources in Stata” (2004), and
“Integrating Stata with Database Management Systems” (2005), by demonstrating
how a Microsoft Access database of electoral data can call Stata do-files
to compute and/or estimate alternative measures of political party
nationalization. This database uses Stata to compute Jones and Mainwaring’s
(2003) measure of “Party Nationalization” using the **egen_inequal**
command and Morgenstern and Potthoff’s (2005) measure of the
“Components of Elections” using **xtmixed**. More generally, where
data reside live and for broad public consumption, Stata can play a valuable
role operating behind the scenes for nontechnical users where measures of
conceptual value cannot be generated from within the database environment.

**Additional information**

mustillo_nasug2008.ppt


Sergiy Radyakin

The World Bank

The new command **USESPSS** allows users to open and process SPSS system
files in Stata for Windows. **USESPSS** is a “true reader” in
that it is completely independent from any specialized conversion
software, like Stat/Transfer, and it does not require SPSS
to be installed. **USESPSS** converts data files on the fly, preserving
variable labels, value labels, and missing values. Like other conversion
software, **USESPSS** optimizes data storage types by looking for the most
efficient way to store SPSS data in Stata’s memory. **USESPSS** is
implemented as a plugin and works in a Windows 32-bit environment (however, it
understands SPSS files originating from both Windows and Unix platforms,
compressed and not compressed). The critical portions of its code are
written in assembly language; thus, SPSS data can be used in Stata programs
without a significant loss of performance. The talk will also cover
the process of developing plugins for Stata.

**Additional information**

radyakin_usespss.ppt


P. Wilner Jeanty

The Ohio State University

The World Bank’s World Development Indicators (WDI) compilation is a rich
and widely used dataset on the development of most economies in the world.
However, after obtaining the data from the World Bank’s website or the
WDI CD-ROM, users need to manage or reorganize the data in a certain way for
statistical applications. The World Bank has made great strides in rendering
WDI in several forms for download. Yet, seemingly unrelated regression
analysis, for example, cannot be performed using any of these structures.
Reorganizing the data for seemingly unrelated regression analysis as well as
renaming the series with meaningful variable names and maintaining the
series descriptors as variable labels in the reshaped dataset represent
significant data-management challenges for the inexperienced Stata user. I
will present a new Stata program, **wdireshape**, that reduces data-management
time and effort to zero when the ultimate structure is to fit panel-data and
seemingly unrelated regression models, or to have a dataset with the
countries as rows and the variables for each year as columns.

**Additional information**

jeanty_nasug08.zip


David Drukker

StataCorp

In this talk, I will review dynamic panel-data analysis and how to perform
it using Stata. I also cover static models with predetermined variables.
For each model discussed, I review the econometrics and
then show how to perform the estimation using Stata.

**Additional information**

drukker_xtdpd.pdf


Alan Feiveson

NASA Johnson Space Center

A typical formulation for a linear mixed model is $Y = X\beta + Zu$, where
$\beta$ is a vector of “fixed” parameters, $u$ is a vector of “random
effects”, and $X$ and $Z$ are matrices whose columns consist of design
variables and/or covariates. In some applications, the elements of $Z$ may
depend on the unknown fixed parameters $\beta$ as well as known covariates. A
common example is when an error variance is proportional to some power of
E(Y), the mean of Y. In particular, if the variance is proportional to the
square of E(Y), we have a constant-CV model. In this talk, I will give examples
of such models, including those with hierarchical structures, and show how
**xtmixed** can be used to estimate them and do proper inference on the
estimated parameters. I will compare the results with Bayesian estimation
in WinBUGS.
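
To see why the random-effects design can depend on the fixed parameters in the constant-CV case, consider a single-level sketch. Here $\beta$ denotes the fixed-parameter vector, and the symbols $x_i$, $u_i$, and $\sigma$ are notation assumed for illustration rather than taken from the talk.

$$
y_i = x_i^\top \beta + (x_i^\top \beta)\, u_i,
\qquad E(u_i) = 0, \quad \operatorname{Var}(u_i) = \sigma^2 .
$$

Then

$$
E(y_i) = x_i^\top \beta,
\qquad
\operatorname{Var}(y_i) = \sigma^2 \,(x_i^\top \beta)^2,
\qquad
\mathrm{CV}(y_i) = \frac{\sqrt{\operatorname{Var}(y_i)}}{\lvert E(y_i) \rvert} = \sigma .
$$

The value multiplying $u_i$ is $z_i = x_i^\top \beta$, which is unknown until $\beta$ has been estimated, and the coefficient of variation is the same constant $\sigma$ for every observation with a nonzero mean.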

**Additional information**

feiveson_snasug_2008.ppt


Joseph Coveney

Cobridge Co., Ltd.

Users of **logit** or **logistic** occasionally encounter instances in
which one or more predictors perfectly predict one or both outcomes (a
condition called separation), or in which some outcomes are completely
determined (quasi-complete separation). Finite maximum likelihood estimates
do not exist under conditions of separation. Exact logistic regression with
**exlogistic** can serve as an alternative in these circumstances but is
sometimes infeasible. In the 1990s, David Firth proposed a type of
penalization for reducing bias of maximum likelihood estimates in
generalized linear models by means of modifying the score equations.
Firth’s method has the interpretation of penalized maximum likelihood
when the canonical link function is used, such as in logistic regression.
In this decade, Georg Heinze and colleagues have explored this technique as
a solution to the problem of separation in logistic regression. I describe a Stata
implementation, **firthlogit**, which maximizes the penalized log-likelihood
using **ml**. I illustrate its use in model fitting and predictions, inference
with penalized likelihood-ratio tests, and construction of profile
penalized likelihood confidence intervals. I use examples
where **logit** and **logistic** balk or do not give finite
maximum likelihood estimates, and where exact logistic regression is
problematic because of memory requirements or degenerate conditional
distributions.
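
The modified score equations can be illustrated with a short iteration for a model with an intercept and one predictor. This is a Python sketch written for this page, not the **firthlogit** code (which, per the abstract, maximizes the penalized log-likelihood with **ml**). The function name, the starting values, and the stopping rule are assumptions. Each step adds the leverage-based term h_i(1/2 − p_i) to the usual score contribution y_i − p_i and then takes a Fisher-scoring step.

```python
import math


def firth_logit(x, y, tol=1e-10, max_iter=200):
    """Firth-type penalized logistic fit of y on an intercept and one predictor x.

    Returns (intercept, slope). Illustrative sketch only.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the design matrix with columns [1, x]
        i00 = sum(w)
        i01 = sum(wi * xi for wi, xi in zip(w, x))
        i11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = i00 * i11 - i01 * i01
        # Inverse of the 2x2 information matrix
        v00, v01, v11 = i11 / det, -i01 / det, i00 / det
        # Leverages: h_i = w_i * [1, x_i] V [1, x_i]'
        h = [wi * (v00 + 2.0 * v01 * xi + v11 * xi * xi) for wi, xi in zip(w, x)]
        # Modified score contributions: y_i - p_i + h_i * (1/2 - p_i)
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Scoring step: beta <- beta + V * U
        d0 = v00 * u0 + v01 * u1
        d1 = v01 * u0 + v11 * u1
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1


if __name__ == "__main__":
    # Completely separated data: x = 0 always gives y = 0, x = 1 always gives y = 1
    x = [0] * 5 + [1] * 5
    y = [0] * 5 + [1] * 5
    b0, b1 = firth_logit(x, y)
    print(f"intercept {b0:.4f}, slope {b1:.4f}")
```

On the separated data in the demo, ordinary maximum likelihood has no finite solution, while this iteration settles on finite values. For a single binary predictor, the penalized estimates coincide with the log odds obtained after adding 0.5 to each cell of the 2×2 table, which gives a convenient check.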

**Additional information**

coveney_snasug08.pps


Partha Deb

Hunter College and the Graduate Center, CUNY

Finite mixture models provide a natural way of modeling continuous or
discrete outcomes that are observed from populations consisting of a finite
number of homogeneous subpopulations. Applications of finite mixture models
are abundant in the social and behavioral sciences, biological and
environmental sciences, engineering, and finance. Such models have a natural
representation of heterogeneity in a finite, usually small, number of latent
classes, each of which may be regarded as a type. More generally, the finite
mixture model can be shown to approximate any unknown distribution under
suitable regularity conditions. The Stata package **fmm** implements a maximum
likelihood estimator for a class of finite mixture models. In this talk, I
will begin by introducing finite mixture models with a number of examples,
and then I will discuss issues of estimation, testing, and model selection. I will then
describe estimation using **fmm**, calculations of predictions, marginal
effects, and posterior class probabilities, and I will illustrate these by using
examples from econometrics and finance.
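
Posterior class probabilities follow from Bayes' rule once the mixing shares and component densities are known. The Python sketch below assumes normal components with made-up parameters; the function name and the numbers are illustrative and are not output from **fmm**.

```python
from statistics import NormalDist


def posterior_class_probs(y, shares, components):
    """Probability that observation y belongs to each latent class.

    shares: mixing proportions, one per class, summing to 1
    components: one NormalDist per class (assumed component densities)
    """
    weighted = [s * c.pdf(y) for s, c in zip(shares, components)]
    total = sum(weighted)
    return [wk / total for wk in weighted]


if __name__ == "__main__":
    shares = [0.3, 0.7]                                # hypothetical class shares
    components = [NormalDist(0, 1), NormalDist(3, 2)]  # hypothetical components
    for y in (-1.0, 1.5, 4.0):
        probs = posterior_class_probs(y, shares, components)
        print(y, [round(p, 3) for p in probs])
```

In an estimated model, the shares and component parameters would be replaced by their maximum likelihood estimates.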

**Additional information**

deb_fmm_slides.pdf


Jeffrey Wooldridge

Department of Economics, Michigan State University

Roberto G. Gutierrez

StataCorp

Stata’s approach to the analysis of data from complex surveys is
unique in that it clearly separates the declaration of the design aspects
of the survey (accomplished by **svyset**) from the actual analysis. Such
an arrangement is ideal because the design characteristics of the data do
not change according to the analysis being performed. Whether you are
constructing contingency tables or performing Cox regression, the sampling
weights and primary sampling units (not to mention the other design
specifications) remain constant. Stata’s treatment of survey data makes
it easy to maintain that consistency. Most of Stata’s model fitting and
other analysis commands can be applied easily to survey data, including (with
the release of Stata 10) commands for Cox regression and parametric models
for survival data in a survey setting. This talk is a tutorial on how to
make full use of Stata’s capabilities for survey data. Alternative variance
estimation is a key component of performing valid inference in light of
complex-survey designs, and I will discuss several variance-estimation
options. That discussion will include modern computationally intensive
methods such as balanced repeated replication, the jackknife, and the
bootstrap, which are made feasible with the advent of better computer
technology. For these three methods, variance estimation can be done
directly or by using a series of replication weights.

**Additional information**

gutierrez_survey.pdf


Stas Kolenikov

Department of Statistics, University of Missouri–Columbia

In this presentation, I will review the bootstrap for complex surveys with
designs featuring stratification, clustering, and unequal probability
weights. I will present the Stata module **bsweights**, which creates the
bootstrap weights for designs specified through and supported by **svy**.
I will also provide simple demonstrations highlighting the use of the
procedure and its syntax. I will discuss various tuning parameters and
their impact on the performance of the procedure, and I will give arguments
for the bootstrap by the method of weights in nonsurvey settings.
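
One common way to build bootstrap weights for a stratified, clustered design is the rescaling scheme of Rao and Wu, sketched below in Python. It is assumed here for illustration and is not claimed to reproduce the internals or the options of **bsweights**. The function names, the data layout, and the choice to resample n − 1 of the n primary sampling units (PSUs) in each stratum are all assumptions.

```python
import random
from collections import Counter


def rao_wu_weights(psus_by_stratum, base_weight, rng):
    """One set of rescaled bootstrap weights, keyed by PSU.

    psus_by_stratum: dict mapping stratum -> list of PSU ids (at least 2 each)
    base_weight: dict mapping PSU id -> sampling weight
    """
    new_weight = {}
    for psus in psus_by_stratum.values():
        n = len(psus)
        # Resample n - 1 PSUs with replacement within the stratum
        draws = Counter(rng.choices(psus, k=n - 1))
        for psu in psus:
            # PSUs that were not drawn get weight zero in this replicate
            new_weight[psu] = base_weight[psu] * n / (n - 1) * draws[psu]
    return new_weight


def bootstrap_variance(replicate_estimates, full_sample_estimate):
    """Mean squared deviation of replicate estimates from the full-sample estimate."""
    reps = len(replicate_estimates)
    return sum((e - full_sample_estimate) ** 2 for e in replicate_estimates) / reps


if __name__ == "__main__":
    rng = random.Random(2008)
    strata = {"A": ["a1", "a2", "a3"], "B": ["b1", "b2"]}  # hypothetical design
    base = {"a1": 10.0, "a2": 10.0, "a3": 10.0, "b1": 4.0, "b2": 4.0}
    print(rao_wu_weights(strata, base, rng))
```

Each call produces one column of replicate weights; repeating it many times, re-estimating the statistic with each column, and passing the results to `bootstrap_variance` gives the variance estimate. The number of PSUs resampled per stratum is one example of a tuning parameter.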

**Additional information**

kolenikov_snasug08.pdf

kolenikov_bsw-example.do


David Drukker

StataCorp

In this talk, I will provide a quick introduction to estimators for the
parameters of spatial-autoregressive models and a quick introduction to a
suite of user-written Stata commands for managing spatial data and parameter
estimation.

**Additional information**

drukker_spatial.pdf
