Last updated: 15 January 2009

2008 Summer North American Stata Users Group meeting

24–25 July 2008


Gleacher Center, University of Chicago
450 North Cityfront Plaza Drive
Chicago, IL 60611


Understanding statistics using simulate

Maarten Buis
Department of Social Research Methodology, Vrije Universiteit Amsterdam
Many of us, at some point, have received a comment from a member of the audience, a reviewer, or an advisor who thinks the technique used is bad/biased/evil and who knows of some new fancy method that solves the problem. In those cases, you often want to know two things: 1) How big is the problem? and 2) Does that new fancy method actually work? In this talk, I will demonstrate how to answer these questions using the simulate command in Stata. I will illustrate using the following two examples: First, say we have a dependent variable that is collected not as a continuous variable but as a series of ranges, e.g., wage measured in categories ($0–5/hour, $6–10/hour, etc.). How bad is it to assign each category its middle value and treat it as a continuous variable? How much better is intreg at dealing with this problem? Second, various approaches are proposed if we have missing data. The default in Stata (and most other packages) is to ignore all observations with missing data. Official Stata also contains the impute command, and there is the user-written ice command by Patrick Royston. This raises the question of which method is the best.
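As a rough illustration of the first example (a Python sketch, not the presenter's Stata code), the Monte Carlo below asks how much the slope of a simple regression changes when a continuous wage is replaced by the midpoint of its category. The data-generating process, the category width of 5, the sample size, and all function names are assumptions made for this sketch; the intreg comparison from the talk is not reproduced.

```python
import random
import statistics

def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

def one_replication(rng, n=200, true_slope=2.0, width=5.0):
    """Draw one sample; return (slope using exact wage, slope using midpoints)."""
    x = [rng.uniform(0, 10) for _ in range(n)]
    wage = [5.0 + true_slope * xi + rng.gauss(0, 3) for xi in x]
    # Replace each wage by the midpoint of the category it falls in.
    midpoint = [(w // width) * width + width / 2 for w in wage]
    return ols_slope(x, wage), ols_slope(x, midpoint)

def run_simulation(reps=200, seed=12345):
    """Average both slope estimates over many simulated samples."""
    rng = random.Random(seed)
    results = [one_replication(rng) for _ in range(reps)]
    exact = statistics.fmean([r[0] for r in results])
    binned = statistics.fmean([r[1] for r in results])
    return exact, binned

exact_mean, binned_mean = run_simulation()
print(f"mean slope, exact wage: {exact_mean:.3f}")
print(f"mean slope, midpoints:  {binned_mean:.3f}")
```

Comparing each average with the true slope of 2 gives a direct answer to "how big is the problem?" under these particular assumptions; changing the category width or the error variance shows how sensitive that answer is.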

GMM estimation in Mata

Austin Nichols
Urban Institute
I will present a brief introduction to fitting generalized method-of-moments models in Stata, using the optimize() function in Mata, with applications to nonlinear instrumental-variables models.
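To make the idea concrete, here is a minimal Python sketch of a just-identified nonlinear instrumental-variables problem, assuming the model y = exp(b*x) + u with one instrument z. The simulated data, the bracket for b, and the function names are hypothetical. Because the model is just-identified, minimizing the GMM objective g(b)^2 is the same as solving the sample moment condition g(b) = 0, so a plain bisection stands in here for Mata's optimize().

```python
import math
import random

def simulate_data(n=2000, beta=0.5, seed=2008):
    """Return (y, x, z) triples with x endogenous and z a valid instrument."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        z = rng.gauss(0, 1)
        v = rng.gauss(0, 1)
        u = 0.4 * v + rng.gauss(0, 0.3)  # u is correlated with x through v
        x = 0.8 * z + v
        y = math.exp(beta * x) + u
        data.append((y, x, z))
    return data

def moment(b, data):
    """Sample analogue of E[z * (y - exp(b*x))], which is zero at the true b."""
    return sum(z * (y - math.exp(b * x)) for y, x, z in data) / len(data)

def gmm_estimate(data, lo=-0.5, hi=1.5, tol=1e-10):
    """Solve moment(b) = 0 by bisection on the bracket [lo, hi]."""
    g_lo, g_hi = moment(lo, data), moment(hi, data)
    if g_lo * g_hi > 0:
        raise ValueError("moment condition does not change sign on the bracket")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        g_mid = moment(mid, data)
        if g_lo * g_mid <= 0:
            hi = mid
        else:
            lo, g_lo = mid, g_mid
    return (lo + hi) / 2

data = simulate_data()
beta_hat = gmm_estimate(data)
print(f"GMM estimate of b: {beta_hat:.4f}")
```

With more instruments than parameters, the single moment would become a vector and the objective a weighted quadratic form, which is where a general-purpose optimizer becomes necessary.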

The effects of single mothers’ welfare participation and work decisions on children’s attainments

Hau Chyi
WISE, Xiamen University, China
Orgul Ozturk
Moore School of Business, University of South Carolina
This research examines the effects of mothers’ welfare and work decisions on their children’s attainments by using two types of estimation methods in Stata: 1) an instrumental-variables (IV) approach and 2) a nonlinear simultaneous-equation estimation. The estimator employs sibling comparisons in a random-effects framework and an IV approach to address the unobserved heterogeneity that may influence mothers’ work and welfare decisions. We use the popular Stata command ivreg2 to estimate the coefficients. Because the production function of a child’s ability can be written as a nonlinear function of a mother’s decisions, we can also use the nlsur command to simultaneously estimate the production function as well as the (first-stage) IV projections. We focus on children who were born to single mothers with 12 or fewer years of schooling. The IVs in this study are welfare use during childhood and a mother’s expected years of work. The identification comes from the variation in mothers’ economic incentives that arises from differences in AFDC benefit structures across the United States. The estimates imply that, relative to no welfare participation, participating in welfare for one to three years provides up to a 5-percentage-point gain in a child’s Peabody Individual Achievement Test (PIAT) scores. The negative effect of childhood welfare participation on adult earnings found by others is not significant if one accounts for mothers’ work decisions. At the estimated values of the model parameters, a mother’s number of years of work contributes between $3,000 and $7,000 (in 1996 dollars) to her child’s labor income but has no significant effect on the child’s PIAT scores. Finally, the number of years of schooling for the children is relatively unresponsive to their mother’s work and welfare participation choices.

Multivariate mixed models for meta-analysis of paired-comparison studies of two medical diagnostic tests

Ben Dwamena
University of Michigan Radiology and VA Nuclear Medicine Service, Ann Arbor, Michigan
I have previously demonstrated Stata implementation of bivariate random-effects meta-analysis of the sensitivity and specificity of a single binary diagnostic test by means of the midas module (Dwamena NASUG 2007; Dwamena WCSUG 2007). In this presentation, I extend the work to paired-comparison studies of two binary diagnostic tests. Using a dataset of studies comparing the accuracy of positron emission tomography (PET) and x-ray computed tomography (CT) for staging lung cancer, I compare the fit (deviance) and complexity (BIC, AIC) and test performance estimates (sensitivity, specificity, diagnostic odds ratios, and likelihood ratios) of four multivariate models: 1) bivariate binomial mixed models with test type as fixed-effect covariate; 2) bivariate binomial mixed models with test type as random-effect covariate; 3) independent test-specific bivariate binomial mixed models; and 4) correlated test-specific bivariate binomial mixed models. I perform estimation with the Stata-native procedure xtmelogit using both the default adaptive quadrature method and its Laplacian approximation (nip=1). I then compare results with those from the user-written gllamm command (written by Sophia Rabe-Hesketh, Andrew Pickles, and Anders Skrondal).

Teaching consumer theory with maximum likelihood estimation of demand systems

Carl Nelson
Agricultural and Consumer Economics, University of Illinois at Urbana–Champaign
The quaids ado-files written by Brian Poi provide a good template for constructing alternative ado-files for maximum likelihood estimation of demand systems. I describe how I used the template to construct ado-files to estimate a five-commodity almost-ideal demand system with demographic scaling. The system is applied to USDA national food consumption survey data. The estimation is used as an exercise in a PhD-level microtheory course that aims to connect the empirical implications of theory with econometric estimation. I report on how maximum likelihood estimation of demand systems contributes to student learning of both consumer theory and nonlinear estimation. I include a discussion of how Mata is used to recover coefficients from maximum likelihood estimation to perform postestimation processing like calculation of elasticities.

Semiparametric generalized linear models

Paul Rathouz
Department of Health Studies, University of Chicago
I propose a new class of generalized linear models. As with the existing models, these new models are specified via a linear predictor and a link function for the mean of response Y as a function of predictors X. However, here, the “baseline” distribution of Y when the linear predictor is zero is left unspecified and is estimated from the data. The response distribution when the linear predictor differs from zero is then generated via exponential tilting of the baseline distribution, yielding a response model that is a member of the natural exponential family, with corresponding canonical link and variance functions. The resulting model has a level of flexibility similar to that of the proportional odds model. Maximum likelihood estimators are developed for response distributions with finite support, and the new model is studied and illustrated through simulations and example analyses from aging and psychiatry research.
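Exponential tilting on a finite support can be sketched in a few lines of Python. The example below takes a baseline probability mass function as given (in the talk it is estimated from the data, which is not attempted here) and finds the tilt parameter theta for which the tilted distribution has a requested mean. The support, baseline probabilities, target mean, and function names are all hypothetical.

```python
import math

def tilt(support, baseline, theta):
    """Exponentially tilt a baseline pmf: p(y) is proportional to f0(y) * exp(theta * y)."""
    weights = [f0 * math.exp(theta * y) for y, f0 in zip(support, baseline)]
    total = sum(weights)
    return [w / total for w in weights]

def pmf_mean(support, pmf):
    """Mean of a distribution on a finite support."""
    return sum(y * p for y, p in zip(support, pmf))

def solve_theta(support, baseline, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """Bisection for theta; the tilted mean is increasing in theta."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pmf_mean(support, tilt(support, baseline, mid)) < target_mean:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

support = [0, 1, 2, 3, 4]
baseline = [0.40, 0.30, 0.15, 0.10, 0.05]   # baseline mean is 1.1
theta = solve_theta(support, baseline, target_mean=2.0)
tilted = tilt(support, baseline, theta)
print(f"theta = {theta:.4f}")
print([round(p, 4) for p in tilted])
```

In a regression setting, the target mean would come from the inverse link applied to the linear predictor, so each covariate pattern gets its own theta while sharing one baseline distribution.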

Using Stata as a computational tool in a relational database environment

Tom Mustillo
Assistant Professor of Political Science, Indiana University–Purdue University Indianapolis
Sarah Mustillo
Associate Professor of Sociology, Purdue University
Stata can be used as a companion to relational database programs to compute and serve up statistical and nonstandard functions for public use. This session builds upon previous North American Stata Users Group meetings on “Translating Data between MySQL and Stata” (2004), “Working with ODBC Data Sources in Stata” (2004), and “Integrating Stata with Database Management Systems” (2005) by demonstrating how a Microsoft Access database of electoral data can call Stata do-files to compute and/or estimate alternative measures of political party nationalization. This database uses Stata to compute Jones and Mainwaring’s (2003) measure of “Party Nationalization” using the egen_inequal command and Morgenstern and Potthoff’s (2005) measure of the “Components of Elections” using xtmixed. More generally, where data reside live and for broad public consumption, Stata can play a valuable role operating behind the scenes for nontechnical users where measures of conceptual value cannot be generated from within the database environment.

USESPSS: Processing SPSS files in Stata

Sergiy Radyakin
The World Bank
The new command USESPSS allows users to open and process SPSS system files in Stata for Windows. USESPSS is a “true reader” in that it is completely independent from any specialized conversion software, like Stat/Transfer, and it does not require SPSS to be installed. USESPSS converts data files on the fly, preserving variable labels, value labels, and missing values. Like other conversion software, USESPSS optimizes data storage types by looking for the most efficient way to store SPSS data in Stata’s memory. USESPSS is implemented as a plugin and works in a Windows 32-bit environment (however, it understands SPSS files originating from both Windows and Unix platforms, compressed and not compressed). The critical portions of its code are written in assembly language; thus, SPSS data can be used in Stata programs without a significant loss of performance. Part of the talk will also cover the process of developing plugins for Stata.

Reshaping the World Development Indicators (WDI) for panel data and seemingly unrelated regression modeling in Stata

P. Wilner Jeanty
The Ohio State University
The World Bank’s World Development Indicators (WDI) compilation is a rich and widely used dataset about the development of most economies in the world. However, after obtaining the data from the World Bank’s website or the WDI CD-ROM, users need to manage or reorganize the data in a certain way for statistical applications. The World Bank has made great strides in rendering WDI in several forms for download. Yet, seemingly unrelated regression analysis, for example, cannot be performed using any of these structures. Reorganizing the data for seemingly unrelated regression analysis as well as renaming the series with meaningful variable names and maintaining the series descriptors as variable labels in the reshaped dataset represent significant data-management challenges for the inexperienced Stata user. I will present a new Stata program, wdireshape, that reduces data-management time and effort to zero when the ultimate structure is to fit panel-data and seemingly unrelated regression models, or to have a dataset with the countries as rows and the variables for each year as columns.
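The last of the target layouts, countries as rows and one column per variable and year, can be illustrated with a small Python sketch. The record layout, the column-naming rule (series name followed by the year), and the sample values are hypothetical and are not taken from wdireshape or from the WDI files.

```python
def reshape_wide(records):
    """Turn (country, series, year, value) records into one row per country."""
    rows = {}
    for country, series, year, value in records:
        row = rows.setdefault(country, {"country": country})
        row[f"{series}{year}"] = value   # e.g. "gdp2000"
    return [rows[c] for c in sorted(rows)]

# Made-up values, for illustration only.
records = [
    ("ARG", "gdp", 2000, 1.0),
    ("ARG", "gdp", 2001, 2.0),
    ("BRA", "gdp", 2000, 3.0),
]
wide = reshape_wide(records)
for row in wide:
    print(row)
```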

Estimating the parameters of dynamic panel-data models using Stata

David Drukker
In this talk, I will review dynamic panel-data analysis and how to perform it using Stata. I also cover static models with predetermined variables. For each model discussed, I review the econometrics and then show how to perform the estimation using Stata.

Estimation of constant-CV regression models

Alan Feiveson
NASA Johnson Space Center
A typical formulation for a linear mixed model is Y = Xβ + Zu, where β is a vector of “fixed” parameters, u is a vector of “random effects”, and X and Z are matrices whose columns consist of design variables and/or covariates. In some applications, the elements of Z may depend on the unknown fixed parameters β as well as known covariates. A common example is when an error variance is proportional to some power of E(Y), the mean of Y. In particular, if the variance is proportional to the square of E(Y), we have a constant-CV model. In this talk, I will give examples of such models, including those with hierarchical structures, and show how xtmixed can be used to estimate them and do proper inference on the estimated parameters. I will compare the results with Bayesian estimation under WinBUGS.
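A short simulation shows what "constant CV" means in practice. Assume, for this sketch only, that Y = mu * (1 + sigma * e) with e standard normal, so that Var(Y) = sigma^2 * mu^2 and the coefficient of variation sd(Y)/E(Y) equals sigma at every mean level. Written as Y = mu + mu * (sigma * e), the multiplier on the random term is the mean itself, which is how the random-effects design comes to depend on the fixed parameters. The values of mu and sigma below are arbitrary.

```python
import random
import statistics

def simulate_cv(mu, sigma, n, rng):
    """Draw y = mu * (1 + sigma * e), e ~ N(0, 1); return sd(y) / mean(y)."""
    y = [mu * (1 + sigma * rng.gauss(0, 1)) for _ in range(n)]
    return statistics.stdev(y) / statistics.fmean(y)

rng = random.Random(42)
sigma = 0.2
cvs = {mu: simulate_cv(mu, sigma, 5000, rng) for mu in (1.0, 10.0, 100.0)}
for mu, cv in cvs.items():
    print(f"E(Y) = {mu:6.1f}   CV = {cv:.3f}")
```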

Logistic regression by means of penalized maximum likelihood estimation in cases of separation

Joseph Coveney
Cobridge Co., Ltd.
Users of logit or logistic occasionally encounter instances in which one or more predictors perfectly predict one or both outcomes (a condition called separation), or in which some outcomes are completely determined (quasi-complete separation). Finite maximum likelihood estimates do not exist under conditions of separation. Exact logistic regression with exlogistic can serve as an alternative in these circumstances but is sometimes infeasible. In the 1990s, David Firth proposed a type of penalization for reducing bias of maximum likelihood estimates in generalized linear models by means of modifying the score equations. Firth’s method has the interpretation of penalized maximum likelihood when the canonical link function is used, such as in logistic regression. In this decade, Georg Heinze and colleagues have explored this technique as a solution to the problem of separation in logistic regression. I describe a Stata implementation, firthlogit, which maximizes the penalized log-likelihood using ml. I illustrate its use in model fitting and predictions, inference with penalized likelihood-ratio tests, and construction of profile penalized likelihood confidence intervals. I use examples where logit and logistic balk or do not give finite maximum likelihood estimates, and where exact logistic regression is problematic because of memory requirements or degenerate conditional distributions.
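The modified score equations can be sketched in Python for the smallest interesting case: an intercept and one predictor. This is an illustration of Firth's method using Fisher scoring, not the firthlogit implementation (which the abstract says maximizes the penalized log likelihood with ml). The data are made up so that x = 1 perfectly predicts y = 1, a case where ordinary maximum likelihood has no finite slope estimate.

```python
import math

def firth_logit(x, y, iterations=50):
    """Firth-penalized logistic regression of y on an intercept and one predictor."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        p = [1 / (1 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1 - pi) for pi in p]
        # Fisher information X'WX for the design [1, x], and its 2x2 inverse.
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        inv = ((d / det, -b / det), (-b / det, a / det))
        # Hat values h_i = w_i * x_i' (X'WX)^{-1} x_i with x_i = (1, x_i).
        h = [wi * (inv[0][0] + 2 * inv[0][1] * xi + inv[1][1] * xi * xi)
             for wi, xi in zip(w, x)]
        # Firth's modified score adds h_i * (1/2 - p_i) to each residual.
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        step0 = inv[0][0] * u0 + inv[0][1] * u1
        step1 = inv[1][0] * u0 + inv[1][1] * u1
        b0, b1 = b0 + step0, b1 + step1
        if abs(step0) + abs(step1) < 1e-12:
            break
    return b0, b1

# x = 0: 3 successes out of 10; x = 1: 10 successes out of 10 (separation).
x = [0] * 10 + [1] * 10
y = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0] + [1] * 10
b0, b1 = firth_logit(x, y)
print(f"intercept = {b0:.4f}, slope = {b1:.4f}")
```

For a two-group model like this one, the penalized estimates coincide with the log odds obtained after adding 1/2 to every cell of the 2x2 table, which gives a convenient hand check on the result.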

Finite mixture models

Partha Deb
Hunter College and the Graduate Center, CUNY
Finite mixture models provide a natural way of modeling continuous or discrete outcomes that are observed from populations consisting of a finite number of homogeneous subpopulations. Applications of finite mixture models are abundant in the social and behavioral sciences, biological and environmental sciences, engineering, and finance. Such models have a natural representation of heterogeneity in a finite, usually small, number of latent classes, each of which may be regarded as a type. More generally, the finite mixture model can be shown to approximate any unknown distribution under suitable regularity conditions. The Stata package fmm implements a maximum likelihood estimator for a class of finite mixture models. In this talk, I will begin by introducing finite mixture models with a number of examples, and then I will discuss issues of estimation, testing, and model selection. I will then describe estimation using fmm, calculations of predictions, marginal effects, and posterior class probabilities, and I will illustrate these by using examples from econometrics and finance.
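For readers new to these models, the following Python sketch fits a two-component normal mixture and returns posterior class probabilities. It uses the EM algorithm for simplicity, which is a different route to the maximum likelihood estimates than the fmm package, and the simulated data, starting values, and function names are assumptions of this example.

```python
import math
import random

def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def fit_two_component_mixture(data, iterations=200):
    """EM for a two-component normal mixture.

    Returns (pi1, (mu1, sd1), (mu2, sd2), posterior probabilities of class 1).
    """
    n = len(data)
    pi1 = 0.5
    mu1, mu2 = min(data), max(data)
    sd1 = sd2 = (max(data) - min(data)) / 4
    post = [0.5] * n
    for _ in range(iterations):
        # E-step: posterior probability that each observation is in class 1.
        post = []
        for x in data:
            a = pi1 * normal_pdf(x, mu1, sd1)
            b = (1 - pi1) * normal_pdf(x, mu2, sd2)
            post.append(a / (a + b))
        # M-step: weighted means, standard deviations, and class share.
        n1 = sum(post)
        n2 = n - n1
        mu1 = sum(p * x for p, x in zip(post, data)) / n1
        mu2 = sum((1 - p) * x for p, x in zip(post, data)) / n2
        sd1 = math.sqrt(sum(p * (x - mu1) ** 2 for p, x in zip(post, data)) / n1)
        sd2 = math.sqrt(sum((1 - p) * (x - mu2) ** 2 for p, x in zip(post, data)) / n2)
        pi1 = n1 / n
    return pi1, (mu1, sd1), (mu2, sd2), post

# Simulated population: 40% from N(0, 1) and 60% from N(5, 1).
rng = random.Random(7)
data = [rng.gauss(0, 1) for _ in range(400)] + [rng.gauss(5, 1) for _ in range(600)]
pi1, (mu1, sd1), (mu2, sd2), post = fit_two_component_mixture(data)
print(f"class 1: share {pi1:.3f}, mean {mu1:.3f}, sd {sd1:.3f}")
print(f"class 2: share {1 - pi1:.3f}, mean {mu2:.3f}, sd {sd2:.3f}")
```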

Inference for partial effects in nonlinear panel-data models using Stata

Jeffrey Wooldridge
Department of Economics, Michigan State University
Abstract not available.

Analyzing survey data using Stata 10

Roberto G. Gutierrez
Stata’s approach to the analysis of data from complex surveys is unique in that it clearly separates the declaration of the design aspects of the survey (accomplished by svyset) from the actual analysis. Such an arrangement is ideal because the design characteristics of the data do not change according to the analysis being performed. Whether you are constructing contingency tables or performing Cox regression, the sampling weights and primary sampling units (not to mention the other design specifications) remain constant. Stata’s treatment of survey data makes it easy to maintain that consistency. Most of Stata’s model fitting and other analysis commands can be applied easily to survey data, including (with the release of Stata 10) commands for Cox regression and parametric models for survival data in a survey setting. This talk is a tutorial on how to make full use of Stata’s capabilities for survey data. Alternative variance estimation is a key component of performing valid inference in light of complex-survey designs, and I will discuss several variance-estimation options. That discussion will include modern computationally intensive methods such as balanced repeated replication, the jackknife, and the bootstrap, which are made feasible with the advent of better computer technology. For these three methods, variance estimation can be done directly or by using a series of replication weights.
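Of the three replication methods, the jackknife is the easiest to write down. The Python sketch below computes a delete-one-PSU jackknife variance for a weighted mean in a single stratum, centering on the mean of the replicate estimates. The data layout and function names are hypothetical, stratification and finite-population corrections are left out, and no claim is made that this matches Stata's defaults.

```python
def weighted_mean(psus):
    """psus: list of PSUs, each a list of (weight, value) pairs."""
    num = sum(w * y for psu in psus for w, y in psu)
    den = sum(w for psu in psus for w, _ in psu)
    return num / den

def jackknife_variance(psus):
    """Delete-one-PSU jackknife variance of the weighted mean (one stratum)."""
    n = len(psus)
    # Re-estimate the mean with each PSU left out in turn.
    replicates = [weighted_mean(psus[:i] + psus[i + 1:]) for i in range(n)]
    center = sum(replicates) / n
    return (n - 1) / n * sum((r - center) ** 2 for r in replicates)

# Five PSUs with one equally weighted observation each.
psus = [[(1.0, v)] for v in (1.0, 2.0, 3.0, 4.0, 5.0)]
print(weighted_mean(psus), jackknife_variance(psus))
```

In this degenerate case (one equally weighted observation per PSU), the jackknife variance reduces to the familiar s^2/n, which makes it a convenient sanity check.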

Survey bootstrap and bootstrap weights

Stas Kolenikov
Department of Statistics, University of Missouri–Columbia
In this presentation, I will review the bootstrap for complex surveys with designs featuring stratification, clustering, and unequal probability weights. I will present the Stata module bsweights, which creates the bootstrap weights for designs specified through and supported by svy. I will also provide simple demonstrations highlighting the use of the procedure and its syntax. I will discuss various tuning parameters and their impact on the performance of the procedure, and I will give arguments for the bootstrap by the method of weights in nonsurvey settings.
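The idea of bootstrap weights can be sketched as follows. Within each stratum with n PSUs, draw n - 1 PSUs with replacement and multiply each PSU's weight by n / (n - 1) times the number of times it was drawn; this is one common rescaling scheme and is assumed here for illustration. The Python code does not reproduce the options or defaults of bsweights, and the data layout and names are hypothetical.

```python
import random
from collections import Counter

def replicate_multipliers(n_psu, rng):
    """One replicate for a stratum with n_psu >= 2 PSUs."""
    m = n_psu - 1                       # resample n - 1 PSUs with replacement
    counts = Counter(rng.randrange(n_psu) for _ in range(m))
    return [n_psu / m * counts[i] for i in range(n_psu)]

def bootstrap_weights(base_weights, reps, seed=1):
    """base_weights: {stratum: [one sampling weight per PSU]} (assumed layout)."""
    rng = random.Random(seed)
    replicates = []
    for _ in range(reps):
        rep = {}
        for stratum, weights in base_weights.items():
            mult = replicate_multipliers(len(weights), rng)
            rep[stratum] = [w * k for w, k in zip(weights, mult)]
        replicates.append(rep)
    return replicates

base = {"A": [10.0, 10.0, 10.0], "B": [20.0, 20.0, 20.0, 20.0]}
reps = bootstrap_weights(base, reps=5)
for rep in reps:
    print(rep)
```

Each replicate weight column is then used to re-estimate the statistic of interest, and the spread of those estimates around the full-sample estimate gives the variance estimate.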

Analyzing spatial autoregressive models in Stata

David Drukker
In this talk, I will provide a quick introduction to estimators for the parameters of spatial-autoregressive models and to a suite of user-written Stata commands for managing spatial data and estimating those parameters.

Scientific organizers

Phil Schumm (chair), University of Chicago

Scott Long, Indiana University

Pravin Trivedi, Indiana University

Richard Williams, University of Notre Dame

Logistics organizers

Chris Farrar, StataCorp

Gretchen Farrar, StataCorp