Last updated: 16 August 2011
 2011 Stata Conference Chicago 
 14–15 July 2011 
  
  Gleacher Center
  The University of Chicago Booth School of Business
  450 North Cityfront Plaza Drive
  Chicago, IL 60611
            
        
                
Proceedings
                               
 Tricks with Hicks: Stata gmm code for nonlinear GMM 
Carl Nelson
University of Illinois–Urbana–Champaign
  In a June 2009 American Economic Review article entitled
  “Tricks with Hicks: The EASI demand system”, Arthur Lewbel and
  Krishna Pendakur proposed the exact affine Stone index demand system. This
  system allows Engel curve behavior higher than rank 3, demographics, and
  unobserved heterogeneity in tastes. The American Economic Review web
  supplement for the article provides Stata code to estimate linear and
  iterative linear versions of the model. But the full nonlinear system
  instrumental variable estimates were obtained with TSP econometric software,
  using the frml command to obtain nonlinear three-stage least-squares
  estimates. I present Stata code to estimate the nonlinear exact affine Stone
  index demand system using the Stata gmm command. This is an example of the
  important estimation extensions that have been made possible by the
  introduction of the gmm command.
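  As a rough illustration of the syntax involved (not the EASI system itself,
  and not the code from the talk), the sketch below fits an exponential
  conditional mean by GMM. The docvisits example dataset, the variable names,
  and the instrument list are assumptions in the style of the Stata manual
  examples. The residual function is written as a substitutable expression
  with the parameters in braces.

```stata
* Sketch only: dataset and variables are assumed, not taken from the talk
webuse docvisits, clear

* Moment conditions E[z*(docvis - exp(x'b + b0))] = 0, with the residual
* written as a substitutable expression; {xb: ...} is a linear combination
gmm (docvis - exp({xb:private chronic female income} + {b0})), ///
    instruments(private chronic female age black hispanic)
```

  A demand system would supply one such residual expression per equation,
  with parameters shared across equations where the model requires it.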
  
   
Additional information
   chi11_nelson.pdf
   engel.png
   lewbelpendakur09_20.pdf
 xtmixed and denominator degrees of freedom: Myth or magic
Phil Ender
UCLA Statistical Consulting Group
  I review issues and controversy surrounding F-ratio denominator degrees
  of freedom in linear mixed models. I will look at the
  history of denominator degrees of freedom and survey their use in
  various statistical packages.
  
   
Additional information
   chi11_ender.pdf
Using the margins command to estimate and interpret adjusted predictions
and marginal effects 
Richard Williams
University of Notre Dame
  As Long and Freese show, it can often be helpful to compute predicted and
  expected values for hypothetical or prototypical cases. Stata 11 introduced
  new tools (factor variables and the margins command) for making such
  calculations. These can do many of the things that were previously done by
  Stata’s own adjust and mfx commands, as well as Long and Freese’s spost9
  commands like prvalue. Unfortunately, the complexity of the margins syntax,
  the daunting 50-page reference manual entry that describes it, and a lack of
  understanding about what margins offers over older commands may have
  dissuaded researchers from using it. This paper therefore shows how margins
  can easily replicate analyses done by older commands. It demonstrates how
  margins provides a superior means for dealing with interdependent variables
  (for example, X and X^2; X1, X2, and X1 × X2; multiple dummies created from
  a single categorical variable), and is also superior for data that are
  svyset. The paper explains how the new asobserved option works and the
  substantive reasons for preferring it over the atmeans approach used by older
  commands. The paper primarily focuses on the computation of adjusted
  predictions, but also shows how margins has the same advantages for computing
  marginal effects.
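  A minimal sketch of the kind of comparison described here, assuming the
  nhanes2f example dataset and an illustrative logit model (neither is taken
  from the talk). Entering age as c.age##c.age tells margins that age and its
  square move together, and i.black and i.female are treated as categorical
  factors.

```stata
* Sketch only: dataset and model are illustrative assumptions
webuse nhanes2f, clear
logit diabetes i.black i.female c.age##c.age

* Adjusted predictions by race, averaging over the observed values of the
* other covariates (asobserved, which is what margins does by default)
margins black

* The same predictions with the other covariates fixed at their means
margins black, atmeans

* Average marginal effects of the two categorical predictors
margins, dydx(black female)
```

  Because age enters through factor-variable notation, the atmeans run
  evaluates the squared term at the square of mean age rather than at the
  mean of age squared.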
  
   
Additional information
   chi11_williams.pptx
Using margins to test for group differences in growth
trajectories in generalized linear mixed models
Sarah Mustillo (with L.R. Landerman and K.C. Land)
Purdue University, Duke University School of Medicine, and Duke University
  To test for group differences in growth trajectories in mixed (fixed and
  random-effects) models, researchers frequently interpret the coefficient of
  group-by-time product terms. While this practice is straightforward in
  linear mixed models, testing for group differences in generalized linear
  mixed models is more complex. Using both an empirical example and simulated
  data, we show that the coefficient of group-by-time product terms in mixed
  logistic and Poisson models estimates the multiplicative change with respect
  to the baseline rates, while researchers often are more interested in
  differences in the predicted rate of change between groups. The latter can
  be obtained by using the margins command in Stata. This may be
  especially desirable when the mean of the outcome variable is low and
  marginal change differs from multiplicative change. We propose and
  illustrate the use of margins to interpret group differences in rates
  of change over time following estimation with generalized linear models.
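  The distinction can be sketched with simulated data. Everything below (the
  sample size, the coefficient values, and the use of a single-level Poisson
  model rather than a mixed model) is an assumption made for illustration,
  not the authors' example. The r. contrast operator assumes Stata 12 or
  later.

```stata
* Sketch only: simulated data with made-up parameter values
clear
set seed 12345
set obs 5000
generate byte group = runiform() < 0.5
generate byte time  = floor(5*runiform())          // waves 0 to 4
generate y = rpoisson(exp(-2 + 0.5*group + 0.2*time + 0.1*group*time))

* The group#c.time coefficient, shown as an incidence-rate ratio, is the
* multiplicative difference between the groups' per-wave change
poisson y i.group##c.time, irr

* Average change in the expected count per wave, by group
margins group, dydx(time)

* Difference between the two groups in that average change
margins r.group, dydx(time)
```

  The first margins call reports each group's slope on the scale of the
  expected count, and the second tests whether those slopes differ.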
  
   
Additional information
   chi11_mustillo.pptx
Graphics tips for all
Nicholas J. Cox
Durham University, United Kingdom
  Stata’s graphics were completely rewritten for Stata 8, with further
  key additions in later versions. Its official commands have, as usual, been
  supplemented by a variety of user-written programs. The resulting variety
  presents even experienced users with a system that undeniably is large,
  often appears complicated, and sometimes seems confusing. In this talk, I
  provide a personal digest of graphics strategy and tactics for Stata users;
  I emphasize details large and small that, in my view, deserve to be known by
  all.
  
   Additional information
   chi11_cox.zip
Stata as a data-entry management tool
Ryan Knight
Innovations for Poverty Action
  It is increasingly common for social scientists to be involved in primary
  data collection, whether through the administration of unique survey
  instruments or the execution of field experiments. Novel datasets present
  novel challenges for researchers, who may find themselves responsible for
  ensuring that any information collected is entered into the computer
  accurately. This presentation discusses why and how one might use Stata as a
  tool for data-entry management and introduces three new user-written
  commands that streamline the data-entry process. The commands are
  cfout, which is an extension of the cf command that outputs a user-friendly
  list of all discrepancies between two datasets (for example, the first and
  second entry of a double-entered dataset); readreplace, which makes many
  replacements to a dataset, based on a corrected list of the discrepancies
  generated by cfout; and mergeall, which merges many files without
  loss of information due to string and numeric differences. This suite of
  commands can help reduce the cost and increase the accuracy of primary
  data collection, and it extends Stata’s data-management capabilities to
  include the management of data entry.
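  For context, the official cf command that cfout extends can be sketched as
  follows. The auto dataset and the deliberately introduced discrepancy stand
  in for a double-entered dataset and are assumptions for illustration. The
  syntax of the user-written cfout, readreplace, and mergeall commands is not
  given here, so it is not shown.

```stata
* Sketch only: auto.dta stands in for a double-entered dataset
sysuse auto, clear
tempfile first_entry
save `first_entry'

* Simulate a keying error in the "second entry"
replace price = 9999 in 3

* Compare the data in memory with the first entry, variable by variable;
* cf exits with a nonzero return code when it finds mismatches, hence capture
capture noisily cf _all using `first_entry', verbose
display "cf return code: " _rc
```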
  
   
Additional information
   chi11_knight.pptx
Universal and mass customization of tables in Stata
Roy Wada
University of Illinois–Chicago
  There is a strong demand for a systematic and uniform approach to
  table-making, yet it is currently believed that this is not plausible or
  is nonexistent in Stata. There is also an impression that tabulation tables
  are inherently different from summary tables or regression tables. This
  presentation shows that it is possible to design a programmatic, universal
  solution once the similarities between the apparently different types of
  tables are understood. The universal approach to table-making is implemented
  in the latest version of outreg2. Thus mass-customized tables of
  various types, including cross-tabulations and stub-and-banner
  tables, can be readily produced in Stata.
  
   
Additional information
   chi11_wada.pptx
Fractional response models with endogenous explanatory variables and heterogeneity
 Jeffrey M. Wooldridge
 Michigan State University
  In this talk, I will discuss ways of using Stata to fit fractional
  response models when explanatory variables are not exogenous. Two questions
  are of primary concern: First, how does one account for endogenous
  explanatory variables, both continuous and discrete, when the response
  variable is fractional and may take values at the corners? Second, how can
  we incorporate unobserved heterogeneity in panel-data fractional models when
  the panel might be unbalanced? I will draw on Papke and Wooldridge (2008,
  
Journal of Econometrics 145: 121–133) and two unpublished
  papers of mine, “Quasi-maximum likelihood estimation and testing for
  nonlinear models with endogenous explanatory variables” and
  “Correlated random effects models with unbalanced panels”. One
  practically important conclusion is that by expanding the scope of existing
  Stata commands to allow fractional responses—in particular, the
  
ivprobit, 
biprobit, 
hetprob, and (user-written)
  
gllamm commands—flexible fractional response models can easily
  be fit.
  
   
Additional information
   chi11_wooldridge.pdf
Causal inference for binary regression with observational data
Austin Nichols
Urban Institute
  Special problems arise when trying to do causal inference for binary
  regression with observational data; we will examine some of these problems
  and critically examine several common and not-so-common solutions.
  
   Additional information
   chi11_nichols.pdf
Estimating the parameters of simultaneous-equations models with the sem command in Stata 12
David M. Drukker
StataCorp
  In this talk, I introduce Stata 12’s new sem command for
  estimating the parameters of simultaneous-equations models. Some of the
  considered models include unobserved factors. Estimation methods include
  maximum likelihood and the generalized method of moments.
  
   
Additional information
   chi11_drukker_sem.pdf
Calculating bronchiolitis severity using ordinal regression with a new function in Stata
Carl Mitchell (with Paul Walsh)
Kern County Medical Center Department of Emergency Medicine/UCLA
  A new command has been developed implementing a previously validated tool
  for describing bronchiolitis severity. Bronchiolitis is one of the most
  common causes of hospital admission for infants and it is widely studied.
  This command classifies predicted severity of illness using an ordinal
  regression model. Optionally, the user can obtain the predicted probability of
  hospital admission and the probability of an infant falling into a
  severity-of-illness classification different from that predicted.
  
   Additional information
   chi11_mitchell.pdf
Teaching statistics with Stata in emergency medicine (EM) journal club
Muhammad Waseem
Lincoln Medical and Mental Health Center
  Residency training is an important period when a physician learns and
  acquires the necessary skills of searching for, evaluating, and applying medical
  knowledge. The journal club is an academic event and an important forum for
  this purpose. The objective of the journal club is to learn and develop a
  skill to find, appraise, and implement practice-changing advancements in the
  medical literature. We report our experience with Stata in journal club in
  teaching emergency medicine residents statistics in addition to critical appraisal
  skills. To understand and utilize the current literature effectively, an
  understanding of basic statistical methods is essential. We introduced Stata
  while discussing the methods and results section of an article in the
  journal club to teach application of some common statistical tests.
  Published studies were selected to illustrate and provide insight into
  commonly used statistical concepts. We noted that improved understanding of
  statistics resulted in increased interest and enthusiasm of residents to
  participate in journal club. Integrating a statistical software program such
  as Stata into journal club can serve as an important tool to enhance learning.
  Further studies should be conducted to fully utilize these
  opportunities for enhanced learning of in-training physicians.
  
   Additional information
   chi11_waseem.pptx
Use of cure fraction models for the survival analysis of uterine cancer patients
Noori Akhtar-Danesh (with Alice Lytwyn and Laurie Elit)
McMaster University
  In population-based cancer studies, a cure fraction model
  classifies patients into those who survive the cancer and those who
  encounter excess mortality risk compared with the general population
  (2007, Stata Journal 7: 1–25). In this presentation, we report the
  proportion cured and the relative survival pattern for patients diagnosed
  with uterine cancer in Canada over the period of 1992–2005. We used a
  nonmixture cure fraction model to estimate the cure fraction rate and the
  relative survival among “uncured” patients (2007, Stata Journal 7: 1–25).
  Then we predicted the cure fraction rate and median survival
  for each age group based on the year of diagnosis. Relative
  survival and cure fraction rate decreased with age but increased gradually
  over time. Relative survivals for Eastern Canada and Ontario were lower
  compared with the other regions.  The same applies to the comparison of
  cure fraction rates between the geographical regions. This is
  the first study using a cure fraction model for analysis of uterine cancer.
  Although there are some limitations attached to this model, it is flexible
  enough to be used with different parametric distributions and to include
  different link functions for relative survival analysis.
  
   
Additional information
   chi11_akhtar_danesh.ppt
Using Mata to import Illumina SNP chip data for genome-wide association studies 
Chuck Huber (with Michael Hallman, Victoria Friedel,
Melissa Richard, and Huandong Sun) 
Texas A&M Health Science Center School of Rural
Public Health and University of Texas School of Public Health 
  Modern genetic genome-wide association studies typically rely on
  single nucleotide polymorphism (SNP) chip technology to determine hundreds
  of thousands of genotypes for an individual sample. Once these genotypes are
  ascertained, each SNP (alone or in combination) is tested for association
  with outcomes of interest such as disease status or severity.  Project Heartbeat!
  was a longitudinal study conducted in the 1990s that explored changes in
  lipids and hormones and morphological changes in children from age 8–18
  years. A genome-wide association study is currently being conducted to look
  for SNPs that are associated with these developmental changes. While there
  are specialty programs available for the analysis of hundreds of thousands
  of SNPs, they are not capable of modeling longitudinal data. Stata is
  well-equipped for modeling longitudinal data but cannot load hundreds of
  thousands of variables into memory simultaneously. This talk will briefly
  describe the use of Mata to import hundreds of thousands of SNPs from the
  Illumina SNP chip platform and how to load those data into Stata for
  longitudinal modeling.
  
   Additional information
   chi11_huber.pptx
Graphics tricks for models
Bill Rising
StataCorp
  Visualizing interactions and response surfaces can be difficult. In this
  talk, I will show how to do the former by graphing adjusted means and the
  latter by showing how to roll together contour plots. I will demonstrate
  this for both linear and nonlinear models.
  
   Additional information
   chi11_rising.pdf
   chi11_rising_files.zip
Malmquist productivity analysis using DEA frontier in Stata
Choonjoo Lee
Korea National Defense University
  In this presentation, the author presents a procedure and an illustrative
  application of a user-written Malmquist productivity analysis (MPA) using
  data envelopment analysis (DEA) frontier in Stata. MPA measures the
  productivity changes for units between time periods. MPA has been used
  widely for assessing the productivity changes of public and private sectors,
  such as banks, airlines, hospitals, universities, defense firms, and
  manufacturers, when panel data are available. The MPA using DEA frontier
  in Stata will allow Stata users to conduct productivity analysis not only
  with the stochastic approach (stochastic-frontier analysis) but also with
  the nonstochastic approach using DEA frontier, also suggested by the author.
  The user-written
  MPA approach in Stata will provide some possible future extensions of Stata
  programming in productivity analysis.
  
   Additional information
   chi11_lee.ppt
   chi11_lee_files.zip
An interpretation and implementation of the Theil–Goldberger
“mixed” estimator 
Christopher Baum
Boston College and DIW Berlin
  In the early 1960s, Theil and Goldberger proposed a
  generalized least-squares approach to “mixing” sample
  information and prior beliefs about the coefficients of a regression
  equation. Their “mixed” estimator may be considered as a
  stochastic version of constrained least squares (Stata’s cnsreg). Although
  based on frequentist statistics, the Theil–Goldberger estimator
  is identical to that used in a Bayesian estimation approach when an
  informative prior density is employed. It may also be
  viewed as a one-shot application of the Kalman filter,
  providing an updating equation for point and interval estimates of the
  coefficients based on prior and sample information. I discuss the
  motivation for the estimator and my implementation in Stata code,
  tgmixed, and give illustrations of how it might be usefully employed.
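  In a common textbook notation (the symbols here are assumptions, not taken
  from the talk), the sample information is y = Xβ + u with Var(u) = σ²I, and
  the prior beliefs are written as stochastic restrictions r = Rβ + v with
  Var(v) = Ψ, independent of u. Stacking the two sets of equations and
  applying generalized least squares gives the mixed estimator and its
  variance:

```latex
\hat{\beta}_{TG} = \left( \sigma^{-2} X'X + R'\Psi^{-1}R \right)^{-1}
                   \left( \sigma^{-2} X'y + R'\Psi^{-1}r \right),
\qquad
\operatorname{Var}\bigl(\hat{\beta}_{TG}\bigr)
                 = \left( \sigma^{-2} X'X + R'\Psi^{-1}R \right)^{-1}
```

  As Ψ shrinks toward zero, the restrictions bind exactly, which is the
  constrained least-squares case. As Ψ grows, the estimator approaches
  ordinary least squares. In practice, σ² is replaced by an estimate.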
  
   
Additional information
   chi11_baum.pdf
Multilevel regression and poststratification in Stata
Maurizio Pisati (with Valeria Glorioso)
University of Milano–Bicocca and Harvard School of Public Health 
  Sometimes, social scientists are interested in determining whether, and to
  what extent, the distribution of a given variable of interest Y
  varies across the categories of a second variable D. When the number of
  valid observations within one or more categories of D is small or the
  collected data are affected by selection bias, relatively accurate estimates
  of E(Y|D) can be obtained by using a proper combination
  of multilevel regression modeling and poststratification, called the
  multilevel regression modeling and poststratification approach (Gelman and
  Little 1997, Survey Methodology 23: 127–135; Gelman and Bafumi 2004,
  Political Analysis 12: 375–385; and Lax and Phillips 2009, American Journal
  of Political Science 53: 107–121). The purpose of this talk is to illustrate
  the main features and applications of mrp, a new user-written program that
  implements the multilevel regression modeling and poststratification
  approach in Stata.
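  The poststratification step can be written compactly. In the notation below
  (an assumption for illustration, not taken from the talk), the population is
  divided into cells j defined by the poststratification variables, N_j is the
  known population count of cell j, and \hat{\theta}_j is the multilevel
  model's prediction of the mean of Y in that cell. The estimate for category
  d of D is the population-weighted average over the set of cells J_d that
  belong to d:

```latex
\hat{E}(Y \mid D = d) =
  \frac{\sum_{j \in J_d} N_j \, \hat{\theta}_j}{\sum_{j \in J_d} N_j}
```

  The multilevel model stabilizes the predictions for sparsely observed cells,
  and the weighting by N_j corrects for cells that are over- or
  underrepresented in the sample.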
  
   
Additional information
   chi11_pisati.pdf
Mata, the missing manual
William W. Gould
StataCorp
  Mata is Stata’s matrix programming language. StataCorp provides
  detailed documentation on it, but so far has failed to give users—and
  especially users who add new features to Stata—any guidance in when
  and how to use the language. In this talk, I provide what has been missing.
  In practical ways, I show how to include Mata code in Stata ado-files,
  reveal when to include Mata code and when not to, and provide an
  introduction to the broad concepts of Mata, the concepts that will make the
  Mata Reference Manual approachable.
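  One common way of including Mata code in an ado-file can be sketched as
  follows. The command name mymean and the function name mymean_work are
  hypothetical, and this is not the example from the talk. The ado part
  parses the syntax and marks the sample, and the Mata part does the
  computation and passes the result back through r().

```stata
*! mymean.ado (hypothetical command, for illustration only)
program define mymean
    version 11
    syntax varname(numeric) [if] [in]
    marksample touse
    // Hand the variable name and sample marker to the Mata function
    mata: mymean_work("`varlist'", "`touse'")
    display as text "mean of `varlist' = " as result r(mean)
end

version 11
mata:
void mymean_work(string scalar varname, string scalar touse)
{
    real colvector x

    // Load only the observations selected by the sample marker
    x = st_data(., varname, touse)
    st_numscalar("r(mean)", mean(x))
}
end
```

  With the file saved as mymean.ado on the ado-path, typing mymean price after
  sysuse auto would display the mean of price.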
  
   
Additional information
   chi11_gould.pdf
Stata Graph Library for network analysis
Hirotaka Miura
Federal Reserve Bank of San Francisco
  Network analysis is a multidisciplinary research method that is fast
  becoming a popular and exciting field of study. Though a number of
  statistical programs possess sophisticated packages for analyzing networks,
  similar capabilities have yet to be made available in Stata. In an effort to
  motivate the use of Stata for network analysis, I designed in Mata the Stata
Graph Library (SGL), which consists of algorithms that construct matrix
representations of networks, compute centrality measures, and calculate
clustering coefficients.  Performance tests conducted between C++ and SGL
implementations indicate gross inefficiencies in current SGL routines, making
SGL practically infeasible to be used for large networks. The obstacles are,
however, welcome challenges in the effort to spread the use of Stata as an
instrument for analyzing networks, and future developments will focus on
addressing computational time complexities as well as integrating additional
capabilities into SGL.
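  The kinds of quantities mentioned here can be sketched in a few lines of
  Mata. The four-node network below is made up, and the code is an
  illustration of the definitions, not the SGL routines themselves.

```stata
mata:
// Adjacency matrix of an undirected 4-node network with edges
// 1-2, 1-3, 2-3, and 3-4 (made-up example)
A = (0,1,1,0 \ 1,0,1,0 \ 1,1,0,1 \ 0,0,1,0)

// Degree centrality: number of ties per node
degree = rowsum(A)

// Triangles through each node: closed walks of length 3, halved
triangles = diagonal(A*A*A) :/ 2

// Local clustering coefficient: triangles over possible neighbor pairs
// (missing for nodes with fewer than two neighbors)
clustering = triangles :/ (degree :* (degree :- 1) :/ 2)

degree, triangles, clustering
end
```

  Dense matrix products such as A*A*A grow quickly in cost with the number of
  nodes, which is one reason large networks are demanding.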
  
   Additional information
   chi11_miura.pdf
   chi11_miura_SGL_version_1.1.2.zip
Filtering and decomposing time series in Stata 12
David M. Drukker
StataCorp
  In this talk, I introduce new methods in Stata 12 for filtering and
  decomposing time series and I show how to implement them.  I
  provide an underlying framework for understanding and comparing the
  different methods.  I also present a framework for interpreting the
  parameters.
  
   Additional information
   chi11_drukker_filter.pdf
Scientific organizers
Phil Schumm (chair), University of Chicago
Lisa Barrow, Federal Reserve Bank of Chicago
Scott Long, Indiana University
Rich Williams, University of Notre Dame
Logistics organizers
Chris Farrar, StataCorp
Gretchen Farrar, StataCorp