2013 Stata Conference New Orleans

*Last updated: 31 July 2013*

Hyatt French Quarter New Orleans

800 Iberville Street

New Orleans, Louisiana

Max Löffler

IZA and University of Cologne

When one estimates discrete choice models, the mixed logit approach is
commonly superior to simple conditional logit setups. Mixed logit models not
only allow the researcher to implement difficult random components but also
overcome the restrictive IIA assumption. Despite these theoretical
advantages, the estimation of mixed logit models becomes cumbersome when the
model's complexity increases. Applied works therefore often rely on rather
simple empirical specifications because this reduces the computational
burden. I introduce the user-written command **lslogit**, which fits
complex mixed logit models using maximum simulated likelihood methods.
Because **lslogit** is a d2-ML-evaluator written in Mata, the estimation is
rather efficient compared with other routines. It allows the researcher to
specify complicated structures of unobserved heterogeneity and to choose
from a set of frequently used functional forms for the direct utility
function--for example, Box–Cox transformations, which are difficult to
estimate in the context of logit models. The particular focus of
**lslogit** is on the estimation of labor supply models in the discrete
choice context; therefore, it facilitates several computationally exhausting
but standard tasks in this research area. However, the command can be used
in many other applications of mixed logit models as well.

**Additional information**

nola13-loffler.pdf

Carlos Gradín

Universidade de Vigo and EQUALITAS

The purpose of this presentation is to introduce a new user-written program
that measures poverty in a panel of individuals. It complements
existing poverty programs for a cross-section of individuals (for example,
**povdeco** and **poverty**) by producing a new family of indices proposed by
Gradín, Cantó and Del Río (2012). This family of
indices is a natural extension of the popular
Foster–Greer–Thorbecke (FGT) poverty indices to the longitudinal
case in which individuals are observed for more than one period. It takes
into account that longer spells of poverty and more unequal profiles of
poverty aggravate poverty. These measures have attractive decomposability
properties. One particular advantage of this family of indices is that it
embraces other indices recently proposed in the literature as particular
cases.
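
For orientation, the cross-sectional FGT index that this family extends can be sketched in a few lines. This is an illustrative Python sketch only: it is not the presenter's Stata code and it does not implement the longitudinal indices of Gradín, Cantó, and Del Río (2012). The function name `fgt_index`, the incomes, and the poverty line are hypothetical.

```python
def fgt_index(incomes, poverty_line, alpha=0):
    """Cross-sectional FGT index.

    Sums ((z - y) / z) ** alpha over individuals with income y below the
    poverty line z and divides by the whole population size.
    """
    if poverty_line <= 0:
        raise ValueError("poverty_line must be positive")
    if not incomes:
        raise ValueError("incomes must not be empty")
    total = 0.0
    for y in incomes:
        if y < poverty_line:
            gap = (poverty_line - y) / poverty_line  # normalized poverty gap
            total += gap ** alpha
    return total / len(incomes)


incomes = [4.0, 8.0, 12.0, 20.0]  # hypothetical incomes
z = 10.0                          # hypothetical poverty line

headcount = fgt_index(incomes, z, alpha=0)    # share of individuals who are poor
poverty_gap = fgt_index(incomes, z, alpha=1)  # average normalized gap
severity = fgt_index(incomes, z, alpha=2)     # average squared gap
```

Setting `alpha` to 0, 1, or 2 gives the headcount ratio, the poverty gap, and the squared poverty gap, respectively; higher values weight the poorest individuals more heavily.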

Reference

Gradín, C., O. Cantó, and C. Del Río. 2012. Measuring poverty accounting for time. *Review of Income and Wealth* 58: 330–354.

**Additional information**

nola13-gradin.pptx

Soufiane Khoudmi

Benoit Mulkay

University of Montpellier

This presentation provides a Stata application for the estimation of Banks,
Blundell, and Lewbel's (1997) demand system dealing with the zero problem,
which is central to many expenditure survey analyses. We start from Poi's
(2008) routine, and our main contribution is the multivariate censoring
correction; we implement Tauchmann's (2010) theoretical framework, which
relies on including correction terms in the system. These are computed from
a multivariate probit estimated with simulated maximum likelihood using
Cappellari and Jenkins's (2006) **mvnp** routine. We also discuss how to deal
with several econometric issues related to the demand system estimation
literature: total budget endogeneity, conditional linearity, and the
symmetry restriction (using a minimum distance estimator).

References

Banks, J., R. Blundell, and A. Lewbel. 1997. Quadratic Engel curves and consumer demand. *Review of Economics and Statistics* 79: 527–539.

Cappellari, L. and S. P. Jenkins. 2006. Calculation of multivariate normal probabilities by simulation, with applications to maximum simulated likelihood estimation. *Stata Journal* 6: 156–189.

Poi, B. 2008. Demand-system estimation: Update. *Stata Journal* 8: 554–556.

Tauchmann, H. 2010. Consistency of Heckman-type two-step estimators for the multivariate sample-selection model. *Applied Economics* 42: 3895–3902.

**Additional information**

nola13-khoudmi.pdf

Christopher F. Baum

Boston College and DIW Berlin

Mark E. Schaffer

Heriot–Watt University

Testing for the presence of autocorrelation in a time series is a common
task for researchers working with time-series data. The standard Q test
statistic, introduced by Box and Pierce (1970) and refined by Ljung and Box
(1978), is applicable to univariate time series and to testing for residual
autocorrelation under the assumption of strict exogeneity. Breusch (1978)
and Godfrey (1978) in effect extended the L-B-P approach to testing for
autocorrelations in residuals in models with weakly exogenous regressors.
However, each of these readily available tests has important limitations.
We use the results of Cumby and Huizinga (1992) to extend the implementation
of the Q test statistic of L-B-P-B-G to cover a much wider range of
hypotheses and settings: (a) tests for the presence of autocorrelation of
order p through q, where under the null hypothesis, there may be
autocorrelation of order p-1 or less; (b) tests after estimation in
which regressors are endogenous and estimation is by IV or GMM methods; and
(c) tests after estimation using panel data. We show that the
Cumby–Huizinga test, although developed for the large-T setting, is
formally identical to the test developed by Arellano and Bond (1991) for
AR(2) in a large-N panel setting.
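
As a point of reference, the standard Ljung–Box Q statistic that the abstract takes as its starting point can be sketched as follows. This is a minimal Python illustration of the textbook formula Q = n(n+2) Σ ρ̂ₖ²/(n−k) for a univariate series; it is not the Cumby–Huizinga extension and not the authors' Stata implementation. The function names and the toy series are hypothetical.

```python
def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    if denom == 0:
        raise ValueError("series has zero variance")
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def ljung_box_q(x, max_lag):
    """Ljung-Box Q statistic for autocorrelation at lags 1..max_lag."""
    n = len(x)
    if not 1 <= max_lag < n:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(x)")
    weighted = sum(
        autocorrelation(x, k) ** 2 / (n - k) for k in range(1, max_lag + 1)
    )
    return n * (n + 2) * weighted


series = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # strongly alternating toy series
q_stat = ljung_box_q(series, max_lag=1)
```

For an observed series with no autocorrelation, Q is asymptotically chi-squared with `max_lag` degrees of freedom, so a large value such as the one produced by this alternating series points to autocorrelation.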

**Additional information**

nola13-baum.pdf

Sylvia Beatriz Guillermo Peón

Benemérita Universidad Autónoma de Puebla

Martin Rodriguez Brindis

Universidad La Salle

This paper aims at analyzing the exchange rate pass-through mechanism for the
Mexican economy and is carried out using Stata under two time-series
frameworks. The first framework is a recursive structural VAR (SVAR) model,
which, unlike the traditional VAR model, allows us to impose additional
restrictions on the contemporaneous and lagged matrices of coefficients. The
second is a VEC approach, which considers the possibility of valid
cointegrating relationships and allows us to incorporate the deviations from
the long-run equilibrium (cointegrating equations) as explanatory variables
when modeling the short-run behavior of the variables. Both frameworks aim
at the estimation of impulse–response functions (IRFs) as a tool to
analyze the degree and timing of the effect of exchange rate changes on
domestic prices. The recursive SVAR approach allows us to estimate the
structural IRFs, while the VEC approach uses the Cholesky decomposition of
the white noise variance–covariance matrix by imposing some necessary
restrictions so that causal interpretation of the simple IRFs is possible.
If cointegration exists, estimation of the IRFs provides a tool to identify
when the effect of a shock to the exchange rate is transitory and when it is
permanent.

**Additional information**

nola13-guillermo.ppsx

Rose Anne Medeiros

Rice University

Stata's **sem** command includes the ability to estimate models with
missing data using full-information maximum likelihood estimation (FIML).
One of the assumptions of FIML is that the data are at least missing at
random (MAR); that is, conditional on other variables in the model,
missingness is not dependent on the value that would have been observed. The
MAR assumption can be made more plausible and estimation improved by the
inclusion of auxiliary variables, that is, variables that predict
missingness or are related to the variables with missing values but are not
part of the substantive model. The inclusion of auxiliary variables is
common in multiple imputation models but less common in models estimated
using FIML. This presentation will introduce users to the saturated
correlates model (Graham 2003), a method of including auxiliary variables in
FIML models. Examples demonstrating how to include auxiliary variables
using the saturated correlates model with Stata's **sem** command will be
shown.

**Additional information**

nola13-medeiros.pdf

Rob Woodruff

Battelle Memorial Institute

The stereotype logistic regression model for a categorical dependent
variable is often described as a compromise between the multinomial and
proportional-odds logistic models and has many attractive features. Among
these are the ability to test the adequacy of the model fit compared with
the unconstrained multinomial model, to test the distinguishability of the
outcome categories, and even to test the "ordinality" assumption itself.
What brought me to write the new command, however, was the desire to take
advantage of these capabilities while working on a matched,
case–control study. Like the multinomial logistic model (and unlike
the proportional-odds model), the stereotype model yields valid inference
under outcome-dependent sampling designs and can be much more parsimonious.
The working title of my command is **cstereo**, and it is implemented
using the d2-method of Stata's **ml** command. In terms of existing Stata
capabilities, **clogit** is to **logit** as **cstereo** is to
**slogit**. In this presentation, I will demonstrate the command's
features using a simulated matched, case–control dataset.

**Additional information**

nola13-woodruff.pptx

Joerg Luedicke

Yale University and University of Florida

A widespread tool in the context of a point null hypothesis significance
testing framework is the computation of statistical power, especially in the
planning stage of quantitative studies. However, asymptotic power formulas
are often not readily available for certain tests or are too restrictive
in their underlying assumptions to be of much use in practice. The Stata
package **powersim** exploits the flexibility of a simulation-based
approach by providing a facility for automated power simulations in the
context of linear and generalized linear regression models. The package
supports a wide variety of uni- and multivariate covariate distributions
and all family and link choices that are implemented in Stata's **glm**
command. The package mainly serves two purposes: First, it provides access
to simulation-based power analyses for researchers without much experience
in simulation studies. Second, it provides a convenient simulation facility
for more advanced users who can easily complement the automated data
generation with their own code for creating more complex synthetic datasets.
The presentation will discuss some advantages of the simulation-based power
analysis approach and will go through a number of worked examples to
demonstrate key features of the package.
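
The idea behind simulation-based power analysis can be illustrated with a deliberately simple case. The Python sketch below is not **powersim** and does not cover regression models; it assumes a two-group comparison of means with known unit variance and a two-sided z test, and the function name `simulate_power` and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def simulate_power(effect, n_per_group, reps=2000, alpha=0.05, seed=12345):
    """Estimate power as the share of simulated datasets in which a
    two-sided z test (known unit variance) rejects the null of no effect."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = (2.0 / n_per_group) ** 0.5  # standard error of the mean difference
    rejections = 0
    for _ in range(reps):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treated = [rng.gauss(effect, 1.0) for _ in range(n_per_group)]
        diff = sum(treated) / n_per_group - sum(control) / n_per_group
        if abs(diff / se) > z_crit:
            rejections += 1
    return rejections / reps


power_null = simulate_power(effect=0.0, n_per_group=50)  # should be near alpha
power_alt = simulate_power(effect=1.0, n_per_group=50)   # should be near 1
```

The same loop structure (generate data under an assumed model, run the test, count rejections) carries over to settings where no closed-form power formula is available; only the data-generating step and the test change.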

**Additional information**

nola13-luedicke.pdf

Randall Campbell

Mississippi State University

R. Carter Hill

Louisiana State University

We use Stata to obtain the linear maximum entropy estimator developed by
Golan, Judge, and Miller (1996). We use the Stata **optimize** function to
illustrate maximum entropy estimation in an unrestricted linear regression
model. Next we estimate the model with parameter inequality restrictions to
replicate the Monte Carlo experiments in Campbell and Hill (2005). We
generate data under varying design characteristics and estimate the
parameters using maximum entropy and least squares estimation, both with and
without parameter inequality restrictions.

References

Campbell, R. C. and R. C. Hill. 2005. A Monte Carlo study of the effect of design characteristics on the inequality restricted maximum entropy estimator. *Review of Applied Economics* 1: 53–84.

Golan, A., G. Judge, and D. Miller. 1996. *Maximum Entropy Econometrics: Robust Estimation with Limited Data*. Chichester, UK: John Wiley & Sons.

**Additional information**

nola13-campbell.pdf

Yulia Marchenko

Director of Biostatistics, StataCorp

Stata 13's new **power** command performs power and sample-size analysis.
The **power** command expands the statistical methods that were
previously available in Stata's **sampsi** command. I will demonstrate
the **power** command and its additional features, including the support
of multiple study scenarios and automatic and customizable tables and
graphs.

**Additional information**

nola13-marchenko.pdf

Rodrigo Taborda

Universidad de los Andes, Colombia

Teaching and learning statistics and econometrics requires assessment
through a problem set (PS). Often the PS requires some statistical analysis
of a single database; therefore, there is a unique answer. Although a unique
answer guarantees the exercise was done correctly, it also facilitates
cheating; the lazy student may borrow the answer from his hardworking
classmate. This scenario does not guarantee an honest effort and learning.
Taking advantage of the automatic generation of documents (Gini and Pasquini
2006) for a unique PS, I generate a personalized subdatabase and answer in
a PDF file. The steps are as follows: 1) There is a single PS for all
students (implying the use of Stata). 2) There is a single "mother"
database. 3) A personalized (per student) database is drawn from the mother
database. 4) Following Gini and Pasquini (2006), a personalized (per student)
answer is generated as a PDF file. Pros: 1) Students have no opportunity to
cheat by copying and pasting the same answer without actually running
the statistical procedure. 2) The lecturer knows the answer
beforehand. 3) Grading is easy. 4) Because each student has a different
statistical result, each must draw individual inferences from the
results.
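
Step 3, drawing a personalized database from the mother database, can be sketched as a random sample seeded by a student identifier, so the lecturer can regenerate each student's data and answer at any time. This Python sketch is an assumption about how such a draw could work, not the presenter's Stata code; the function names, the integer student IDs, and the toy data are hypothetical, and the PDF-generation step is omitted.

```python
import random


def personalized_sample(mother_rows, student_id, sample_size):
    """Draw a reproducible, student-specific subsample of the mother data."""
    rng = random.Random(student_id)  # seeding by ID makes the draw repeatable
    return rng.sample(mother_rows, sample_size)


def answer_key(rows):
    """Hypothetical 'answer': the sample mean the student should report."""
    return sum(rows) / len(rows)


mother = list(range(1, 1001))  # hypothetical mother database of 1,000 values
student_a = personalized_sample(mother, student_id=20130001, sample_size=50)
student_b = personalized_sample(mother, student_id=20130002, sample_size=50)

key_a = answer_key(student_a)  # lecturer's answer for student A
key_b = answer_key(student_b)  # lecturer's answer for student B
```

Because the seed is the only source of variation, rerunning the draw with the same ID reproduces the same subsample, which is what allows the answer to be known beforehand.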

Reference

Gini, R. and J. Pasquini. 2006. Automatic generation of documents. *Stata Journal* 6: 22–39.

**Additional information**

nola13-taborda.pdf

Richard Ball

Norm Medeiros

Haverford College

This presentation will describe a protocol we have developed for teaching
students conducting empirical research to document their work in such a way
that their results are completely reproducible and verifiable. The protocol
is composed primarily of creating and assembling a collection of electronic
documents--including raw data files, do-files, and metadata files. The
guiding principle is that an independent researcher, using only the data and
information contained in these files, should be able to replicate every step
of the data management and analysis that generated their empirical results.
Students in our introductory statistics classes, as well as our senior
advisees, have had a great deal of success using the protocol to document
the data processing and analysis involved in their research papers and
theses. There is a great deal of evidence (see Ball and Medeiros [2012] and
McCullough and McKitrick [2009]) that, across the social sciences,
professional norms and common practices with respect to documenting
empirical research are deficient. We hope that teaching good practices to
our students will help strengthen the professional norm that researchers
have an ethical responsibility to ensure that their statistical results can
be independently replicated.

References

Ball, R. and N. Medeiros. 2012. Teaching integrity in empirical research: A protocol for documenting data management and analysis. *Journal of Economic Education* 43: 182–189.

McCullough, B. D. and R. McKitrick. 2009. Check the numbers: The case for due diligence in policy formation. Fraser Institute for Economic Studies in Risk and Regulation.

http://www.pages.drexel.edu/~bdm25/DueDiligence.pdf

**Additional information**

nola13-ball.pdf

Choonjoo Lee

Korea National Defense University

In this presentation, we present a procedure and an illustrative application
of user-written mathematical optimization programs: linear programming (LP)
and mixed integer linear programming (MILP). The LP and MILP programs in
Stata will allow researchers to easily access the Stata system and to
conduct not only statistical optimization but also
mathematical optimization. Unfortunately, to date, Stata offers statistical
optimization but no mathematical programming options for
optimization. The user-written mathematical optimization
approach in Stata will provide some possible future extensions of
nonparametric optimization programming.

**Additional information**

nola13-lee.pptx

George Vega

Chilean Pension Supervisor

Inspired by the R library "snow" and designed for multicore CPUs, PARALLEL
implements parallel computing methods through the OS's shell scripting (using
Stata in batch mode) to accelerate computations. By splitting the dataset
into a given number of clusters, this module runs a task
simultaneously over the data clusters, speeding up execution by a factor of
two to five, or more, depending on the number of CPU
cores. Without requiring Stata/MP, PARALLEL is, to my knowledge, the first
user-contributed Stata module to implement parallel computing.
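
The split-apply-combine pattern described here can be sketched in Python. This is only an illustration of the pattern: PARALLEL itself launches separate Stata batch processes through the shell, whereas this sketch uses a thread pool, which shows the data flow but will not by itself speed up CPU-bound Python code. The function names and the toy task are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_clusters(rows, n_clusters):
    """Split rows into n_clusters contiguous chunks of near-equal size."""
    base, extra = divmod(len(rows), n_clusters)
    clusters, start = [], 0
    for i in range(n_clusters):
        size = base + (1 if i < extra else 0)  # first 'extra' chunks get one more row
        clusters.append(rows[start:start + size])
        start += size
    return clusters


def run_in_parallel(task, rows, n_clusters=4):
    """Apply task to each cluster concurrently and stitch the results together."""
    clusters = split_into_clusters(rows, n_clusters)
    with ThreadPoolExecutor(max_workers=n_clusters) as pool:
        results = list(pool.map(task, clusters))  # map preserves cluster order
    return [item for chunk in results for item in chunk]


def square_all(chunk):
    """Hypothetical per-cluster task: square every value in the chunk."""
    return [v * v for v in chunk]


data = list(range(10))
combined = run_in_parallel(square_all, data, n_clusters=3)
```

The pattern only applies to tasks that can be computed independently on each cluster, such as row-wise transformations; tasks that need the whole dataset at once cannot be split this way.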

**Additional information**

nola13-vega.pdf

Joseph Canner

Eric Schneider

Johns Hopkins University School of Medicine

As with most programming languages, there are multiple ways to do a task
in Stata. Using modern CPUs with adequate memory, most Stata data processing
commands run so quickly on small- or moderate-sized datasets that it is
impossible to tell whether one command performs more efficiently than
another. However, when one analyzes large datasets such as the Nationwide
Inpatient Sample (NIS), with about 8 million records per year (~3.5GB), this
choice can make a substantial difference in performance. Using the Stata
timer command, we performed standardized benchmarks of common programming
tasks, such as searching the NIS for a list of ICD-9 codes, converting
string data to numeric and putting numeric variables in categories. For
example, the **inlist** function can achieve significant performance gains
compared with using the equivalent **"var==exp1 | var==exp2"** notation
(38% improvement) or using **foreach** loops (300% improvement). Using
the **real** and **subinstr** functions to remove characters from
strings and convert them to numbers is about 20 times faster than the
**destring** command. The **inlist**, **inrange**, and
**recode** functions also perform considerably better than the equivalent
**recode** commands (13 to 70 times faster), especially for string
variables, and are often easier to write and to read.

**Additional information**

nola13-canner.pptx

nola13-canner.do

James Fiedler

Universities Space Research Association

At last year's Stata Conference, I presented some ideas for combining
Stata and Python within a single interface. Two methods were presented: in
one, Python was used to automate Stata; in the other, Python was used to
send simulated keystrokes to the Stata GUI. The first method has the
drawback of only working in Windows, and the second can be slow and subject
to character input limits. In this presentation, I will demonstrate a method
for achieving interaction between Stata and Python that does not suffer
these drawbacks, and I will present some examples to show how this
interaction can be useful.

**Additional information**

nola13-fiedler.pdf

Phil Ender

UCLA Statistical Consulting Group

Measurement invariance is a very important requisite in multiple group
structural equation modeling. It attempts to verify that the factors are
measuring the same underlying latent constructs within each group. This
presentation will show the use of the **sem** command in assessing six
types of factor invariance: configurational, metric, strong, strict, strict
plus factor means, and strict plus factor means and variances. These six
types of factor invariance constitute a hierarchy with each level
representing a stricter definition of factor invariance.

**Additional information**

nola13-ender.pdf

Allison Dunning

Sean Collins

Dan Fitzgerald

Sandra H. Rua

Weill Cornell Medical College

Background: A previous trial has shown that starting ART earlier
("Early") rather than waiting for the onset of symptoms ("Standard") in
HIV patients significantly decreases mortality. As a follow-up, researchers
are interested in determining whether "Early" therapy significantly decreases
time to first tuberculosis (TFTB) diagnosis when adjusting for CD4 cell
count, a known strong predictor. Methods: Stata 12.0 was used to fit two
Cox regression models to analyze the effect of ART start time on TFTB. The
first model included CD4 cell count only at baseline as a predictor, while the
second model treated CD4 cell count as a time-varying predictor. Results:
Regular Cox regression analysis showed that "Early" therapy results in a
significant decrease in TFTB after adjustment for previous TB diagnosis,
baseline BMI, and baseline CD4 cell count. Treating CD4 cell count as a
time-varying predictor in Cox regression, we found that ART start time
was not a significant predictor of TFTB. Conclusions: Failing to adjust for
the change in CD4 cell counts over time led to reporting that "Early"
therapy significantly reduces the risk of TB diagnosis. Modeled correctly, the
effect becomes nonsignificant. This result has substantial consequences for
treatment decisions.

**Additional information**

nola13-dunning.pptx

Seth Lirette

University of Mississippi Medical Center

Fractals are some of the most beloved and recognizable mathematical objects
studied. They have been traced as far back as Leibniz, but failed to receive
rigorous examination until the mid-twentieth century with the many
publications of Benoit Mandelbrot and the advent of the modern computer.
The powerful programming environment of Mata, in tandem with Stata’s
excellent graphics capabilities, provides a very well-suited setting for
generating fractals. My talk will focus on using Mata, combined with Stata,
to generate some visually recognizable fractals, possibly including, but not
limited to, iterated function systems (Barnsley Fern, Koch Snowflake, Gosper
Island); escape-time fractals (Mandelbrot Set, Julia Sets, Burning Ship);
finite subdivisions (Cantor Set, Sierpinski Triangle); Lindenmayer systems
(Dragon Curve, Levy Curve); and strange attractors (Double-scroll, Rossler,
Lorenz).
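
To make the escape-time idea concrete, the following Python sketch tests whether points belong to the Mandelbrot set and renders a coarse text picture. It is an illustration of the general technique, not the presenter's Mata code; the function names, grid size, and iteration limit are hypothetical choices.

```python
def escape_time(c, max_iter=50):
    """Number of iterations of z -> z*z + c before |z| exceeds 2.

    Returns max_iter if the orbit stays bounded for all iterations,
    which is taken as (approximate) membership in the Mandelbrot set.
    """
    z = 0j
    for n in range(max_iter):
        if abs(z) > 2.0:
            return n
        z = z * z + c
    return max_iter


def render(width=60, height=24, max_iter=50):
    """Return text rows: '#' for points treated as inside the set, '.' otherwise."""
    rows = []
    for j in range(height):
        im = 1.2 - 2.4 * j / (height - 1)      # imaginary part from 1.2 down to -1.2
        row = ""
        for i in range(width):
            re = -2.0 + 3.0 * i / (width - 1)  # real part from -2.0 up to 1.0
            inside = escape_time(complex(re, im), max_iter) == max_iter
            row += "#" if inside else "."
        rows.append(row)
    return rows


picture = render()
```

Calling `print("\n".join(picture))` displays the familiar cardioid-and-bulb outline; coloring each point by its escape time instead of a two-symbol scheme gives the usual banded images.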

**Additional information**

nola13-lirette.pptx

Timothy Brophy

University of Cape Town

GPS coordinates are collected by many organizations; however, in order to
derive any meaningful statistical analysis from these coordinates, they need
to be joined with geographical data. Previously, users were required to
export the GPS data out of Stata and into a GIS mapping program to map the
coordinates, validate them and join them to an attribute table. The results
would then need to be imported back into Stata for statistical
analysis. **gpsmap** is a routine that imports a user-provided shapefile and
its attribute table. Using a ray-casting algorithm, it maps the GPS
coordinates to one of the polygons of the given shapefile and returns a
dummy variable indicating whether the GPS coordinates were mapped
successfully. Where the GPS coordinates were successfully mapped, the
attribute table applicable to that particular polygon is also returned to
Stata. One of the contributions of **gpsmap** is to allow users to circumvent
GIS software and to incorporate GIS information directly within Stata. The
other is to give users who are not familiar with GIS software the
opportunity to use GIS information without having to familiarize themselves
with GIS software.
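
The ray-casting test mentioned above can be sketched as follows: a horizontal ray is cast from the point, and the point is inside the polygon if the ray crosses the boundary an odd number of times. This Python sketch illustrates the general algorithm under simplifying assumptions (planar coordinates, simple polygons without holes); it is not the **gpsmap** code, and the function names, polygon names, and attributes are hypothetical.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test: True if (x, y) lies inside the polygon.

    polygon is a list of (x, y) vertices in order; the last vertex is
    implicitly connected back to the first.
    """
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


def map_point(x, y, polygons):
    """Return (1, attributes) for the first polygon containing the point,
    or (0, None) if no polygon contains it."""
    for vertices, attributes in polygons:
        if point_in_polygon(x, y, vertices):
            return 1, attributes
    return 0, None


# Hypothetical shapefile contents: two square districts with attribute records
districts = [
    ([(0, 0), (4, 0), (4, 4), (0, 4)], {"name": "District A"}),
    ([(4, 0), (8, 0), (8, 4), (4, 4)], {"name": "District B"}),
]

mapped, attrs = map_point(2.0, 2.0, districts)      # falls in District A
unmapped, no_attrs = map_point(9.0, 2.0, districts) # outside both districts
```

The first element of the returned pair plays the role of the dummy variable described in the abstract, and the second carries the attribute record of the matching polygon.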

**Additional information**

nola13-brophy.pptx

Michael Barker

Georgetown University

Klein and Vella (2010) propose an estimator to fit a triangular system of
two simultaneous linear equations with a single endogenous regressor. Models
of this form are generally analyzed with two-stage least squares or IV
methods, which require one or more exclusion restrictions. In practice, the
assumptions required to construct valid instruments are frequently difficult
to justify. The KV estimator does not require an exclusion restriction; the
same set of independent variables may appear in both equations. To account
for endogeneity, the estimator constructs a control function using
information from the conditional distribution of the error terms.
Conditional variance functions are estimated semiparametrically, so
distributional assumptions are minimized. I will present my Stata
implementation of the semiparametric control function estimator,
**kvreg**, and discuss the assumptions that must hold for consistent
estimation. The **kvreg** estimator contains an undocumented
implementation of Ichimura’s (1993) semiparametric least squares
estimator, which I plan to expand into a stand-alone command.

References

Klein, R. and F. Vella. 2010. Estimating a class of triangular simultaneous equations models without exclusion restrictions. *Journal of Econometrics* 154: 154–164.

Ichimura, H. 1993. Semiparametric least squares (SLS) and weighted SLS estimation of single-index models. *Journal of Econometrics* 58: 71–120.

**Additional information**

nola13-barker.pdf

Jeff Pitblado

Director of Statistical Software, StataCorp

Introducing generalized SEM: (1) SEM with generalized linear response
variables, and (2) SEM with multilevel mixed effects, whether linear or
generalized linear. Generalized linear response variables mean you can now
fit probit, logit, Poisson, multinomial logistic, ordered logit, ordered
probit, and other models. They also mean measurements can be continuous,
binary, count, categorical, and ordered. Multilevel mixed effects mean you
can place latent variables at different levels of the data. You can fit
models with fixed or random intercepts and fixed or random slopes. I will
present examples using both command syntax and the SEM Builder.

**Additional information**

nola13-pitblado.pdf

R. Carter Hill (chair), Louisiana State University

Mario Cleves, University of Arkansas for Medical Sciences

Edward Peters, LSUHSC School of Public Health

Nathan Bishop, StataCorp

Chris Farrar, StataCorp

Gretchen Farrar, StataCorp