## Abstracts

### Regression analysis of censored data using pseudo-observations

**Additional materials:**

de14_parner.pdf

### Using Stata for sequence analysis

Sequence analysis (SA) is a very different way of looking at categorical longitudinal data, such as life-course or labor-market histories (or any ordered categorical data, for that matter). Instead of focusing on transition rates (for example, via hazard rate, Markov, or panel models), SA takes individual time series and compares them as a whole. It has significant advantages at a descriptive and exploratory level and may help detect patterns that conventional methods overlook. As availability of longitudinal data increases, this becomes a significant advantage.

SA hinges on defining measures of similarity between sequences, typically to generate data-driven classifications, for example by cluster analysis. Most SA uses the optimal matching distance, but other measures are used. There is some controversy about the applicability of SA algorithms to social science data and about their parameterization. Comparison of different methods and parameterizations helps clarify the issues.
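To make the idea concrete, here is a minimal Python sketch (not the SADI or SQ implementation) of an optimal matching distance: a Levenshtein-style edit distance with user-set insertion/deletion and substitution costs, computed for all *N*(*N*−1)/2 pairs of sequences. The sequences and costs are toy values chosen for illustration.

```python
from itertools import combinations

def om_distance(a, b, indel=1.0, sub=2.0):
    """Optimal matching distance: the minimum total cost of insertions,
    deletions, and substitutions needed to turn sequence a into sequence b."""
    n, m = len(a), len(b)
    # prev holds one row of the dynamic-programming edit-distance table
    prev = [j * indel for j in range(m + 1)]
    for i in range(1, n + 1):
        curr = [i * indel] + [0.0] * m
        for j in range(1, m + 1):
            cost = 0.0 if a[i - 1] == b[j - 1] else sub
            curr[j] = min(prev[j] + indel,      # delete from a
                          curr[j - 1] + indel,  # insert into a
                          prev[j - 1] + cost)   # substitute (or match)
        prev = curr
    return prev[m]

# Toy labor-market histories: E = employed, U = unemployed, S = school
seqs = ["EEEUU", "EEUUU", "SSEEE"]

# All N*(N-1)/2 pairwise comparisons, i.e., the distance matrix
# that a subsequent cluster analysis would operate on
dist = {(i, j): om_distance(seqs[i], seqs[j])
        for i, j in combinations(range(len(seqs)), 2)}
```

The quadratic number of pairwise comparisons, each itself a dynamic-programming computation, is exactly why a compiled plugin pays off for distance measures like this.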

For a long time, TDA was the only package social scientists had access to for SA, but in more recent years, both Stata and R have had relevant functionality; in Stata’s case, this is provided by the SQ and SADI packages.

In this talk, I will discuss the current state of the SADI package. SADI
differs from SQ in that it is based on a plugin and is therefore significantly
faster: many of the distance measures are computationally intensive, and typically,
*N*(*N*−1)/2 comparisons will be made. It also provides additional distance
measures, including dynamic Hamming, time-warp edit distance, and a version
of Elzinga’s number of matching subsequences measure. It includes tools for
inspecting and graphing sequence data and for comparing distance measures
and the resulting cluster analyses.

I will also briefly discuss the advantages and disadvantages of using plugins rather than Mata and comment about cross-compiling plugins under Linux.

**Additional materials:**

de14_halpin.pdf

### Some examples using gsem to handle endogeneity in nonlinear models

**gsem** to estimate the parameters.

**Additional materials:**

de14_drukker_gsem.pdf

### Managing Stata-related files with dirtools

**dirtools** package.

**dirtools** is a collection of programs designed to deal with native Stata files (that is, .dta, .do, .ado, .mata, or .gph) and some of the more frequently generated file formats (.eps, .pdf, and .tex). The programs provide easy access to typical tasks for these file types. For example, the user can describe or load datasets, compile Mata files, translate .gph files to .eps or .pdf files, and compile TeX files. In addition, the package offers an easy way to change the working directory and to define bookmarks for frequently used directories.

**Additional materials:**

de14_kohler.pdf

### Splitting spells in very large or daily datasets

**splitspells**, splits spell data in such a way that every split spell within the same case is either parallel or unique (with respect to the time period covered by the spells). This could be done by splitting each spell into single time-unit splits, a method that is not recommended, or even feasible, for large datasets or daily data, because it produces the maximum number of additional records. In contrast, **splitspells** does the job by producing the smallest possible number of additional records and is a useful tool in the process of transforming multiple spell-type datasets into “unidimensional” sequence data.

The second program, **combispells**, is used with episode data that are already split. Using existing spell-type variables and user-defined short labels, **combispells** produces a (labeled) numeric variable and a string variable that show the spell-type combinations occurring within the parallel splits of a case. It thus provides a user-friendly and easy-to-handle tool for editing, revising, and exploring spell data.
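The splitting logic itself can be sketched in a few lines. The following Python fragment is not the **splitspells** implementation, only an illustration of the underlying idea: cut each spell only at boundaries of other spells in the same case, so the resulting splits are pairwise either parallel (identical periods) or non-overlapping, with no per-time-unit explosion of records. Spell encoding and half-open intervals are assumptions made for the example.

```python
def split_spells(spells):
    """Split a case's spells [(type, begin, end), ...] (half-open intervals)
    at every boundary that falls strictly inside them, so that any two
    resulting splits either cover exactly the same period (parallel)
    or do not overlap at all (unique)."""
    # Every begin/end occurring in the case is a potential cut point
    cuts = sorted({t for _, b, e in spells for t in (b, e)})
    out = []
    for typ, b, e in spells:
        inner = [t for t in cuts if b < t < e]
        bounds = [b] + inner + [e]
        for lo, hi in zip(bounds, bounds[1:]):
            out.append((typ, lo, hi))
    return out

# Two overlapping spells: employment on days 1-100, training on days 50-80
case = [("employment", 1, 101), ("training", 50, 81)]
splits = split_spells(case)
# employment is cut at days 50 and 81; the training spell stays whole,
# parallel to the middle employment split: 4 records instead of the
# 131 that day-by-day splitting would produce
```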

**Additional materials:**

de14_erhardt_kuenster.pdf

### Modeling interactions in count-data regression: Principles and implementation in Stata

During the past decades, count-data models (in particular, Poisson and negative-binomial-based regression models) have gained relevance in empirical social research. While identifying and interpreting main effects is relatively straightforward for this class of models, the integration of interactions between predictors proves to be complex. As a consequence of the exponential mean function implemented in count-data models (which restricts the possible range of the conditional expected count to nonnegative values), the coefficient of the product term variable (generated by the predictors constituting the interaction) does not—in contrast to the linear model—fully represent the underlying interaction effect. Further, the interaction effect is allowed to vary between individuals and can be divided into two components: a model-inherent interaction effect and a product-term-induced interaction effect.

We will derive the total interaction effect for the Poisson and negative binomial models by following a method developed by Ai and Norton (2003) for binary logit and probit models. Further, we will decompose the model-inherent and the product-term-induced interaction effects, discuss their substantive meaning, and provide delta-method standard errors for the respective effects. Finally, we will provide an approach for the estimation and graphical representation of these effects in Stata.
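The decomposition can be illustrated numerically. For a Poisson mean function E[y|x] = exp(b0 + b1·x1 + b2·x2 + b12·x1·x2) with two continuous predictors, the total interaction effect is the cross-partial derivative of the conditional mean, which splits into a product-term-induced part (proportional to b12) and a model-inherent part that is nonzero even when b12 = 0. The sketch below uses made-up coefficient values purely for illustration; it is not the authors' Stata implementation.

```python
import math

def poisson_interaction(b0, b1, b2, b12, x1, x2):
    """Cross-partial derivative of E[y|x] = exp(b0 + b1*x1 + b2*x2 + b12*x1*x2)
    with respect to x1 and x2, split into its two components."""
    mu = math.exp(b0 + b1 * x1 + b2 * x2 + b12 * x1 * x2)
    inherent = (b1 + b12 * x2) * (b2 + b12 * x1) * mu  # nonzero even if b12 == 0
    induced = b12 * mu                                 # driven by the product term
    return inherent + induced, inherent, induced

total, inherent, induced = poisson_interaction(0.1, 0.3, 0.2, 0.05, 1.0, 2.0)

# With b12 = 0 the product-term part vanishes, but the model-inherent
# interaction remains: the effect of x1 still varies with x2 because
# of the exponential mean function.
_, inherent0, induced0 = poisson_interaction(0.1, 0.3, 0.2, 0.0, 1.0, 2.0)
```

Because the effect depends on x1 and x2 through mu, it varies between individuals, which is exactly why a single product-term coefficient cannot represent it.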

**Reference**

- Ai, C., and E. C. Norton. 2003. Interaction terms in logit and probit models. *Economics Letters* 80: 123–129.

**Additional materials:**

de14_leitgoeb.pdf

### Estimating average treatment effects from observational data using teffects

**teffects**.

**Additional materials:**

de14_drukker_teintro.pdf

### Reproducible research in Stata

Writing a document that contains statistical results in its narrative, including inline results, can take too much effort. Typically, users maintain a separate series of do-files whose results must then be pulled into the document. This is a high-maintenance way to work, because updates to the data, changes to the do-files, updates to the statistical software, and, especially, updates to inline results all require work and careful checking of results.

Reproducible research greatly lessens document-maintenance chores by putting code and results directly into the document. Because only one document is used, it remains consistent and is easily maintained.

In this presentation, I will show you how to put Stata code directly into a LaTeX or HTML document and run it through a preprocessor to create the final document containing the results. While this is useful for creating self-contained documents, it is especially useful for creating periodic reports, class notes, solution sets, and other documents that are used over a long period of time.
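The weave mechanism the talk describes can be sketched generically: a preprocessor scans the source document for code chunks, runs them, and splices the output back into the text. The Python fragment below is a hypothetical illustration of that pipeline, with invented `<<run>>`/`<<end>>` chunk markers and Python in place of Stata as the executed language; the actual preprocessor and its syntax are not specified in the abstract.

```python
import contextlib
import io
import re

# Hypothetical chunk markers for this illustration only
CHUNK = re.compile(r"<<run>>\n(.*?)\n<<end>>", re.S)

def weave(doc):
    """Replace each code chunk in `doc` with the output it produces,
    so the narrative and its results live in one source file."""
    def run(match):
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(match.group(1), {})  # here: Python; the talk uses Stata
        return buf.getvalue().rstrip()
    return CHUNK.sub(run, doc)

source = "The mean is <<run>>\nprint((2 + 4 + 6) / 3)\n<<end>> units."
print(weave(source))  # -> The mean is 4.0 units.
```

When the data or the code changes, rerunning the preprocessor regenerates every result, inline ones included, with no manual copying.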

**Additional materials:**

de14_rising.pdf

### A new command for plotting regression coefficients and other estimates

**marginsplot** command. However, while **marginsplot** is very versatile and flexible, it has two major limitations: it can process only results left behind by **margins**, and it can handle only one set of results at a time. In this presentation, I will introduce a new command called **coefplot**, which overcomes these limitations. It plots results from any estimation command and combines results from several models into a single graph. The default behavior of **coefplot** is to plot markers for coefficients and horizontal spikes for confidence intervals. However, **coefplot** can also produce various other types of graphs. The capabilities of **coefplot** are illustrated using a series of examples.

**Additional materials:**

de14_jann.pdf