2007 German Stata Users Group meeting

Last updated: 13 April 2007

Monday, 2 April 2007

RWI Essen
Hohenzollernstr. 1-3
45128 Essen
Germany

Materials documenting the meeting

Proceedings

Why should you become a Stata programmer?

Kit Baum

Boston College Economics

In this talk I describe three modes of Stata programming: authoring do-files, ado-files, and Mata subroutines for ado-file programming. I discuss the advantages of developing skills in Stata programming that will help you become more efficient in your use of Stata and generate fully reproducible research output.

Additional information
StataProgDESUG.7323.pdf

Making regression tables simplified

Ben Jann

ETH Zurich

estout, introduced by Jann (2005), is a useful tool for producing regression tables from stored estimates. However, its syntax is relatively complex, and commands can become lengthy even for simple tables. Furthermore, having to store the estimates beforehand can be a bit cumbersome. To facilitate the production of regression tables, I therefore present two new commands called esto and esta. esto is a wrapper for official Stata’s estimates store and simplifies the storing of estimation results for tabulation. For example, esto does not require the user to provide names for the stored estimation sets. esta, on the other hand, is a wrapper for estout and simplifies compiling nice-looking tables from the stored estimates without much typing. Basic applications of the commands and usage of esta with external software such as LaTeX, Word, or Excel will be illustrated by a range of examples.

Additional information
Essen07_jann.pdf
Essen07_jann.zip

Assessing the reasonableness of an imputation model

Maarten L. Buis

Vrije Universiteit Amsterdam

Multiple imputation is a popular way of dealing with missing values under the missing at random (MAR) assumption. Imputation models can become quite complicated, for instance, when the model of substantive interest contains many interactions or when the data originate from a nested design. This paper will discuss two methods to assess how plausible the results are. The first method consists of comparing the point estimates obtained by multiple imputation with point estimates obtained by another method of controlling for bias due to missing data. Second, the changes in standard error between the model that ignores the missing cases and the multiple imputation model are decomposed into three components: changes due to changes in sample size, changes due to uncertainty in the imputation model used in multiple imputation, and changes due to changes in the estimates that underlie the standard error. This decomposition helps in assessing the reasonableness of the change in standard error. These two methods will be illustrated with two new user-written Stata commands.
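
The role of imputation uncertainty in the standard error can be illustrated with the standard combining rules for multiple imputation (Rubin's rules). The following is a minimal Python sketch of those general rules only; it is not the three-part decomposition or the commands presented in the talk, and the function name and numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool m point estimates and their squared standard errors (Rubin's rules)."""
    m = len(estimates)
    q_bar = mean(estimates)        # pooled point estimate
    within = mean(variances)       # average within-imputation variance
    between = variance(estimates)  # between-imputation variance (divisor m - 1)
    total = within + (1 + 1 / m) * between
    return q_bar, within, between, total


# Hypothetical results from m = 3 imputed datasets
estimates = [1.0, 1.2, 0.8]
variances = [0.04, 0.05, 0.03]

q_bar, within, between, total = pool_rubin(estimates, variances)
print(q_bar, within, between, total ** 0.5)  # last value is the pooled standard error
```

The between-imputation term grows when the imputed datasets disagree, which is one way uncertainty in the imputation model feeds into the pooled standard error.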

Additional information
BUIS_GsugBuis.pdf

The influence of categorizing survival time on parameter estimates in a Cox model

Anika Buchholz

University of Freiburg

Willi Sauerbrei

University Medical Center Freiburg

Patrick Royston

MRC Clinical Trials Unit, London

With longer follow-up times, the proportional hazards assumption of the Cox model becomes questionable. Cox suggested including an interaction between a covariate and a function of time. To estimate such a function in Stata, a substantial enlargement of the data is required, which may cause severe computational problems. To handle this problem, we consider categorizing survival time, which raises issues as to the number of cutpoints, their position, the increased number of ties, and the loss of information. Sauerbrei et al. (2007) proposed a new selection procedure to model potential time-varying effects. They investigate a large dataset (N = 2982) with 20 years of follow-up, for which the Stata command stsplit creates about 2.2 million records. Categorizing the data in 6-month intervals gives 35,747 records. We will systematically investigate the influence of the length of categorization intervals and the four methods of handling ties in Stata. The results of our categorization approach are promising, showing a sensible way to handle time-varying effects even in simulation studies.

References: Sauerbrei, W., Royston, P., and Look, M. (2007). A new proposal for multivariable modelling of time-varying effects in survival data based on fractional polynomial time-transformation. Biometrical Journal, in press.
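
Why categorizing survival time shrinks the expanded dataset can be seen in a small sketch. The Python code below is an illustration under stated assumptions, not the authors' procedure: it assumes each subject's follow-up is split at every distinct failure time before that subject's exit (the idea behind episode splitting), and the function names and toy data are hypothetical.

```python
import math


def count_split_records(times, events):
    """Count records after splitting each subject at every distinct failure time."""
    failure_times = sorted({t for t, d in zip(times, events) if d == 1})
    total = 0
    for t in times:
        # One record per failure time strictly before exit, plus the final record.
        total += 1 + sum(1 for f in failure_times if f < t)
    return total


def categorize(times, width):
    """Round each survival time up to the end of its interval of the given width."""
    return [math.ceil(t / width) * width for t in times]


# Toy data: survival times in months, event indicator (1 = failure, 0 = censored)
times = [2, 3, 5, 8, 9, 11, 14, 17]
events = [1, 1, 1, 1, 0, 1, 1, 1]

exact = count_split_records(times, events)
coarse = count_split_records(categorize(times, 6), events)
print(exact, coarse)  # 33 records with exact times, 15 with 6-month intervals
```

Coarser intervals leave fewer distinct failure times and therefore fewer split points, at the cost of more tied failure times, which is why the method of handling ties matters.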

Additional information
BUCHHOLZ_Vortrag.Essen.pdf

Oaxaca/Blinder decompositions for nonlinear models

Matthias Sinning

Markus Hahn

RWI Essen, University of Bochum

This paper describes the estimation of a general Blinder–Oaxaca decomposition of the mean outcome differential of linear and nonlinear regression models. Starting from this general model, we show how it can be applied to different models with discrete and limited dependent variables.
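
One common way to write such a decomposition replaces the linear prediction with the model's conditional mean and averages it over each group. The Python sketch below illustrates that idea under assumptions: group A's coefficients serve as the reference, the coefficients are taken as given rather than estimated, and all names and numbers are hypothetical. It is not the authors' implementation.

```python
import math


def linear(x, beta):
    """Conditional mean of a linear model: x'beta."""
    return sum(xi * bi for xi, bi in zip(x, beta))


def logit(x, beta):
    """Conditional mean of a logit model: Pr(y = 1 | x)."""
    return 1.0 / (1.0 + math.exp(-linear(x, beta)))


def mean_prediction(rows, beta, cond_mean):
    """Sample average of the conditional mean over one group's covariates."""
    return sum(cond_mean(x, beta) for x in rows) / len(rows)


def decompose(rows_a, rows_b, beta_a, beta_b, cond_mean):
    """Twofold decomposition of the mean gap, with group A's coefficients as reference."""
    aa = mean_prediction(rows_a, beta_a, cond_mean)
    ba = mean_prediction(rows_b, beta_a, cond_mean)  # B's covariates, A's coefficients
    bb = mean_prediction(rows_b, beta_b, cond_mean)
    return aa - ba, ba - bb  # (covariate part, coefficient part)


# Hypothetical covariates (constant, x) and coefficients for two groups
rows_a = [(1.0, 2.0), (1.0, 4.0)]
rows_b = [(1.0, 1.0), (1.0, 3.0)]
beta_a = (1.0, 0.5)
beta_b = (0.5, 0.5)

print(decompose(rows_a, rows_b, beta_a, beta_b, linear))
print(decompose(rows_a, rows_b, beta_a, beta_b, logit))
```

Passing a different conditional-mean function is all that changes between the linear and the nonlinear case; the two parts always sum to the difference in mean predictions between the groups.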

Additional information
SINNING_stata_presentation.pdf

Estimating double-hurdle models with dependent errors and heteroskedasticity

Julian A. Fennema

Heriot-Watt University, Edinburgh

This paper describes the estimation of the parameters of a double-hurdle model in Stata. It is shown that the independent double-hurdle model can be estimated using a combination of existing commands. Likelihood evaluators to be used with Stata’s ml facilities are derived to illustrate how to fit independent and dependent inverse hyperbolic sine double-hurdle models with heteroskedasticity.

Measuring richness

Andreas Peichl

University of Cologne

In this paper, we describe richness, a Stata program for the calculation of richness indices. Peichl, Schaefer, and Scheicher (2007) propose a new class of richness measures to contribute to the debate on how to deal with the financing problems that European welfare states face as a result of global economic competition. In contrast to the often-used head count, these new measures are sensitive to changes in rich persons’ income. This approach allows for a more sophisticated analysis of richness, namely, of the question of whether the gap between rich and poor is widening. We propose to use our new measures in addition to the head count index for a more comprehensive analysis of richness.
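
The difference between a head count and an income-sensitive measure can be shown with a small Python sketch. The abstract does not give the functional form of the proposed measures, so the gap measure below is only a stand-in modeled on poverty-gap indices of the Foster-Greer-Thorbecke type; the function names, the richness line, and the incomes are all hypothetical.

```python
def head_count(incomes, richness_line):
    """Share of persons whose income lies above the richness line."""
    return sum(1 for y in incomes if y > richness_line) / len(incomes)


def richness_gap(incomes, richness_line, alpha=1.0):
    """Illustrative intensity measure: mean relative excess over the richness line.

    This is a stand-in for illustration, not the measure proposed in the paper.
    """
    excess = (((y - richness_line) / richness_line) ** alpha
              for y in incomes if y > richness_line)
    return sum(excess) / len(incomes)


line = 40
before = [10, 20, 30, 50, 100]
after = [10, 20, 30, 50, 200]  # only the top income rises

print(head_count(before, line), head_count(after, line))
print(richness_gap(before, line), richness_gap(after, line))
```

In this toy example the head count stays at 0.4 in both cases, while the gap measure rises from about 0.35 to about 0.85, which is the kind of sensitivity to rich persons' income that the head count lacks.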

Additional information
peichl_20070402_VortragStataUserGroup.pdf

Robust income distribution analysis

Philippe Van Kerm

CEPS/INSTEAD, Luxembourg

Extreme data are known to be highly influential when measuring income inequality from microdata. Similarly, Lorenz curves and dominance criteria are sensitive to data contamination in the tails of the distribution. In this presentation, I intend to introduce a set of user-written packages that implement robust statistical methods for income distribution analysis. These methods are based on the estimation of parametric models (Pareto, Singh–Maddala) with “optimal B-robust” estimators rather than maximum likelihood. Empirical examples show how robust inequality estimates and dominance checks can be derived from these models.

Additional information
VANKERM_gsum_slides.pdf

PanelWhiz: A Stata interface for large scale panel datasets

John P. Haisken-DeNew

RWI Essen

This paper outlines a panel-data retrieval program written for Stata/SE or better, which provides easier access to household panel datasets. Using a dropdown menu system, the researcher selects variables from any and all available years of the panel. The data are automatically retrieved and merged to form a long file, which can be used directly by the Stata panel estimators. The system implements modular data-cleaning programs called plugins. Yearly updates to the data retrievals can be made automatically. Projects can be stored in libraries, allowing modular administration and appending. PanelWhiz is available for SOEP, IAB-Betriebspanel, HILDA, CPS-NBER, and CPS-CEPR. Other popular datasets will be supported soon.

Additional information
HAISKEN_panelwhiz_overview.ppt

PanelWhiz plugins: automatic vector-oriented data cleaning for large scale panel datasets

Markus Hahn

RWI Essen and University of Bochum

PanelWhiz plugins are modular data-cleaning programs for specific items in PanelWhiz. Each plugin is designed to recode, deflate, and change existing variables being extracted in a panel-data retrieval. Furthermore, new variables can be generated on the fly. The PanelWhiz plugin system is a macro language that uses new-style dialog boxes and Stata’s modularized class system, allowing a vector orientation for data cleaning. The PanelWhiz plugins can even be generated using a PanelWhiz plugin front-end, allowing users to create plugins without having to write Stata code themselves. The system is set up to allow data cleaning of any PanelWhiz-supported dataset.

Additional information
HAHN_german_stata2007.pdf

A model for transferring variables between different data-sets based on imputation of individual scores

Bojan Todosijevic

University of Twente

It is an often-encountered problem that variables of interest are scattered across different datasets. Given two methodologically similar surveys, a question not asked in one survey could be seen as a special case of the missing-data problem (Gelman et al., 1998). The paper presents a model for transferring variables between different datasets, applying the procedures for multiple imputation of missing values. The feasibility of this approach was assessed using two Dutch surveys: Social and Cultural Developments in The Netherlands (SOCON 2000) and the Dutch Election Study (NKO 2002). An imputation model for the left–right ideological self-placement was developed based on the SOCON survey. In the next step, left–right scores were imputed to the respondents from the NKO study. The outcome of the imputation was evaluated, first, by comparing the imputed variables with the left–right scores collected in three waves of the NKO study. Second, the imputed and the original NKO left–right variables were compared in terms of their associations with a broad set of attitudinal variables from the NKO dataset. The results show that one would reach similar conclusions when using the original or the imputed variable, albeit with an increased risk of making Type II errors.

Additional information
TODOSIJEVIC_Presentation_light.pps

Two issues on remote data access

Peter Jacobebbinghaus

IAB

At the Research Data Centre of the BA at the IAB, researchers can send in Stata programs to be processed there; the log files are sent back to them after a disclosure limitation review. This method of data access is called remote data access, and we use it to protect data confidentiality. Remote data access has two nonstandard requirements: efficient use of the computer resources and automation of parts of the disclosure limitation review. I would like to talk about how we deal with these requirements and discuss ways to improve our approach.

Additional information
JACOBEBBINGHAUS_Stata_Essen-1.ppt

Scientific organizers

Johannes Giesecke, University of Mannheim
Ulrich Kohler, WZB
Fred Ramb, Deutsche Bundesbank

Logistics organizers

The conference is sponsored and organized by Dittrich and Partner (http://www.dpc.de), the distributor of Stata in several countries, including Germany, Austria, and Hungary.