4th German Stata Users' Group Meeting: Announcement and Program
===============================================================
The 4th German Stata Users' Group Meeting will be held at the
University of Mannheim (http://www.wz-berlin.de) on Friday, March 31st, 2006.
The content of the meeting has been organized by Johannes Giesecke,
University of Mannheim (jgiesecke@rumms.uni-mannheim.de), Ulrich Kohler,
WZB (kohler@wz-berlin.de), and Fred Ramb, Deutsche Bundesbank
(fred.ramb@bundesbank.de). The logistics are being organized by Dittrich
and Partner (http://www.dpc.de), the distributor of Stata in several countries
including Germany and Austria.
The meeting is open to all interested, and we will be happy if Stata users
from neighboring countries join us. StataCorp will be represented. The
conference language will be English due to the 'international' nature of the
meeting and the participation of non-German guest speakers. There will be a
"wishes and grumbles" session at which you may air your thoughts to Stata
developers. There will also be an optional informal meal at a Mannheim
restaurant on Friday evening (at an additional cost of 20 euros).
Participants are asked to cover their own travel expenses. There will be a
small conference fee (regular 20 euros, students 10 euros) to cover the cost
of coffee, tea, and lunch.
For further information on registration, please contact anke.mrosek@dpc.de.
Mrs. Mrosek will also assist you in finding accommodation. For general
information about the meeting see also http://www.stata.com/mannheim06.
Readers of previous announcements should note that the conference venue
has changed to Room W 117, located in the Schloss. You will find an exact plan
of the conference venue on http://www.stata.com/mannheim06.
Note: Counting the number of windows, the Schloss of Mannheim is the
biggest palace in Europe. Even if you don't trust the indicator, believe
us: the Schloss is big. We therefore ask you to allow ample time. It is
not difficult to find the Schloss in Mannheim, but it probably is
difficult to find the room within the Schloss.
Schedule of the 4th German Stata Users' Group Meeting
-----------------------------------------------------
8:45 Registration and coffee/tea
9:15 Welcome
Johannes Giesecke
9:30 Resultssets, resultsspreadsheets and resultsplots in Stata
Roger Newson, Imperial College London
r.newson@imperial.ac.uk
Most Stata users make their living producing results in a form
accessible to end users. Most of these end users cannot immediately
understand Stata logs. However, they can understand tables (in paper,
PDF, HTML, spreadsheet or word processor documents) and plots (produced
using Stata or non-Stata software). Tables are produced by Stata as
resultsspreadsheets, and plots are produced by Stata as resultsplots.
Sometimes (but not always), resultsspreadsheets and resultsplots are
produced using resultssets. Resultssets, resultsspreadsheets and
resultsplots are all produced, directly or indirectly, as output by
Stata commands. A resultsset is a Stata dataset, which is a table,
whose rows are Stata observations and whose columns are Stata variables.
A resultsspreadsheet is a table in generic text format, conforming to a
TeX or HTML convention, or to another convention with a column separator
string and possibly left and right row delimiter strings. A resultsplot
is a plot produced as output, using a resultsset or a resultsspreadsheet
as input. Resultsset-producing programs include -statsby-, -parmby-,
-parmest-, -collapse-, -contract-, -xcollapse- and -xcontract-.
Resultsspreadsheet-producing programs include -outsheet-, -listtex-,
-estout- and -estimates table-. Resultsplot-producing programs include
-eclplot- and -mileplot-. There are two main approaches (or dogmas) for
generating resultsspreadsheets and resultsplots. The resultsset-centred
dogma is followed by -parmest- and -parmby- users, and states:
"Datasets make resultssets, which make resultsplots and
resultsspreadsheets". The resultsspreadsheet-centred dogma is
followed by -estout- and -estimates table- users, and states:
"Datasets make resultsspreadsheets, which make resultssets, which make
resultsplots". The two dogmas are complementary, and each dogma has
its advantages and disadvantages. The resultsspreadsheet dogma is much
easier for the casual user to learn to apply in a hurry, and is
therefore probably preferred by most users most of the time.
The resultsset dogma is more difficult for most users to learn, but is
more convenient for users who wish to program everything in
do-files, with little or no manual cutting and pasting.
10:20 Coffee
GLLAMM Session
--------------
10:30 Intervention evaluation using -gllamm-
Andrew Pickles, University of Manchester
(andrew.pickles@manchester.ac.uk)
The gllamm procedure provides a framework within which many of the
more difficult analyses required for trials and intervention studies
may be undertaken.
Treatment effect estimation in the presence of non-compliance can be
undertaken using instrumental variable (IV) methods. We illustrate how
gllamm can be used for IV estimation for the full range of types of
treatment and outcome measures and describe how missing data may be
tackled on an assumption of latent ignorability. Alternative
approaches to account for clustering and the analysis of
cluster-randomised studies will also be described.
Examples from studies of alcohol consumption of primary care patients,
cognitive behaviour therapy of depression patients and a school based
smoking intervention are discussed.
11:20 Estimating IRT models with -gllamm-
Herbert Matschinger, University of Leipzig
(math@medizin.uni-leipzig.de)
Within the framework of economic evaluation, health econometricians
are interested in constructing a meaningful health index that is
consistent with individual or societal preferences. One way to
derive such an index is based on the EQ-5D description and valuation
of health-related quality of life (HRQOL). The purpose of this study
was to analyze how well the EQ-5D reflects one latent construct of
HRQOL and how large the potential impact of measurement variance is
across six different countries. Data came from the European
Study of the Epidemiology of Mental Disorders (ESEMeD), a
cross-sectional survey of a representative random sample (N=21,425)
in Belgium, France, Germany, Italy, the Netherlands and Spain. At
least in psychology much attention is paid to different forms of
IRT models and particularly the Rasch model, since it is the only
model featuring specific objectivity which enables what is called a
“fair comparison” with respect to the latent dimension to be measured.
Therefore the dimensionality of the construct is evaluated by means
of one-parameter and two-parameter Item Response Theory (IRT).
Differential Item Functioning is tested with respect to the six
countries and both the difficulty and discrimination parameters.
Results show that a unidimensional one-parameter IRT model holds
for all countries, provided the item “anxiety/depression” is omitted.
If both the physical and the mental components of health-related
quality of life (HRQOL) are to be represented, the questionnaire should
be extended to a two-dimensional construct. Consequently, more items to
portray the mental component are then needed. This presentation will focus on
the possibilities and restrictions in estimating these models with
-gllamm-. It will be shown how these models can be established
and tested. Problems regarding the structure of the data and the
assignment of incidental parameters to individual observations will
be discussed.
General Statistics
------------------
11:50 Variance estimation for Generalized Entropy and Atkinson
inequality indices: the complex survey data case
Martin Biewen, University of Frankfurt
(biewen@wiwi.uni-frankfurt.de)
We derive the sampling variances of Generalized Entropy and Atkinson
indices when estimated from complex survey data, and show how they
can be calculated straightforwardly using widely available software.
We also show that, when the same approach is used to derive variance
formulae for the i.i.d. case, it leads to estimators that are simpler
than those proposed before. Both cases are illustrated with a
comparison of income inequality in Britain and Germany.
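
For readers unfamiliar with the two index families, the following is a minimal
sketch of their weighted point estimates only. It does not reproduce the
variance formulae derived in the talk. The function names are hypothetical,
the weights are assumed to be positive sampling weights, and all incomes are
assumed to be strictly positive.

```python
import math


def weighted_mean(y, w):
    """Weighted mean of incomes y with sampling weights w."""
    return sum(wi * yi for yi, wi in zip(y, w)) / sum(w)


def generalized_entropy(y, w, alpha):
    """Weighted Generalized Entropy index GE(alpha).

    alpha = 0 gives the mean log deviation, alpha = 1 the Theil index.
    """
    mu = weighted_mean(y, w)
    total_w = sum(w)
    if alpha == 0:
        return sum(wi * math.log(mu / yi) for yi, wi in zip(y, w)) / total_w
    if alpha == 1:
        return sum(
            wi * (yi / mu) * math.log(yi / mu) for yi, wi in zip(y, w)
        ) / total_w
    moment = sum(wi * (yi / mu) ** alpha for yi, wi in zip(y, w)) / total_w
    return (moment - 1.0) / (alpha * (alpha - 1.0))


def atkinson(y, w, epsilon):
    """Weighted Atkinson index with inequality aversion epsilon > 0."""
    mu = weighted_mean(y, w)
    total_w = sum(w)
    if epsilon == 1:
        mean_log = sum(wi * math.log(yi) for yi, wi in zip(y, w)) / total_w
        return 1.0 - math.exp(mean_log) / mu
    moment = sum(
        wi * (yi / mu) ** (1.0 - epsilon) for yi, wi in zip(y, w)
    ) / total_w
    return 1.0 - moment ** (1.0 / (1.0 - epsilon))
```

Both indices are zero when all incomes are equal, and GE(2) equals half the
squared coefficient of variation, which gives a quick plausibility check.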
12:20 Lunch
13:30 Linear mixed models in Stata
Roberto G. Gutierrez, StataCorp
(rgutierrez@stata.com)
Included with Stata version 9 is the new command xtmixed, for fitting
linear mixed models. Mixed models contain both fixed and random
effects. The fixed effects are analogous to standard regression
coefficients and are estimated directly. The random effects are not
directly estimated but are summarized according to the unique elements
of their respective variance–covariance matrices, known as variance
components. xtmixed syntax is summarized and demonstrated using several
examples. In addition, xtmixed and its postestimation routines may be
used to perform nonparametric smoothing via penalized splines.
User Written Programs
---------------------
14:20 Implementing Restricted Least Squares in Linear Models
J. Haisken-DeNew, RWI Essen
(jhaiskendenew@rwi-essen.de)
The presentation illustrates the user written program -hds97-,
which implements the restricted least squares procedure as
described by Haisken-DeNew and Schmidt (1997). Log wages are
regressed on a group of k-1 industry/region/job/etc. dummies.
The k-th dummy is the omitted reference dummy. Using RLS, all
k dummy coefficients and standard errors are reported. The
coefficients are interpreted as percentage-point deviations from the
industry weighted average. An overall measure of dispersion is
also reported.
This ado corrects problems with the Krueger and Summers (1988)
Econometrica methodology of overstated differential standard
errors, and understated overall dispersion.
General comments: The coefficients of continuous variables are
not affected by -hds97-. Also, all results calculated in -hds97-
are independent of the choice of the reference category. Incidentally,
for all dummy variable sets having only two outcomes, e.g. male/female,
the t-values of the hds97-adjusted coefficients are always equal
in magnitude but opposite in sign.
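
The re-expression of the dummy coefficients can be illustrated with a small
sketch. It is not the -hds97- code: it assumes that the adjustment subtracts
the share-weighted average of all k coefficients (with the reference
category's coefficient set to zero), and it leaves out the corresponding
adjustment of the standard errors, which needs the full covariance matrix.
The function name and the example numbers are hypothetical.

```python
def deviations_from_weighted_average(coefs, shares):
    """Re-express k dummy coefficients as deviations from their weighted mean.

    coefs  : k coefficients, with 0.0 for the omitted reference category
    shares : k group shares (e.g. employment shares); normalised internally
    """
    if len(coefs) != len(shares):
        raise ValueError("coefs and shares must have the same length")
    total = sum(shares)
    weighted_avg = sum(s * b for s, b in zip(shares, coefs)) / total
    return [b - weighted_avg for b in coefs]


# Three industries; the first is the reference category (coefficient 0).
shares = [0.5, 0.3, 0.2]
adjusted = deviations_from_weighted_average([0.0, 0.10, -0.05], shares)

# Choosing the second industry as reference shifts every coefficient by
# -0.10, yet the adjusted values are the same.
adjusted_other_ref = deviations_from_weighted_average([-0.10, 0.0, -0.15], shares)
```

Under this assumption the adjusted coefficients have a share-weighted mean of
zero, which is why they do not depend on the reference category.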
14:50 Sequence analysis using Stata
Christian Brzinsky-Fay, WZB; Ulrich Kohler, WZB
(brzinsky-fay@wz-berlin.de; kohler@wz-berlin.de)
Sequences are ordered lists of elements. A typical example of a
sequence is the sequence of bases in the DNA of organisms. Other
examples are sequences of employment stages over the life course, or
individual party preferences over time. Sequence analysis includes
techniques to handle, describe, and, most importantly, compare
sequences with each other.
Sequences are most commonly used by scholars of genomes, but far
less by social scientists. This is surprising insofar as sequence
data is readily available in many datasets for the social sciences.
In fact, all data from panel studies can be regarded as sequence data.
Despite that, social scientists relatively seldom use panel data for
sequence analysis. The first aim of the presentation therefore is to
illustrate typical research topics that can be addressed with sequence
analysis. The second part will then describe a bundle of user written
Stata programs for sequence analysis, including a Mata algorithm for
performing optimal matching with the so-called "Needleman-Wunsch"
algorithm.
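
The idea behind optimal matching can be sketched in a few lines. This is not
the Mata implementation from the talk: it is a minimal dynamic-programming
sketch in the spirit of the Needleman-Wunsch recursion, written as a
cost-minimising distance. The function name and the constant insertion/
deletion and substitution costs are assumptions chosen for illustration.

```python
def om_distance(a, b, indel=1.0, sub=2.0):
    """Minimum total cost of turning sequence a into sequence b.

    indel : cost of inserting or deleting one element
    sub   : cost of substituting one element for a different one
    """
    n, m = len(a), len(b)
    # d[i][j] = cheapest way to align the first i elements of a
    # with the first j elements of b.
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match_cost = 0.0 if a[i - 1] == b[j - 1] else sub
            d[i][j] = min(
                d[i - 1][j - 1] + match_cost,  # match or substitute
                d[i - 1][j] + indel,           # delete from a
                d[i][j - 1] + indel,           # insert into a
            )
    return d[n][m]
```

For example, with E for employed and U for unemployed, om_distance("EEU",
"EEUU") is one insertion. Any sequence type that supports indexing and
element comparison (strings, lists, tuples) can be passed in.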
15:30 Coffee
15:40 New Tools for Evaluating the Results of Cluster Analyses
Hildegard Schaeper, HIS
(schaeper@his.de)
Clustering methods are designed for finding groups in data, for
grouping similar objects (variables or observations) into the same
cluster and dissimilar objects into separate clusters. Whereas this
main idea is rather simple, carrying out a cluster analysis remains a
challenging task: The number of different clustering methods is huge
and clustering includes many choices, such as the decision between basic
approaches (e. g. hierarchical and partitioning methods), the choice of
a dissimilarity or similarity measure, the selection of a particular
linkage method when performing a hierarchical agglomerative cluster
analysis, the choice of an initial partition when carrying out a
partitioning cluster analysis, and the determination of the
appropriate number of clusters. Each of these decisions and choices
can affect the classification results.
Apart from two commands for determining the number of clusters
(cluster stop, cluster dendrogram), Stata has no built-in utilities
for examining clustering results. We therefore developed
some simple tools which provide additional evaluation criteria:
– programs assisting in determining the number of clusters
(Mojena’s stopping rules for hierarchical clustering techniques,
PRE coefficient, F-Max statistic and Beale’s F values for
a partitioning cluster analysis),
– a program for testing the stability of classifications produced
by different cluster analyses (Rand index), and
– a program that computes ETA2 in order to assess how well the
clustering variables separate the clusters.
In the presentation these programs will be presented, and their
usefulness will be discussed in comparison with other tools for
the evaluation of clustering results (agglomeration schedule,
scree diagram).
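
As an illustration of the stability criterion mentioned above, the following
is a minimal sketch of the unadjusted Rand index between two classifications
of the same observations. It is not the program presented in the talk, and
the function name is hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Share of observation pairs on which two classifications agree.

    A pair counts as an agreement if both classifications put the two
    observations in the same cluster, or both put them in different ones.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both classifications must cover the same observations")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two observations are required")
    agreements = 0
    pairs = 0
    for i, j in combinations(range(n), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agreements += 1
        pairs += 1
    return agreements / pairs
```

The index is 1 when the two classifications are identical up to a relabelling
of the clusters, so cluster numbers need not match between the two solutions.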
Towards an Open Wish List to StataCorp
--------------------------------------
16:10 Stata goes BUGS (via R)
Susumu Shikano, University of Mannheim
(shikanos@rumms.uni-mannheim.de)
Recently, Bayesian methods such as Markov chain Monte Carlo (MCMC)
techniques have found increasing use in the social sciences, with
(Win)BUGS being one of the most widely applied software packages for
this kind of analysis. Unfortunately, because Stata lacks MCMC
techniques and any interface to WinBUGS or BUGS, Stata
users who apply MCMC techniques have to perform such painful tasks
as reformatting data by themselves. As a preliminary solution to
this problem, one can call R, another statistical software package,
from inside Stata and use it as an interface to (Win)BUGS. This
presentation outlines this solution with an example analysis.
16:40 Optimal Large Package Administration for Stata
Markus Hahn, RWI Essen
The Stata package tool is quite simple to use for smaller ADO packages
stored on user webpages. However, when the number of files in a package
becomes large and the files need to be updated on a regular basis, this
becomes cumbersome. Package updates could take many minutes to complete.
Here a method of storing packages as compressed archives on the host
server is outlined, whereby the user sends a query to the update server
to check for a new version. If a new version is available, the package
archive is downloaded in its entirety, and then extracted and installed
locally. This is far more efficient with respect to installation times
(typically only 1/10 of the time needed) than downloading many text
files individually. For large packages, the bottleneck is most often the
download time. Currently this automated updating can be achieved with a
Stata ado-file and the aid of additional binaries (such as tar, gzip, zip).
The usability of this technique would be enhanced dramatically if the
functionality of an archiving format (such as tar, gzip, zip) were
directly integrated into the Stata binary. Even encrypted files could be
distributed in this manner. Ado files inside the package archive
can be configured to make an automatic call to the host server to check
for available updates.
17:10 Coffee
17:20 Report to the users
Alan Riley, StataCorp
17:50 Wishes and Grumbles
18:30 End of the Meeting
--
kohler@wz-berlin.de
+49 (030) 25491-361