The German Stata Users Group Meeting takes place on 22 June 2018 at the Universität Konstanz. There will also be optional workshops on 21 June.
The meeting will provide Stata users from across Germany and the world the opportunity to exchange ideas, experiences, and information on new applications of the software. Everybody who is interested in using Stata is welcome.
The conference language will be English because of the international nature of the meeting and the participation of non-German guest speakers.
Program: Friday, 22 June
9:00–9:15  Registration 
9:15–9:30  Welcome 
9:30–10:30 
Abstract:
As a Stata user since Stata 3, I have witnessed a number of developments
over the years. Some of them, such as Stage or the gph commands, turned out
to be dead ends, while others, such as syntax, have remained hidden from
many users but shaped Stata strongly. Some dead ends ("for") are still
in use. Some developments created buzz in public but never gained much
traction in my own practice. Others were introduced in passing but immediately
became a workhorse in my daily work (web awareness). I give a subjective
review of Stata's development by listing the dead ends and the milestones.
I speculate about why dead ends became dead ends and
why milestones became milestones. My intention is to start
a discussion about what German users like and dislike about Stata.
Ulrich Kohler
Universität Potsdam

10:30–11:00 
Abstract:
The overall look of Stata's graphs is determined by so-called scheme files.
Scheme files are system components, that is, part of the local
Stata installation. In this presentation, I will argue that style settings deviating
from default schemes should be part of the script producing the graphs rather
than being kept in separate scheme files, and I will present software that
supports such a practice. In particular, I will present a command, grstyle,
that allows users to quickly change the overall look of graphs without having
to fiddle around with external scheme files. I will also present a command,
colorpalette, that provides a wide variety of color schemes for use in
Stata graphics.
Ben Jann
University of Bern

11:00–11:30 
Abstract:
Structural equation modeling is well established in the statistician's
standard toolkit. To establish how well latent constructs are measured
by their respective observed indicators, many applications entail confirmatory
factor analysis (CFA). The appropriateness of a particular CFA model in
turn is assessed by various statistics such as chi-squared or so-called fit
indices. What these indices have in common is their reliance on a comparison
of the estimated model with a baseline or null model that imposes various
restrictions. While the default baseline model (for example, the "independence model")
is appropriate for common single-group and single-time-point situations,
several authors argue that researchers should specify alternative baseline
models in multiple-group or longitudinal applications (for example, Little 2013;
Widaman and Thompson 2003). Focusing on longitudinal data, this presentation
accordingly illustrates how to specify appropriate baseline models and compute
corresponding goodness-of-fit statistics in Stata.
References: Little, T. D. 2013. Longitudinal structural equation modeling. New York, NY: Guilford Press. Widaman, K. F., and J. S. Thompson. 2003. On specifying the null model for incremental fit indices in structural equation modeling. Psychological Methods 8: 16–37.
Sven O. Spieß
Dittrich & Partner Consulting
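Why the choice of baseline matters can be seen from the textbook definitions of two common incremental fit indices. The minimal Python sketch below (not Stata, and not the presenter's code) implements the standard CFI and TLI formulas; the baseline enters only through its chi-square value and degrees of freedom, so substituting a different baseline, such as a longitudinal null model, changes the indices even though the target model is unchanged. The function names and all chi-square values in the example are hypothetical.

```python
def cfi(chi2_m: float, df_m: float, chi2_b: float, df_b: float) -> float:
    """Comparative fit index of a model (m) relative to a baseline (b)."""
    # Noncentrality estimates, truncated at zero.
    nc_model = max(chi2_m - df_m, 0.0)
    nc_base = max(chi2_b - df_b, nc_model)
    return 1.0 if nc_base == 0 else 1.0 - nc_model / nc_base


def tli(chi2_m: float, df_m: float, chi2_b: float, df_b: float) -> float:
    """Tucker-Lewis index (nonnormed fit index); not truncated to [0, 1]."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


if __name__ == "__main__":
    # Hypothetical numbers: the same target model judged against two baselines.
    model = (120.0, 80)
    for name, baseline in [("baseline A", (2000.0, 105)), ("baseline B", (400.0, 100))]:
        print(name, round(cfi(*model, *baseline), 3), round(tli(*model, *baseline), 3))
```

Which baseline is appropriate for longitudinal data is the subject of the talk; the sketch only shows that the baseline is a genuine input to the indices.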

11:30–11:45  Coffee 
11:45–12:15 
Abstract:
swapgpsxy swaps interchanged GPS coordinates, provided that the xvar and yvar
variables, which hold the longitude and latitude, respectively, are
numeric. swapgpsxy is useful whenever summary statistics of
the GPS coordinates suggest that coordinates have been interchanged. swapgpsxy can
be applied unconditionally when the geographical area is relatively
uniform and small, for example, the State of Qatar. Alternatively,
swapgpsxy can be applied conditionally using either if or in, but not both
in the same command. This is useful when the geographical
area is large and the terrain differs by province or zone, for example,
the Republic of South Africa. Because our data contain interchanged GPS
coordinates, we apply swapgpsxy to correct the error. Using
the median absolute deviation (MAD) method, we find that outlying GPS
coordinates are correctly detected and swapped back. Based on the results,
we suggest swapgpsxy as a useful tool for improving data quality, particularly
when data management is prone to human error.
Brian W. Mandikiana
Qatar University
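The abstract does not spell out how swapgpsxy combines the MAD with the swap, so the Python sketch below is only one plausible reading, for the unconditional case of a small, uniform area: a row is treated as interchanged when both its longitude and its latitude are MAD outliers in their own columns but would not be outliers after swapping. The cutoff of 3.5 robust z-units, the 1.4826 consistency constant, and all function names are assumptions, not documented swapgpsxy behavior.

```python
from statistics import median


def mad(values):
    """Unscaled median absolute deviation."""
    m = median(values)
    return median(abs(v - m) for v in values)


def is_outlier(x, med, mad_value, cutoff):
    # Robust z-score; 1.4826 makes the MAD consistent with the SD under normality.
    if mad_value == 0:
        return x != med
    return abs(x - med) / (1.4826 * mad_value) > cutoff


def swap_interchanged(lon, lat, cutoff=3.5):
    """Return corrected copies of lon/lat and the indices of swapped rows (hypothetical rule)."""
    med_lon, med_lat = median(lon), median(lat)
    mad_lon, mad_lat = mad(lon), mad(lat)
    new_lon, new_lat, swapped = list(lon), list(lat), []
    for i, (x, y) in enumerate(zip(lon, lat)):
        outlying_now = (is_outlier(x, med_lon, mad_lon, cutoff)
                        and is_outlier(y, med_lat, mad_lat, cutoff))
        fine_if_swapped = (not is_outlier(y, med_lon, mad_lon, cutoff)
                           and not is_outlier(x, med_lat, mad_lat, cutoff))
        if outlying_now and fine_if_swapped:
            new_lon[i], new_lat[i] = y, x
            swapped.append(i)
    return new_lon, new_lat, swapped


if __name__ == "__main__":
    # Hypothetical points in Qatar; row 3 has latitude and longitude interchanged.
    lon = [51.53, 51.44, 51.10, 25.29, 51.50, 51.20]
    lat = [25.28, 25.30, 25.90, 51.52, 25.10, 24.95]
    print(swap_interchanged(lon, lat))
```

For a large, heterogeneous area such as South Africa, the same rule would be applied separately within each province or zone, which is the role the abstract gives to if or in.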

12:15–12:45 
Abstract:
Text data, such as answers to open-ended questions, are sometimes ignored
because they are hard to analyze. Our community-contributed Stata command,
ngram, turns text into hundreds of variables using the "bag of words"
approach. Broadly speaking, each variable records how often the
corresponding word or word sequence occurs in a given text. This is more
useful than it sounds. The program supports text in 12 European languages.
Matthias Schonlau
University of Waterloo

12:45–1:45  Lunch 
1:45–2:15 
Abstract:
At the 2017 meeting, I talked about efficient programming with regard to
optimal lag selection for autoregressive distributed lag (ARDL) models as
implemented in the community-contributed Stata command ardl (Kripfganz and Schneider
2016). I will expand on last year's presentation (Kripfganz and Schneider 2017) by focusing on a
second nontrivial computational aspect of ardl: the simulation of critical
values for the Pesaran, Shin, and Smith (2001)
bounds-testing procedure for a long-run relationship. Until recently, only
a limited set of critical values was available. I will illustrate the
programming behind Kripfganz and Schneider's (2018) comprehensive and more
precise set of critical values and approximate p-values, which have been made
available in Stata as a postestimation feature of ardl. I will explain the
calculation, storage, and processing of 160 billion simulated F or t statistics.
Topics covered will include pointer variables, LAPACK functions in Mata, the use of
variable transformations in conjunction with Stata's various numeric data types
for efficient storage, random-number streams, and strategies for running several
instances of Stata simultaneously.
References: Kripfganz, S., and D. C. Schneider. 2016. ardl: Stata module to estimate autoregressive distributed lag models. Paper presented at the Stata Conference, Chicago, IL, July 2016. Kripfganz, S., and D. C. Schneider. 2017. A case study in efficient programming in Stata and Mata: Speeding up the ardl estimation command. Paper presented at the German Stata Users Group Meeting, Berlin, June 2017. Kripfganz, S., and D. C. Schneider. 2018. Response surface regressions for critical value bounds and approximate p-values in equilibrium correction models. Manuscript, University of Exeter and Max Planck Institute for Demographic Research. Available at http://www.kripfganz.de/research/Kripfganz_Schneider_ec.html. Pesaran, M. H., Y. Shin, and R. J. Smith. 2001. Bounds testing approaches to the analysis of level relationships. Journal of Applied Econometrics 16: 289–326.
Daniel C. Schneider
Max Planck Institute for Demographic Research

2:15–2:45 
Abstract:
Traditional fit measures based on the noncentral chi-square distribution (RMSEA,
TLI, or CFI) tend to overreject acceptable models when the sample size is small
(n < 100). My ado-file, swain_gof.ado, corrects the likelihood-ratio chi-square
goodness-of-fit test statistic for structural equation models. This chi-square statistic
is asymptotically correct, but it does not behave as expected in small samples
or when the model is complex (Herzog, Boomsma, and Reinecke 2007). Particularly
in situations where the ratio of sample size to the number of estimated parameters
is relatively small, such as 5:1 (Bentler and Chou 1987), the chi-square test
tends to overreject correctly specified models. To obtain a closer approximation
to the distribution of the chi-square statistic, Swain (1975) developed a correction.
His scaling factor, which converges to 1 as the sample size increases,
is multiplied by the chi-square statistic. This correction better approximates the
noncentral chi-square distribution, resulting in more appropriate Type I error
rates (see Herzog and Boomsma 2009; Herzog et al. 2007). It works reliably down
to a sample-size-to-parameter ratio of 2:1.
My swain_gof.ado calculates the root mean squared error of approximation (RMSEA), the Tucker-Lewis index (TLI), and the comparative fit index (CFI) using the Swain-corrected chi-square values, assuming a multivariate normal distribution of the observed indicators. To account for violations of this assumption, it additionally calculates the fit indices using the Satorra-Bentler correction, which requires fitting the model with the vce(sbentler) option of the sem command. The command is run as a postestimation command after sem and estat gof, stats(all) by simply typing swain_gof. It returns the estimated fit indices as scalars in r().
A survey example of Islamophobia will be presented to demonstrate the usefulness
of my swain_gof.ado.
References: Bentler, P. M., and C. P. Chou. 1987. Practical issues in structural equation modeling. Sociological Methods & Research 16: 78–117. Bentler, P. M., and K. H. Yuan. 1999. Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research 34: 181–197. Curran, P. J., K. A. Bollen, P. Paxton, J. Kirby, and F. N. Chen. 2002. The noncentral chi-square distribution in misspecified structural equation models: Finite sample results from a Monte Carlo simulation. Multivariate Behavioral Research 37: 1–36. Herzog, W., and A. Boomsma. 2009. Small-sample robust estimators of noncentrality-based and incremental model fit. Structural Equation Modeling 16: 1–27. Herzog, W., A. Boomsma, and S. Reinecke. 2007. The model-size effect on traditional and modified tests of covariance structures. Structural Equation Modeling 14: 361–390. Satorra, A., and P. M. Bentler. 1994. Corrections to test statistics and standard errors in covariance structure analysis. In Latent variables analysis: Applications for developmental research, ed. A. von Eye and C. C. Clogg, 399–419. Newbury Park, CA: Sage. Swain, A. J. 1975. Analysis of parametric structures for variance matrices. Doctoral thesis, University of Adelaide.
Wolfgang Langer
Martin-Luther-University Halle-Wittenberg
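The Swain correction itself is a closed-form scaling factor. The Python sketch below uses the form of the factor as we understand it from Herzog and Boomsma (2009), with N the sample size, p the number of observed indicators, and d the model degrees of freedom; it is not taken from swain_gof.ado, the function names are hypothetical, and the formula should be checked against the cited sources before use. It also recomputes the RMSEA from the corrected statistic.

```python
import math


def swain_factor(n_obs: int, p: int, df: int) -> float:
    """Swain (1975) scaling factor s, so that T_swain = s * T_ML.

    n_obs: sample size N; p: number of observed indicators; df: model degrees of freedom.
    Assumes a covariance structure without means, so df <= p * (p + 1) / 2.
    """
    n = n_obs - 1
    q = (math.sqrt(1 + 4 * p * (p + 1) - 8 * df) - 1) / 2
    numerator = p * (2 * p**2 + 3 * p - 1) - q * (2 * q**2 + 3 * q - 1)
    return 1 - numerator / (12 * df * n)


def rmsea(chi2: float, df: int, n_obs: int) -> float:
    """Root mean squared error of approximation from a chi-square statistic."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n_obs - 1)))


if __name__ == "__main__":
    # Hypothetical example: 10 indicators, 35 degrees of freedom, N = 100.
    s = swain_factor(100, 10, 35)
    t_ml = 50.0
    print(f"factor={s:.4f}  corrected chi2={s * t_ml:.2f}")
    print(f"RMSEA ML={rmsea(t_ml, 35, 100):.4f}  RMSEA Swain={rmsea(s * t_ml, 35, 100):.4f}")
```

Because N enters only through the denominator, the factor approaches 1 as the sample grows, which matches the behavior described in the abstract.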

2:45–3:15 
Abstract:
In this presentation, I will go through the workflow of creating an interactive
presentation in Stata (a .smcl presentation) with smclpres based on a
small example presentation.
Some talks are primarily about how to do things in Stata, like a lecture on graphs in Stata or a talk at a Stata Users Group meeting. In those cases, a .smcl presentation can be useful. A .smcl presentation is a series of linked .smcl files that open in the Viewer inside Stata (like help files). The strength of a .smcl presentation is that it can contain links that execute examples, open help files, open do-files, etc. A .smcl presentation is all about illustrating how to do something in Stata, so preparing for such a talk typically starts with preparing a set of examples in a do-file. By adding specific comments to that do-file, for example, to indicate where a slide starts and ends and what the title of the slide is, the smclpres command can turn that do-file into a .smcl presentation. Moreover, the pres2html command can turn that .smcl presentation into an HTML handout so that participants can easily access the content after the presentation.
Maarten Buis
University of Konstanz

3:15–3:30  Coffee 
3:30–4:00 
Abstract:
The autoexam ado package allows one to automatically generate multiple-choice tests from
a database of items. The tests are optimized with regard to the distribution
of difficulties and the representative coverage of course topics. The tests
can be written as LaTeX or HTML files. Accompanying ado-files help to analyze
items using IRT models and to manage or update the item database. The system
can also be used to generate mock exams to allow students to prepare for the
exam. When creating such mock exams, the user can choose what percentage, if
any, of the real test questions is allowed to occur in the mock exams.
Finally, autoexam allows one to include mathematical or statistical questions in
the item database that are randomly generated with respect to the specific
numbers in the questions. The autoexam ado package aims to help teachers
create and correct exams more efficiently and with better quality. It is
particularly helpful for large basic courses that are repeated at regular intervals.
Alexander Schmidt-Catran
Goethe University Frankfurt

4:00–5:00 
Abstract:
Stata 15 includes three new commands for producing dynamic documents:
dyndoc, putdocx, and putpdf. These commands have
generated much interest in the user community, which has led to a large
amount of community-contributed software. In this talk, I'll give some
tips about how to use the commands efficiently, both with official Stata
software and with some of these community-contributed tools.
Bill Rising
StataCorp

5:00–5:15  Coffee 
5:15–6:00 
StataCorp

Workshops: Thursday, 21 June
Graphics with Stata
Maarten Buis, Universität Konstanz, 9:00 a.m. to 1:00 p.m.
Description
This workshop is intended for participants who want to make the most out of graphs in Stata. Stata has a very powerful graphics language, but with power comes an elaborate syntax with a lot of options. This makes it easy to get lost and overlook useful possibilities. In this workshop, we will focus on building graphs step by step and on tips and tricks for creating a wide range of informative graphs.
Prerequisites
Basic knowledge of Stata.
Bayesian analysis using Stata
Yulia Marchenko, Executive Director of Statistics, StataCorp, 2:00 p.m. to 6:00 p.m.
Description
This workshop covers the use of Stata to perform Bayesian analysis. Bayesian analysis is a statistical paradigm that answers research questions about unknown parameters using probability statements. For example, what is the probability that a person accused of a crime is guilty? What is the probability that the odds ratio is between 0.3 and 0.5? Such probabilistic statements are natural to Bayesian analysis because of the underlying assumption that all parameters are random quantities. In Bayesian analysis, a parameter is summarized by an entire distribution of values instead of one fixed value as in classical frequentist analysis. Estimating this distribution, the posterior distribution of the parameter of interest, is at the heart of Bayesian analysis. This workshop will demonstrate the use of Bayesian analysis in various applications and will introduce Stata's suite of commands for conducting Bayesian analysis.
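Answering a question with a posterior probability can be sketched in a few lines. The Python example below is hypothetical and much simpler than anything the workshop covers: it assumes a binomial outcome with a Beta(1, 1) prior, for which the posterior is again a Beta distribution, and estimates the posterior probability that the success probability lies in an interval from Monte Carlo draws. It uses a proportion rather than an odds ratio only to keep the example self-contained, and it does not use Stata's Bayesian commands.

```python
import random


def posterior_prob_between(successes: int, trials: int, low: float, high: float,
                           a: float = 1.0, b: float = 1.0,
                           draws: int = 100_000, seed: int = 2018) -> float:
    """Monte Carlo estimate of P(low < theta < high | data) under a Beta(a, b) prior."""
    # Conjugacy: Beta(a, b) prior + binomial data -> Beta(a + successes, b + failures) posterior.
    alpha = a + successes
    beta = b + trials - successes
    rng = random.Random(seed)
    hits = sum(low < rng.betavariate(alpha, beta) < high for _ in range(draws))
    return hits / draws


if __name__ == "__main__":
    # Hypothetical data: 7 successes in 20 trials.
    print(posterior_prob_between(7, 20, 0.3, 0.5))
```

The result is a direct probability statement about the parameter, which is exactly the kind of answer the description above contrasts with a single frequentist point estimate.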
Prerequisites
Basic knowledge of Stata.