The German Stata Users Group Meeting was held on 22 June 2018 at Universität Konstanz, but you can view the program and presentation slides below.
From 3 to 15: Milestones, dead ends, prospects. A subjective review of Stata's history
Abstract: Having been a Stata user since Stata 3, I have witnessed a number of developments over the years. Some of them, such as Stage or the gph commands, turned out to be dead ends, while others, such as syntax, have remained hidden from many users but have shaped Stata strongly. Users still use some dead ends ("for"). Some developments caused a public buzz but never gained much attention in (my own) practice. Some developments were introduced in passing but took off immediately as a workhorse in my daily work (web awareness). I give a subjective review of Stata's development by listing the dead ends and the milestones. I speculate about reasons why dead ends became dead ends and why milestones became milestones. My intention is to start a discussion about what German users like and dislike about Stata.
Customizing Stata graphs made easy
Abstract: The overall look of Stata's graphs is determined by so-called scheme files. Scheme files are system components, that is, part of the local Stata installation. In this presentation, I will argue that style settings deviating from default schemes should be part of the script producing the graphs rather than being kept in separate scheme files, and I will present software that supports such a practice. In particular, I will present a command, grstyle, that allows users to quickly change the overall look of graphs without having to fiddle around with external scheme files. I will also present a command, colorpalette, that provides a wide variety of color schemes for use in Stata graphics.
University of Bern
Specifying appropriate null models with longitudinal CFAs
Abstract: Structural equation modeling is well established in the statistician's standard toolkit. To establish how well latent constructs are measured by their respective observed indicators, many applications entail confirmatory factor analysis (CFA). The appropriateness of a particular CFA model in turn is assessed by various statistics such as chi-squared or so-called fit indices. What these indices have in common is their reliance on a comparison of the estimated model with a baseline or null model that imposes various restrictions. While the default baseline model (for example, the "independence model") is appropriate for common single-group and single-time-point situations, several authors argue that researchers should specify alternative baseline models in multiple-group or longitudinal applications (for example, Little 2013; Widaman and Thompson 2003). Focusing on longitudinal data, this presentation accordingly illustrates how to specify appropriate baseline models and compute corresponding goodness-of-fit statistics in Stata.
Little, T. D. 2013. Longitudinal structural equation modeling. New York, NY: Guilford Press.
Widaman, K. F., and J. S. Thompson. 2003. On specifying the null model for incremental fit indices in structural equation modeling. Psychological Methods 8(1): 16–37.
Sven O. Spieß
Dittrich & Partner Consulting
swapgpsxy: A tool for interchanging GPS coordinates
Abstract: swapgpsxy interchanges GPS coordinates given that both the xvar and yvar variables representing the longitude and latitude respectively are of numeric data types. swapgpsxy is useful whenever summary statistics of the GPS coordinates suggest coordinates are interchanged. swapgpsxy can be applied unconditionally, when the geographical area is relatively uniform and small, for example, the State of Qatar. On the other hand, swapgpsxy can be applied conditionally using either if or in, but both cannot be included in a single expression. This is useful when the geographical area is large and the terrain differs per province or zone, for example, the Republic of South Africa. Given the presence of interchanged GPS coordinates in our data, we apply swapgpsxy to correct the error. Using the median absolute deviation (MAD) method, we find that outliers in GPS coordinates are detected and interchanged correctly. Based on the results, we suggest swapgpsxy as a useful tool for improving data quality, particularly when data management is prone to human error.
Brian W. Mandikiana
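The logic described in the swapgpsxy abstract can be sketched outside Stata. The Python snippet below is only an illustration under stated assumptions, not the swapgpsxy implementation: it treats a point as interchanged when it is a MAD-based outlier as recorded but falls within range once longitude and latitude are swapped. The function names, the cutoff of 3.5, and the 1.4826 scaling constant are assumptions chosen for the example.

```python
from statistics import median

MAD_TO_SD = 1.4826  # scales the MAD to match a normal standard deviation


def center_and_scale(values):
    """Return the median and the scaled median absolute deviation."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med, MAD_TO_SD * mad


def robust_score(value, med, scale):
    """Distance from the median in MAD units (infinite if the MAD is zero)."""
    if value == med:
        return 0.0
    return abs(value - med) / scale if scale > 0 else float("inf")


def swap_outliers(points, cutoff=3.5):
    """Swap (lon, lat) pairs that are outliers as given but not when swapped."""
    lon_med, lon_scale = center_and_scale([p[0] for p in points])
    lat_med, lat_scale = center_and_scale([p[1] for p in points])

    def worst(lon, lat):
        # Larger of the two robust scores for a candidate (lon, lat) pair.
        return max(robust_score(lon, lon_med, lon_scale),
                   robust_score(lat, lat_med, lat_scale))

    fixed = []
    for lon, lat in points:
        if worst(lon, lat) > cutoff and worst(lat, lon) <= cutoff:
            fixed.append((lat, lon))  # coordinates were interchanged
        else:
            fixed.append((lon, lat))
    return fixed


# Hypothetical points in a small, uniform area; the fourth pair is interchanged.
points = [(51.53, 25.29), (51.44, 25.32), (51.50, 25.26),
          (25.30, 51.49), (51.47, 25.35), (51.55, 25.28)]
fixed = swap_outliers(points)
```

Because the medians and MADs are computed over all points, this sketch only makes sense when most records are correct and the area is small, which mirrors the unconditional use case in the abstract.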
Text mining with ngram variables
Abstract: Text data, such as answers to open-ended questions, are sometimes ignored because they are hard to analyze. Our community-contributed Stata command, ngram, turns text into hundreds of variables using the "bag of words" approach. Broadly speaking, each variable records how often the corresponding word or word sequence occurs in a given text. This is more useful than it sounds. The program supports text in 12 European languages.
University of Waterloo
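The "bag of words" idea from the ngram abstract can be illustrated with a few lines of Python. This is a minimal sketch, not the ngram command: it lowercases the text, splits it into words with a simple regular expression, and counts every word sequence of length 1 up to max_n. The function names and the sample answers are hypothetical, and language-specific processing is not reproduced here.

```python
import re
from collections import Counter


def ngram_counts(text, max_n=2):
    """Count every word sequence of length 1..max_n in one text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts


def document_term_matrix(texts, max_n=2):
    """Return the sorted vocabulary and one row of counts per text."""
    per_text = [ngram_counts(t, max_n) for t in texts]
    vocab = sorted(set().union(*per_text))
    rows = [[c[term] for term in vocab] for c in per_text]
    return vocab, rows


# Hypothetical answers to an open-ended question.
answers = ["The course was good", "Good course, good teacher"]
vocab, rows = document_term_matrix(answers)
```

Each entry of vocab corresponds to one variable, and each row holds the counts for one text, so two short answers already produce eleven variables.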
Efficient programming in Stata and Mata II: Obtaining non-standard distributions for a cointegration test via simulation
Abstract: At the 2017 meeting, I talked about efficient programming with regard to optimal lag selection for autoregressive distributed lag (ARDL) models as implemented in the community-contributed Stata command ardl (Kripfganz and Schneider 2016). I will expand on last year's presentation by focusing on a second nontrivial computational aspect of ardl: the simulation of critical values for the Pesaran, Shin, and Smith (2001) bounds-testing procedure for a long-run relationship. Up until recently, only a limited set of critical values was available. I will illustrate the programming behind Kripfganz and Schneider's (2018) comprehensive and more precise set of critical values and approximate p-values, which have been made available in Stata as a postestimation feature of ardl. I explain the calculation, storage, and processing of 160 billion simulated F- or t-statistics. Topics covered will include pointer variables, LAPACK functions in Mata, using variable transformations in conjunction with Stata's various numeric data types for efficient storage, random number streams, and strategies for using several instances of Stata simultaneously.
Kripfganz, S., and D. C. Schneider. 2016. ardl: Stata module to estimate autoregressive distributed lag models. Paper presented at the Stata Conference, Chicago, IL, July 2016.
Kripfganz, S., and D. C. Schneider. 2017. A case study in efficient programming in Stata and Mata: Speeding up the ardl estimation command. Paper presented at the German Stata Users Group Meeting, Berlin, June 2017.
Kripfganz, S., and D. C. Schneider. 2018. Response surface regressions for critical value bounds and approximate p-values in equilibrium correction models. Manuscript, University of Exeter and Max Planck Institute for Demographic Research. Available at http://www.kripfganz.de/research/Kripfganz_Schneider_ec.html.
Pesaran, M. H., Y. Shin, and R. J. Smith. 2001. Bounds testing approaches to the analysis of level relationships. Journal of Applied Econometrics 16: 289–326.
Daniel C. Schneider
Max Planck Institute for Demographic Research
How to use Stata's sem command with small samples? New corrections for the likelihood ratio chi-square statistic and fit indices based on it
Abstract: Traditional fit measures based on the noncentral chi-square distribution (RMSEA, TLI, or CFI) tend to overreject acceptable models when the sample size is small (n < 100). My ado-file, swain_gof.ado, corrects the likelihood ratio chi-square goodness-of-fit test statistic for structural equation models. This chi-square statistic is asymptotically correct, but it does not behave as expected in small samples or when the model is complex (Herzog, Boomsma, and Reinecke 2007). Particularly in situations where the ratio of sample size to the number of parameters estimated is relatively small, such as 5:1 (Bentler and Chou 1987), the chi-square test will tend to overreject correctly specified models. To obtain a closer approximation to the distribution of the chi-square statistic, Swain (1975) developed a correction. His scaling factor, which converges asymptotically to 1 as the sample size increases, is multiplied by the chi-square statistic. This correction better approximates the noncentral chi-square distribution, resulting in more appropriate type I error rates (see Herzog and Boomsma 2009; Herzog, Boomsma, and Reinecke 2007). The correction works reliably only down to a sample size-to-parameter ratio of 2:1.
My swain_gof.ado calculates the root mean squared error of approximation (RMSEA), the Tucker-Lewis index (TLI), and the comparative fit index (CFI) using the Swain-corrected chi-square values, assuming a multivariate normal distribution of the observed indicators. If this assumption is violated, it additionally calculates the fit indices using the Satorra-Bentler correction. For this, you have to use the vce(sbentler) option of the sem command. My swain_gof.ado can be executed after sem and estat gof, stats(all) as a postestimation command by simply typing swain_gof. It returns the estimated fit indices and scalars as r() results.
A survey example of Islamophobia will be presented to demonstrate the usefulness of my swain_gof.ado.
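To make the role of a chi-square scaling factor concrete, the following Python sketch computes RMSEA, CFI, and TLI from model and baseline chi-square values using the common textbook formulas. It is not the code of swain_gof.ado, and it does not reproduce Swain's formula: the scaling factors are passed in by the caller and default to 1 (no correction). The function name, the argument names, and the numbers are hypothetical, and the RMSEA here uses N - 1 in the denominator, whereas some software uses N.

```python
from math import sqrt


def fit_indices(chi2_m, df_m, chi2_b, df_b, n, scale_m=1.0, scale_b=1.0):
    """RMSEA, CFI and TLI from (optionally rescaled) chi-square statistics.

    chi2_m, df_m: model chi-square and degrees of freedom
    chi2_b, df_b: baseline (null) model chi-square and degrees of freedom
    scale_m, scale_b: multiplicative small-sample factors (assumed given)
    """
    chi2_m = scale_m * chi2_m
    chi2_b = scale_b * chi2_b

    # Noncentrality estimates, truncated at zero.
    ncp_m = max(chi2_m - df_m, 0.0)
    ncp_b = max(chi2_b - df_b, 0.0)

    rmsea = sqrt(ncp_m / (df_m * (n - 1)))

    denom = max(ncp_m, ncp_b)
    cfi = 1.0 if denom == 0 else 1.0 - ncp_m / denom

    ratio_b = chi2_b / df_b
    ratio_m = chi2_m / df_m
    tli = (ratio_b - ratio_m) / (ratio_b - 1.0)

    return {"rmsea": rmsea, "cfi": cfi, "tli": tli}


# Hypothetical values: uncorrected, then with a model scaling factor of 0.9.
plain = fit_indices(30, 20, 228, 28, n=101)
scaled = fit_indices(30, 20, 228, 28, n=101, scale_m=0.9)
```

A scaling factor below 1 shrinks the model chi-square, which lowers the RMSEA and raises the CFI relative to the uncorrected values.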
Bentler, P.M., and C.P. Chou. 1987. Practical issues in structural equation modeling. Sociological Methods & Research 16: 78–117.
Bentler, P.M., and K.H. Yuan. 1999. Structural equation modeling with small samples: Test statistics. Multivariate Behavioral Research 34: 181–197.
Curran, P.J., K.A. Bollen, P. Paxton, J. Kirby, and F.N. Chen. 2002. The noncentral chi-square distribution in misspecified structural equation models: Finite sample results from a Monte Carlo simulation. Multivariate Behavioral Research 37: 1–36.
Herzog, W., and W. Boomsma. 2009. Small-sample robust estimators of noncentrality-based and incremental model fit. Structural Equation Modeling 16: 1–27.
Herzog, W., W. Boomsma, and S. Reinecke. 2007. The model-size effect on traditional and modified tests of covariance structures. Structural Equation Modeling 14: 361–90.
Satorra, A., and P.M. Bentler. 1994. Corrections to test statistics and standard errors in covariance structure analysis. In Latent variables analysis: Applications for developmental research, edited by Alexander Von Eye and Clifford Clogg, 399–419. Newbury Park, CA: Sage.
Swain, A.J. 1975. Analysis of parametric structures for variance matrices (Doctoral thesis). University of Adelaide, Adelaide.
Making interactive presentations in Stata
Abstract: In this presentation, I will go through the workflow of creating an interactive presentation in Stata (a .smcl presentation) with smclpres based on a small example presentation.
Some talks are primarily on how to do things in Stata, like a lecture on graphs in Stata or a talk at a Stata Users' Group meeting. In those cases, a .smcl presentation can be useful. A .smcl presentation is a series of linked .smcl files that open in the viewer inside Stata (like help files). The strength of a .smcl presentation is that it can contain links that execute examples, open help files, open do-files, etc.
A .smcl presentation is all about illustrating how to do something in Stata, so preparing for such a talk typically starts with preparing a set of examples in a do-file. By adding specific comments to that do-file, for example, to indicate when a slide starts and when it ends, what the title of the slide is, etc., the smclpres command can turn that do-file into a .smcl presentation. Moreover, the pres2html command can turn that .smcl presentation into an HTML handout so that participants can easily access the content after the presentation.
University of Konstanz
Efficient construction of good tests using Stata
Abstract: The autoexam ado package allows one to automatically generate multiple-choice tests from a database of items. The tests are optimized with regard to the distribution of difficulties and the representative coverage of course topics. The tests can be written as LaTeX or HTML files. Accompanying ado-files help to analyze items using IRT models and to manage or update the item database. The system can also be used to generate mock exams to allow students to prepare for the exam. When creating such mock exams, the user can choose what percentage, if any, of the real test questions is allowed to occur in the mock exams. Finally, autoexam allows one to include mathematical or statistical questions in the item database that are randomly generated with respect to the specific numbers in the questions. The autoexam ado-package aims to help teachers with creating and correcting exams more efficiently and with better quality. It is particularly helpful for large basic courses that are repeated in regular intervals.
Efficient dynamic documents using Stata
Abstract: Stata 15 includes three new commands for producing dynamic documents: dyndoc, putdocx, and putpdf. These commands have generated much interest in the user community; this has led to a large amount of community-contributed software. In this talk, I'll give some tips about how to use the commands efficiently both with official Stata software and with some of these community-contributed tools.
Wishes and grumbles
Workshops: Thursday, 21 June
Graphics with Stata
Maarten Buis, Universität Konstanz, 9:00 a.m. to 1:00 p.m.
This workshop is intended for participants who want to make the most out of graphs in Stata. Stata has a very powerful graphics language, but with power comes an elaborate syntax with a lot of options. This makes it easy to get lost and overlook useful possibilities. In this workshop, we will focus on building your graph step by step and on tips and tricks to create a wide range of informative graphs.
Basic knowledge of Stata.
Bayesian analysis using Stata
Yulia Marchenko, Executive Director of Statistics, StataCorp, 2:00 p.m. to 6:00 p.m.
This workshop covers the use of Stata to perform Bayesian analysis. Bayesian analysis is a statistical paradigm that answers research questions about unknown parameters using probability statements. For example, what is the probability that a person accused of a crime is guilty? What is the probability that the odds ratio is between 0.3 and 0.5? And many more. Such probabilistic statements are natural to Bayesian analysis because of the underlying assumption that all parameters are random quantities. In Bayesian analysis, a parameter is summarized by an entire distribution of values instead of one fixed value as in classical frequentist analysis. Estimating this distribution, a posterior distribution of a parameter of interest, is at the heart of Bayesian analysis. This workshop will demonstrate the use of Bayesian analysis in various applications and will introduce Stata's suite of commands for conducting Bayesian analysis.
Basic knowledge of Stata.
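The kind of probability statement described for the Bayesian workshop can be illustrated with a small example. The sketch below is Python, not Stata's Bayesian commands, and rests on assumptions made for the illustration: a binomial likelihood for a proportion theta with a uniform prior, so that the posterior density is proportional to theta^k (1 - theta)^(n - k). The probability that theta lies in an interval is then obtained by numerical integration with the midpoint rule. The function name and the data are hypothetical.

```python
def posterior_probability(successes, trials, lo, hi, steps=10_000):
    """P(lo < theta < hi | data) under a uniform prior and binomial data."""
    failures = trials - successes

    def kernel(theta):
        # Unnormalized posterior density of theta.
        return theta ** successes * (1.0 - theta) ** failures

    def integrate(a, b):
        # Midpoint rule on an even grid over [a, b].
        h = (b - a) / steps
        return h * sum(kernel(a + (i + 0.5) * h) for i in range(steps))

    # Normalize by the integral over the whole parameter space [0, 1].
    return integrate(lo, hi) / integrate(0.0, 1.0)


# Hypothetical data: 7 successes in 20 trials.
prob = posterior_probability(7, 20, 0.3, 0.5)
```

The result is a direct statement about the parameter given the data, which is the contrast with a frequentist point estimate that the workshop description draws.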