
Last updated: 8 September 2014

2014 Nordic and Baltic Stata Users Group meeting

5 September 2014

Castle Kalo

Department of Political Science, Aarhus University
Bartholins Allé 7
DK-8000 Aarhus C


Stata as a numerical tool for scientific thought experiments: A tutorial with worked examples

Henrik Støvring
Department of Public Health–Department of Biostatistics, Aarhus University
Thought experiments based on simulation can be used to explain to fellow researchers the impact of the chosen study design or statistical analysis strategy, or the sensitivity of results. In this talk, I will present two examples showing how quantitative thought experiments may be implemented in Stata. The first example uses a large-sample approach to study the impact on the estimated effect size of dichotomizing an exposure variable at different values. The second example uses simulations of realistic-size datasets to illustrate the necessity of using sampling fractions as inverse probability weights in the statistical analysis for protection against bias in a complex sampling design. I will also briefly outline the general steps needed for implementing quantitative thought experiments in Stata. The main purpose is to highlight that Stata provides programming facilities for conveniently implementing such thought experiments, and exploiting those may save researchers precious time, futile speculation, and disruptive debates and thus improve communication in interdisciplinary research groups.
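
The idea behind the second example can be imitated in a few lines. The following is a minimal sketch in Python rather than Stata, and it is not the speaker's code: the population, the two strata, their means, and the sampling fractions are all invented for illustration. It oversamples one stratum and then compares the unweighted sample mean with the mean weighted by the inverse of the sampling fractions.

```python
import random


def simulate(n_pop=20000, frac_high=0.5, frac_low=0.05, seed=2014):
    """Return (population mean, unweighted sample mean, weighted sample mean)."""
    rng = random.Random(seed)

    # Hypothetical population: every second unit belongs to a "high" stratum
    # with outcome mean 10; the rest belong to a "low" stratum with mean 0.
    population = []
    for i in range(n_pop):
        high = (i % 2 == 0)
        y = rng.gauss(10.0 if high else 0.0, 1.0)
        population.append((high, y))
    true_mean = sum(y for _, y in population) / n_pop

    # Complex design (assumed): the "high" stratum is heavily oversampled.
    sample = [
        (high, y)
        for high, y in population
        if rng.random() < (frac_high if high else frac_low)
    ]

    # Naive analysis: ignore the design.
    unweighted = sum(y for _, y in sample) / len(sample)

    # Design-based analysis: weight each unit by 1 / (its sampling fraction).
    weights = [1.0 / (frac_high if high else frac_low) for high, _ in sample]
    weighted = sum(w * y for w, (_, y) in zip(weights, sample)) / sum(weights)

    return true_mean, unweighted, weighted


true_mean, unweighted, weighted = simulate()
print(f"population {true_mean:.2f}  unweighted {unweighted:.2f}  weighted {weighted:.2f}")
```

Under these assumed settings, the unweighted mean is pulled strongly toward the oversampled stratum, whereas the weighted mean stays close to the population mean.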


Studying coincidences with network analysis and other statistical tools

Modesto Escobar
Department of Sociology and Communication, Universidad de Salamanca
The aim of this paper is to introduce coincidence analysis, a new framework for studying data structures that combines statistical and social network analysis. The purpose of this procedure is to ascertain the most frequent events in a given set of scenarios and to study the relationships between them. In accordance with this procedure, the concurrence of persons, objects, attributes, characteristics, or events within the same temporally or spatially delineated set can be classified in the following manner:

(a) as simple, if both occur at least once in the same set;
(b) as likely, if there is more than a single coincidence and if it is more probable than a concurrence produced by mere chance; and
(c) as statistically probable.

In cases where samples of events are the subject of analysis, a confidence interval should be established to determine the statistical meaning of the combination of events.
This mode of analysis can be applied to the exploratory analysis of questionnaires, the study of textual networks, the review of the content of databases, and the comparison of different statistical analyses of interdependence. The following techniques can be used for analyzing the same data: multidimensional scaling, principal component analysis, correspondence analysis, biplot representations, agglomeration techniques, and network analysis algorithms.
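
Categories (a) and (b) of the classification above can be sketched as follows. This is an illustrative Python sketch, not the Stata program described in the talk. It assumes that "mere chance" means independence of the two events, so the expected number of coincidences is the product of the two event frequencies divided by the number of scenarios. The event labels are hypothetical, and category (c) is not covered.

```python
from itertools import combinations


def classify_coincidences(scenarios):
    """Label each co-occurring pair of events as 'simple' or 'likely'."""
    n = len(scenarios)
    freq = {}   # number of scenarios in which each event occurs
    joint = {}  # number of scenarios in which each pair occurs together

    for events in scenarios:
        unique = sorted(set(events))
        for event in unique:
            freq[event] = freq.get(event, 0) + 1
        for pair in combinations(unique, 2):
            joint[pair] = joint.get(pair, 0) + 1

    labels = {}
    for (a, b), observed in joint.items():
        # Assumption: under chance, events occur independently of each other.
        expected = freq[a] * freq[b] / n
        if observed > 1 and observed > expected:
            labels[(a, b)] = "likely"
        else:
            labels[(a, b)] = "simple"
    return labels


# Hypothetical scenarios (for example, photographs) and the events in each.
scenarios = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"D"}]
labels = classify_coincidences(scenarios)
print(labels)  # {('A', 'B'): 'likely', ('A', 'C'): 'simple'}
```

Pairs that never occur together, such as B and C here, do not appear in the result at all.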

The statistical bases of this analysis are described, as is the Stata program that performs the analyses. As an example of its use, the photograph albums of the following people who were famous in the early twentieth century are analyzed: Miguel de Unamuno (1864–1936), Rafael Masó (1880–1935), Joaquín Turina (1882–1949), and Antonia Mercé (1890–1936), stage name la Argentina.


Social network analysis using Stata

Thomas Grund and Peter Hedström
Institute of Analytical Sociology, Linköping University
Social network analyses investigate the relationships (arcs/edges) between individuals or organizations, such as friendship, advice, or trust. In contrast to many other statistical approaches, one models the interdependencies between entities explicitly. Such a perspective allows the visualization and study of structural features of networks, such as the centrality of network nodes. This talk introduces the nwcommands, a software suite of over 40 Stata commands for social network analyses in Stata. The software includes programs for importing and exporting, loading and saving, handling, manipulating and replacing, generating, and visualizing and animating networks. It also includes commands for measuring the importance of network nodes, the detection of network patterns and features, the similarity of multiple networks, node attributes, and the advanced statistical analysis of networks (nwqap, nwergm). This presentation gives several examples using these programs, provides instructions for the installation, use, and support of the software (http://www.nwcommands.org), and introduces a platform for developers for additional programs to perform social network analyses using Stata.
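
To make the notion of node centrality concrete, here is a minimal Python sketch of degree centrality computed from an undirected edge list. It does not use or reproduce the nwcommands; the function name and the example network are hypothetical.

```python
def degree_centrality(edges):
    """Share of the other nodes to which each node is directly tied."""
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue  # ignore self-loops in this sketch
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    n = len(neighbours)
    if n < 2:
        return {}
    return {node: len(adj) / (n - 1) for node, adj in neighbours.items()}


# Hypothetical star network: one hub tied to three otherwise unconnected nodes.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c")]
centrality = degree_centrality(edges)
print(centrality["hub"])  # 1.0
```

In the star network the hub reaches every other node directly, while each peripheral node reaches only one of the three others.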


Floating point numbers: A visit through the looking glass

Bill Gould
Researchers do not adequately appreciate that floating-point (FP) numbers are a simulation of real numbers and that, as with all simulations, some features are preserved and others are not. Writing code, or even do-files, and treating the computer's floating-point numbers as if they were real numbers can lead to substantive problems and to numerical inaccuracy. In this, the relationship between computers and real numbers is not entirely unlike the relationship between tea and Douglas Adams's Nutrimatic drink dispenser. The Nutrimatic produces a concoction that is "almost, but not quite, entirely unlike tea".

In this presentation, I will show what the universe would be like if it were implemented in FP rather than real numbers. The FP universe turns out to be nothing like the real universe and probably could not be made to function. The point of the talk is to build your intuition about the floating-point world so that you as a researcher can predict when calculations might go awry, know how to think about the problem, and determine how to fix it.
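
A few familiar properties of real numbers that do not survive in floating point can be checked directly. The sketch below is in Python, whose floats are IEEE 754 double-precision numbers; the specific values are illustrative choices and are not taken from the talk.

```python
import math

# Decimal fractions such as 0.1 have no exact binary representation.
total = 0.1 + 0.2
equal_exact = (total == 0.3)                           # False
equal_tol = math.isclose(total, 0.3, rel_tol=1e-12)    # True

# Addition is not associative.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
associative = (left == right)                          # False

# Adding a small number to a large one can change nothing at all.
absorbed = (1e16 + 1.0 == 1e16)                        # True

print(equal_exact, equal_tol, associative, absorbed)
```

The practical consequence is the usual one: compare floating-point results with a tolerance rather than testing for exact equality.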


Tweaking -khb- to control for post-treatment confounders in mediation analysis

Kristian Karlson
Department of Sociology, University of Copenhagen
Mediation analyses and their ensuing effect decompositions are widespread in the social sciences. For example, in stratification research, researchers may be interested in gauging the extent to which the black-white gap in earnings can be attributed to the unequal distribution of schooling among the races. However, methodological research shows that such mediation analyses often fail to control for the potential endogeneity of the mediator. In the example, academic ability may be a confounder of the education-earnings association. Yet controlling for such confounders to eliminate the endogeneity bias of the mediator is not as straightforward as it may appear. Whenever these control variables are a function of the predictor variable of interest (race in the example), standard regression methods for the calculation of direct and indirect effects no longer apply. Put differently, standard methods cannot control for post-treatment confounders.

In this presentation, I show how to tweak the Stata command khb (implementing the decomposition method developed by Karlson, Holm, and Breen [2012, Sociological Methodology 42:274–301]) to control for these confounders in the estimation of direct and indirect effects in regression models using logit or probit. Under the assumption of linearity, I exploit the residualization or orthogonalization approach that underlies khb to derive the bias of omitted post-treatment confounders, and I show how to control for them by tweaking the use of khb. I also discuss how to obtain standard errors of the effects. To illustrate the approach, I give an example of the role of education in social mobility.


Working sideways in Stata

Jakob Hjort
Department of Cardiology, Aarhus University Hospital
Conceptually, Stata is commendably simple: it deals with only one rectangular data grid at a time (variables column-wise and observations row-wise). Within this simple concept, statistics are (usually) operations performed on the vertical axis, that is, column-wise, for example, when obtaining the mean age of a number of subjects/observations. Data management (besides loading, appending, and merging data, etc.) is the discipline of preparing the rectangular data grid for the statistics, for example, by creating derived variables, that is, working row-wise (or sideways) in the data grid. Mainly, derived variables are recodings or simple calculations based on existing variables, all nicely supported by easy-to-use built-in standalone Stata commands and functions. Sometimes, however, when a mix of conditions and calculations is required to create derived variables, things get slightly more complicated and may require customized “loops” that traverse and handle selected variables individually, row by row. Various aspects of working sideways in the Stata data grid will be presented and discussed with a strict focus on transparent, safe, and robust data handling.
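
The kind of row-wise loop described above can be sketched outside Stata as well. The following Python sketch is only an illustration of the idea, not the speaker's approach: the variable names (sbp1 to sbp3), the threshold, and the use of None for missing values are assumptions. It derives two new variables per observation from a selected list of variables while handling missing values explicitly.

```python
def row_mean(row, variables):
    """Mean of the non-missing values among the selected variables."""
    values = [row[v] for v in variables if row[v] is not None]
    return sum(values) / len(values) if values else None


def row_count_above(row, variables, threshold):
    """Number of non-missing selected values strictly above the threshold."""
    return sum(1 for v in variables if row[v] is not None and row[v] > threshold)


# Hypothetical data grid: one dict per observation, None marks a missing value.
data = [
    {"id": 1, "sbp1": 120, "sbp2": None, "sbp3": 130},
    {"id": 2, "sbp1": None, "sbp2": None, "sbp3": None},
]
selected = ["sbp1", "sbp2", "sbp3"]

for row in data:
    row["sbp_mean"] = row_mean(row, selected)
    row["sbp_high"] = row_count_above(row, selected, 125)

print(data[0]["sbp_mean"], data[0]["sbp_high"])  # 125.0 1
print(data[1]["sbp_mean"], data[1]["sbp_high"])  # None 0
```

Returning a missing value when every selected variable is missing, rather than zero, is a deliberate choice in this sketch: it keeps an absent measurement from being mistaken for a measured one.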


A short story about Danish register research and Statalist

Svend Juul
Department of Public Health, Aarhus University
A PhD student is studying health problems among children born to mothers with type 1 diabetes. In a clinical database, the student identified 1,300 such children (index children), and Statistics Denmark delivered information concerning 100 control children per index child, matched by gender and date of birth. Health outcomes are mortality, hospital admissions (by diagnosis), and medications (by ATC groups).

We used a mixed-effects negative binomial regression (Stata's menbreg command) to analyze hospital admissions. menbreg is computationally intensive, and we wanted some 200 analyses (5 age groups, 20 diagnostic groups, etc.). Some analyses would take several hours. I tried to find out if there was a way to automatically stop an analysis that took too long and proceed with the next analysis. Some of the SUG participants will know how to do that, but I didn't know at the time.

I sent the question to Statalist, and within five minutes, I had two good answers: Use the iterate() option. See help maximize. It works, and the analyses are proceeding.


Reproducible research in Stata

Bill Rising
Writing a document that contains statistical results in its narrative, including inline results, can take too much effort. Typically, users have a separate series of do-files whose results must then be pulled into the document. This is a very high-maintenance way to work because updates to the data, changes to the do-files, updates to the statistical software, and, especially, updates to inline results all require work and careful checking of results.

Reproducible research greatly lessens document-maintenance chores by putting code and results directly into the document. Because only one document is used, it remains consistent and is easily maintained.

In this presentation, I will show you how to put Stata code directly into a LaTeX or HTML document and run it through a preprocessor to create the document containing results. While this is useful for creating self-contained documents, it is especially useful for creating periodic reports, class notes, solution sets, and other documents that get used over a long period of time.
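
The preprocessing idea can be illustrated with a toy example. The sketch below is in Python and is not the tool presented in the talk: the double-brace placeholder syntax, the function name, and the data are all hypothetical. Results are computed from the data and substituted into the narrative, so the text cannot drift out of step with the analysis.

```python
import re
import statistics


def render(template, results):
    """Replace each {{ name }} placeholder with the matching computed result."""
    def substitute(match):
        return str(results[match.group(1)])
    return re.sub(r"\{\{\s*(\w+)\s*\}\}", substitute, template)


# Hypothetical data and the results computed from them.
data = [2.0, 4.0, 6.0]
results = {
    "n": len(data),
    "mean": f"{statistics.mean(data):.1f}",
}

template = "We analyzed {{ n }} observations (mean {{ mean }})."
text = render(template, results)
print(text)  # We analyzed 3 observations (mean 4.0).
```

If the data change, rerunning the script regenerates the sentence with the new numbers; a placeholder with no matching result raises a KeyError instead of silently leaving stale text.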


Scientific organizers

Peter Hedström, Institute for Futures Studies

Kim Mannemar Sønderskov, Aarhus University

Svend Juul, Aarhus University

Logistics organizers

Metrika Consulting, the official distributor of Stata in the Nordic and Baltic regions, and the Karolinska Institutet.