Henrik Støvring

Department of Public Health–Department of Biostatistics, Aarhus University

Thought experiments based on simulation can be used to explain to fellow
researchers the impact of the chosen study design or statistical analysis
strategy, or the sensitivity of the results. In this talk, I will present two examples
showing how quantitative thought experiments may be implemented in Stata. The
first example uses a large-sample approach to study the impact on the estimated
effect size of dichotomizing an exposure variable at different values. The
second example uses simulations of realistic-size datasets to illustrate the
necessity of using sampling fractions as inverse probability weights in the
statistical analysis for protection against bias in a complex sampling design.
I will also briefly outline the general steps needed for implementing
quantitative thought experiments in Stata. The main purpose is to highlight
that Stata provides programming facilities for conveniently implementing such
thought experiments, and exploiting them may spare researchers precious time,
futile speculation, and disruptive debates, and thus improve communication in
interdisciplinary research groups.
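The two thought experiments can be sketched in a few lines of code. The following is a minimal sketch written in Python rather than Stata, and it is not the author's code. It assumes, purely for illustration, a standard-normal exposure with a linear effect `beta` on the outcome (first example), and a two-stratum population in which one stratum is oversampled with sampling fractions 0.05 and 0.5 (second example). All function names and parameter values are hypothetical.

```python
import math
import random

# --- Example 1: large-sample look at dichotomizing a continuous exposure ---

def simulate_exposure(n, beta, seed=2014):
    """Draw (x, y) pairs with x ~ N(0, 1) and y = beta * x + N(0, 1) noise."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        pairs.append((x, beta * x + rng.gauss(0.0, 1.0)))
    return pairs

def dichotomized_contrast(pairs, cut):
    """Mean of y above the cut point minus mean of y at or below it."""
    above = [y for x, y in pairs if x > cut]
    below = [y for x, y in pairs if x <= cut]
    return sum(above) / len(above) - sum(below) / len(below)

def theoretical_contrast(beta, cut):
    """Large-sample limit: beta * (E[x | x > c] - E[x | x <= c]) for x ~ N(0, 1)."""
    pdf = math.exp(-cut * cut / 2.0) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(cut / math.sqrt(2.0)))
    return beta * (pdf / (1.0 - cdf) + pdf / cdf)

# --- Example 2: sampling fractions used as inverse probability weights ---

def one_survey(rng, n_pop=10_000, share_high=0.2, fractions=(0.05, 0.5)):
    """Simulate one population and one stratified sample drawn from it.

    Stratum 1 (share_high of the population) has mean outcome 10 and is
    oversampled; stratum 0 has mean outcome 0. Returns the true population
    mean, the unweighted sample mean, and the mean weighted by 1 / fraction.
    """
    pop_total = 0.0
    unw_sum, unw_n = 0.0, 0
    w_sum, w_total = 0.0, 0.0
    for _ in range(n_pop):
        stratum = 1 if rng.random() < share_high else 0
        y = rng.gauss(10.0 if stratum else 0.0, 1.0)
        pop_total += y
        fraction = fractions[stratum]
        if rng.random() < fraction:      # this unit is sampled
            unw_sum += y
            unw_n += 1
            weight = 1.0 / fraction      # inverse probability weight
            w_sum += weight * y
            w_total += weight
    return pop_total / n_pop, unw_sum / unw_n, w_sum / w_total

def average_bias(replications=50, seed=2014):
    """Average bias of the unweighted and weighted means over replications."""
    rng = random.Random(seed)
    bias_unw = bias_w = 0.0
    for _ in range(replications):
        truth, unweighted, weighted = one_survey(rng)
        bias_unw += unweighted - truth
        bias_w += weighted - truth
    return bias_unw / replications, bias_w / replications

if __name__ == "__main__":
    pairs = simulate_exposure(200_000, beta=1.0)
    for cut in (-1.0, 0.0, 1.0, 2.0):
        print(cut, round(dichotomized_contrast(pairs, cut), 3),
              round(theoretical_contrast(1.0, cut), 3))
    print("bias (unweighted, weighted):", average_bias())
```

In the first example the contrast between exposure groups is never the slope `beta` itself, and it changes with the cut point: in the large-sample limit it is about 1.60 when cutting at the median and grows as the cut moves into either tail. In the second example only the weighted mean is approximately unbiased for the population mean.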

**Additional information**

dk14_stovring.pptx

Modesto Escobar

Department of Sociology and Communication, Universidad de Salamanca

The aim of this paper is to introduce coincidence analysis, a new framework for
studying data structures that combines statistical and social network analysis.
The purpose of this procedure is to
ascertain the most frequent events in a given set of scenarios and to study the
relationships between them. In accordance with this procedure, the concurrence
of persons, objects, attributes, characteristics, or events within the same
temporally or spatially delineated set can be classified in the following
manner:

(a) as simple, if both occur at least once in the same set;

(b) as likely, if there is more than a single coincidence and it is more probable than a concurrence produced by mere chance; and

(c) as statistically probable.

In cases where samples of events are the subject of analysis, a confidence interval should be established to determine the statistical meaning of the combination of events.
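The first two levels of this classification can be illustrated with a minimal Python sketch. It is not the author's Stata program, and it rests on stated assumptions: each scenario (for example, one photograph) is a set of items; "likely" is read as more than one coincidence with an observed joint count above the count expected under independence; and a one-sided hypergeometric tail probability stands in for a significance test. The author's program may use a different chance model, and all names here are hypothetical.

```python
from itertools import combinations
from math import comb

def coincidences(scenarios):
    """Classify every co-occurring pair of items across a list of scenarios.

    scenarios: list of sets; each set holds the items present in one scenario.
    Returns {(a, b): (observed, expected, likely, p_value)} for pairs that
    coincide at least once (a "simple" coincidence).
    """
    n = len(scenarios)
    items = sorted(set().union(*scenarios))
    freq = {item: sum(item in s for s in scenarios) for item in items}
    result = {}
    for a, b in combinations(items, 2):
        observed = sum(a in s and b in s for s in scenarios)
        if observed == 0:
            continue  # not even a simple coincidence
        # Expected joint count if a and b were placed independently.
        expected = freq[a] * freq[b] / n
        likely = observed > 1 and observed > expected
        # One-sided hypergeometric tail P(X >= observed): the chance of at
        # least this many joint appearances given the marginal frequencies.
        upper = min(freq[a], freq[b])
        p_value = sum(
            comb(freq[a], k) * comb(n - freq[a], freq[b] - k)
            for k in range(observed, upper + 1)
        ) / comb(n, freq[b])
        result[(a, b)] = (observed, expected, likely, p_value)
    return result

if __name__ == "__main__":
    albums = [{"A", "B"}, {"A", "B"}, {"A", "C"}, {"B"}, {"C"}, {"A", "B", "C"}]
    for pair, stats in coincidences(albums).items():
        print(pair, stats)
```

In the toy albums, A and B appear together three times against an expected 2.67, so the pair counts as likely, although with only six scenarios the tail probability (0.6) is far from significant.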

This mode of analysis can be applied to the exploratory analysis of questionnaires, the study of textual networks, the review of the content of databases, and the comparison of different statistical analyses of interdependence. The following techniques can be used for analyzing the same data: multidimensional scaling, principal component analysis, correspondence analysis, biplot representations, agglomeration techniques, and network analysis algorithms.

The statistical bases of this analysis are described, as is the Stata program that performs the analyses. As an example of its use, the photograph albums of the following people who were famous in the early twentieth century are analyzed: Miguel de Unamuno (1864–1936), Rafael Masó (1880–1935), Joaquín Turina (1882–1949), and Antonia Mercé (1890–1936), stage name la Argentina.

**Additional information**

dk14_escobar2.pdf

Thomas Grund and Peter Hedström

Institute of Analytical Sociology, Linköping University

Social network analyses investigate the relationships (arcs/edges) between
individuals or organizations, such as friendship, advice, or trust. In contrast
to many other statistical approaches, one models the interdependencies between
entities explicitly. Such a perspective allows the visualization and study of
structural features of networks, such as the centrality of network nodes.
This talk introduces the **nwcommands**—a software suite of over 40
Stata commands—for social network analyses in Stata. The software
includes programs for importing and exporting, loading and saving, handling,
manipulating and replacing, generating, and visualizing and animating networks.
It also includes commands for measuring the importance of network nodes, the
detection of network patterns and features, the similarity of multiple networks,
node attributes, and the advanced statistical analysis of networks
(**nwqap**, **nwergm**). This presentation gives several examples using
these programs, provides instructions for the installation, use, and support
of the software (http://www.nwcommands.org), and introduces a platform for
developers who want to contribute additional programs for social network
analysis in Stata.

**Additional information**

dk14_grund.pdf

Bill Gould

StataCorp

Researchers do not adequately appreciate that floating-point (FP) numbers are a
simulation of real numbers and that, as with all simulations, some features are
preserved and others are not. Writing code, or even do-files, that treats the
computer's floating-point numbers as if they were real numbers can lead to
substantive problems and numerical inaccuracy. In this, the relationship
between computers and real numbers is not entirely unlike the relationship
between tea and Douglas Adams's Nutrimatic drink dispenser. The Nutrimatic
produces a concoction that is "almost, but not quite, entirely unlike tea".

In this presentation, I will show what the universe would be like if it were implemented in FP rather than real numbers. The FP universe turns out to be nothing like the real universe and probably could not be made to function. The point of the talk is to build your intuition about the floating-point world so that you as a researcher can predict when calculations might go awry, know how to think about the problem, and determine how to fix it.
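A few of the ways the FP world departs from the real one can be reproduced directly. The following Python sketch is not taken from the talk. Python floats are IEEE 754 double precision, the same format as Stata's `double`, and packing a value into 4 bytes mimics Stata's single-precision `float`, which is Stata's default storage type for new variables. The helper names are hypothetical.

```python
import struct

def to_float32(x):
    """Round a Python float (IEEE 754 double) to single precision and back."""
    return struct.unpack("f", struct.pack("f", x))[0]

def naive_sum(value, times):
    """Add value to itself left to right, one operation at a time."""
    total = 0.0
    for _ in range(times):
        total += value
    return total

def fp_surprises():
    """Collect a few results that would be impossible with real numbers."""
    big = 1e16  # consecutive doubles near 1e16 are 2 apart
    return {
        # Decimal fractions such as 0.1 have no exact binary representation.
        "0.1 + 0.2 == 0.3": 0.1 + 0.2 == 0.3,
        "ten 0.1s added": naive_sum(0.1, 10),
        # Addition is not associative: the grouping changes the answer.
        "(big + 1) - big": (big + 1.0) - big,
        "1 + (big - big)": 1.0 + (big - big),
        # A value stored in single precision no longer equals the double 0.1,
        # which is why comparisons against float-stored data can fail.
        "float32(0.1) == 0.1": to_float32(0.1) == 0.1,
        "float32(0.1)": to_float32(0.1),
    }

if __name__ == "__main__":
    for label, value in fp_surprises().items():
        print(f"{label:>22}: {value!r}")
```

The last two entries correspond to the familiar Stata surprise in which a variable created as `float` does not compare equal to the literal 0.1. The usual remedies are to compare against `float(0.1)` or to store the variable as `double`.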

**Additional information**

dk14_gould.pdf

Kristian Karlson

Department of Sociology, University of Copenhagen

Mediation analyses and their ensuing effect decompositions are widespread in
the social sciences. For example, in stratification research, researchers may
be interested in gauging the extent to which the black-white gap in earnings
can be attributed to the unequal distribution of schooling among the races.
However, methodological research shows that such mediation analyses often fail
to control for the potential endogeneity of the mediator. In the example,
academic ability may be a confounder of the education-earnings association. Yet
controlling for such confounders to eliminate the endogeneity bias of the
mediator is not as straightforward as it may appear. Whenever these control
variables are a function of the predictor variable of interest (race in the
example), standard regression methods for the calculation of direct and
indirect effects no longer apply. Put differently, standard methods cannot
control for post-treatment confounders.

In this presentation, I show how to tweak the Stata command **khb**
(implementing the decomposition method developed by Karlson, Holm, and Breen
[2012, *Sociological Methodology* 42: 274–301]) to control for these
confounders in the estimation of direct and indirect effects in regression
models using **logit** or **probit**. Under the assumption of linearity,
I exploit the residualization (or orthogonalization) approach that underlies
**khb** to derive the bias from omitting post-treatment confounders, and I
show how to control for them by tweaking the use of **khb**. I also discuss
how to obtain standard errors of the effects. To illustrate the approach, I
give an example of the role of education in social mobility.
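The residualization idea behind **khb** can be illustrated in the simplest, linear case. KHB's reduced model replaces the mediator with its residual from a regression on the key variable. Because that residual is orthogonal to the key variable, the key variable's coefficient in the reduced model equals its total effect. The indirect effect is then the difference between the reduced-model and full-model coefficients. The following is a minimal Python sketch under assumed data-generating values (x → z with slope 0.5, z → y with slope 2, direct effect 1). It does not implement **khb**, the logit/probit rescaling problem that **khb** addresses, or the post-treatment-confounder extension presented in the talk, and all names are hypothetical.

```python
import random

def center(v):
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def ols1(y, x):
    """Slope from regressing y on a constant and x."""
    xc, yc = center(x), center(y)
    return dot(xc, yc) / dot(xc, xc)

def ols2(y, x1, x2):
    """Slopes (b1, b2) from regressing y on a constant, x1, and x2."""
    x1c, x2c, yc = center(x1), center(x2), center(y)
    s11, s22, s12 = dot(x1c, x1c), dot(x2c, x2c), dot(x1c, x2c)
    s1y, s2y = dot(x1c, yc), dot(x2c, yc)
    det = s11 * s22 - s12 * s12
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

def residualize(z, x):
    """Residuals of z after regressing it on a constant and x."""
    b = ols1(z, x)
    a = sum(z) / len(z) - b * sum(x) / len(x)
    return [zi - a - b * xi for zi, xi in zip(z, x)]

def simulate(n=20_000, seed=2014):
    """Assumed linear model: x -> z (slope 0.5), y = 1.0 * x + 2.0 * z + noise."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    z = [0.5 * xi + rng.gauss(0, 1) for xi in x]
    y = [1.0 * xi + 2.0 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]
    return x, z, y

def khb_style(y, x, z):
    """Return (total, direct, indirect) effects of x via residualized z."""
    direct, _ = ols2(y, x, z)                 # full model: y on x and z
    total, _ = ols2(y, x, residualize(z, x))  # reduced model keeps z's variation
    return total, direct, total - direct

if __name__ == "__main__":
    x, z, y = simulate()
    print(khb_style(y, x, z))
```

With these assumed values the total, direct, and indirect effects come out near 2, 1, and 1. In a linear model the reduced-model coefficient matches the plain regression of y on x exactly, which is the property **khb** carries over to nonlinear models.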

**Additional information**

dk14_karlson.pptx

Jakob Hjort

Department of Cardiology, Aarhus University Hospital

Conceptually, Stata is commendably simple: it deals with only one rectangular
data grid at a time (variables column-wise and observations row-wise). Within
this simple concept, statistics are (usually) operations performed along the
vertical axis, that is, column-wise, for example, when obtaining the mean age
of a number of subjects (observations). Data management (besides loading,
appending, and merging data) is the discipline of preparing the rectangular
data grid for the statistics, for example, by creating derived variables, that
is, by working row-wise (or sideways) in the data grid. Most derived variables
are recodings or simple calculations based on existing variables, all nicely
supported by easy-to-use, built-in, stand-alone Stata commands and functions.
Sometimes, however, when a mix of conditions and calculations is required to
create derived variables, things tend to get slightly more complicated and may
require customized “loops” to traverse and handle selected variables
individually, row by row. Various aspects of working sideways in the Stata data
grid will be presented and discussed, with a strict focus on transparent, safe,
and robust data handling.
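The kind of sideways loop described above can be sketched with a small, hypothetical example. The following is not Stata code. In the Python sketch below, each observation is a dictionary; two derived variables are created by traversing the listed variables within each row. Missing values are handled explicitly rather than silently. The explicit check mirrors a common Stata pitfall: Stata treats missing values as larger than any number, so a bare `bp1 > 140` is true when `bp1` is missing. The variable names `bp1` to `bp3` and the threshold are assumptions for illustration.

```python
def count_above(row, varnames, threshold):
    """Count listed variables whose value exceeds threshold.

    Missing values (None) are skipped; if every listed variable is missing,
    the result is None rather than a misleading 0.
    """
    values = [row[v] for v in varnames if row[v] is not None]
    if not values:
        return None
    return sum(1 for value in values if value > threshold)

def first_above(row, varnames, threshold):
    """Name of the first listed variable (in list order) above threshold, or None."""
    for v in varnames:
        if row[v] is not None and row[v] > threshold:
            return v
    return None

def add_derived(rows, varnames, threshold):
    """Work sideways: derive two new variables for every row (observation)."""
    for row in rows:
        row["n_high"] = count_above(row, varnames, threshold)
        row["first_high"] = first_above(row, varnames, threshold)
    return rows

if __name__ == "__main__":
    data = [
        {"id": 1, "bp1": 150, "bp2": 130, "bp3": None},
        {"id": 2, "bp1": None, "bp2": None, "bp3": None},
        {"id": 3, "bp1": 135, "bp2": 160, "bp3": 170},
    ]
    for row in add_derived(data, ["bp1", "bp2", "bp3"], 140):
        print(row)
```

Returning `None` when every input is missing keeps "no information" distinct from "no high values", which is the kind of transparency the talk emphasizes.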

**Additional information**

dk14_hjort.ppt

Svend Juul

Department of Public Health, Aarhus University

A PhD student is studying health problems among children born to mothers with
type 1 diabetes. In a clinical database, the student identified 1,300 such
children (index children), and Statistics Denmark delivered information
concerning 100 control children per index child, matched by gender and date of
birth. Health outcomes are mortality, hospital admissions (by diagnosis), and
medications (by ATC groups).

We used a mixed-effects negative binomial regression (Stata's **menbreg**
command) to analyze hospital admissions. **menbreg** is computationally
intensive, and we wanted some 200 analyses (5 age groups, 20 diagnostic
groups, etc.). Some analyses would take several hours. I tried to find out if
there was a way to automatically stop an analysis that took too long and
proceed with the next analysis. Some of the SUG participants will know how to
do that, but I didn't know at the time.

I sent the question to Statalist, and within five minutes, I had two good answers: Use the **iterate()** option. See **help maximize**. It works,
and the analyses are proceeding.
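The logic of capping iterations in a batch of analyses can be sketched outside Stata. The following is a minimal Python sketch: a Newton-Raphson solver stands in for the likelihood maximizer, a hard iteration limit plays the role of the **iterate()** option, and the batch loop records non-convergence and moves on instead of waiting. The two toy "analyses" are hypothetical. An iteration cap bounds run time only indirectly, because the cost of each iteration varies from model to model.

```python
import math

def newton(f, fprime, x0, tol=1e-10, max_iter=50):
    """Newton-Raphson with a hard cap on the number of iterations.

    Returns (estimate, iterations_used, converged).
    """
    x = x0
    for iteration in range(1, max_iter + 1):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x, iteration, True
    return x, max_iter, False

def cbrt(x):
    """Real cube root that also works for negative x."""
    return math.copysign(abs(x) ** (1.0 / 3.0), x)

ANALYSES = {
    # Converges in a handful of iterations to sqrt(2).
    "sqrt2": (lambda x: x * x - 2.0, lambda x: 2.0 * x, 1.0),
    # Never converges: for the cube root each Newton step maps x to -2x.
    "cuberoot": (cbrt, lambda x: (1.0 / 3.0) * abs(x) ** (-2.0 / 3.0), 1.0),
}

def run_all(analyses, max_iter=50):
    """Run every analysis; a non-converged one is flagged, not fatal."""
    results = {}
    for name, (f, fprime, x0) in analyses.items():
        results[name] = newton(f, fprime, x0, max_iter=max_iter)
    return results

if __name__ == "__main__":
    for name, (estimate, iterations, converged) in run_all(ANALYSES).items():
        print(f"{name}: converged={converged} after {iterations} iterations")
```

Keeping the convergence flag with each result lets the analyst review the capped analyses afterward rather than losing them.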

**Additional information**

dk14_juul.pdf

Bill Rising

StataCorp

Writing a document that contains statistical results in its narrative,
including inline results, can take too much effort. Typically, users have a
separate series of do-files whose results must then be pulled into the
document. This is a very high-maintenance way to work because updates to
the data, changes to the do-files, updates to the statistical software, and,
especially, updates to inline results all require work and careful checking of
results.

Reproducible research greatly lessens document-maintenance chores by putting code and results directly into the document, so only one document needs to be maintained, which keeps it consistent and easy to update.

In this presentation, I will show you how to put Stata code directly into a LaTeX or HTML document and run it through a preprocessor to create the document containing results. This is useful for creating self-contained documents, and it is especially useful for creating periodic reports, class notes, solution sets, and other documents that are used over a long period of time.
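The core of such a preprocessor, which replaces inline placeholders with freshly computed results, can be sketched in a few lines. The following Python sketch uses a hypothetical `{{name:format}}` placeholder syntax, not the syntax of the tool presented in the talk, and computes three summary statistics from the data each time the document is rendered.

```python
import re
import statistics

PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*(?::([^}]*))?\}\}")

def render(template, data):
    """Fill {{name}} or {{name:format}} placeholders with results from data."""
    results = {
        "n": len(data),
        "mean": statistics.mean(data),
        "sd": statistics.stdev(data),
    }

    def fill(match):
        name, fmt = match.group(1), match.group(2) or ""
        # An unknown name raises KeyError, so a stale placeholder fails loudly.
        return format(results[name], fmt)

    return PLACEHOLDER.sub(fill, template)

TEMPLATE = "<p>We analyzed {{n}} subjects; mean age was {{mean:.1f}} years (SD {{sd:.1f}}).</p>"

if __name__ == "__main__":
    print(render(TEMPLATE, [30, 40, 50]))
```

When the data change, rerunning `render` updates every inline number at once, which is the maintenance saving described above.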

**Additional information**

dk14_rising.pdf

dk14_rising_examples.zip