The first French Stata Users Group meeting was held on Thursday, 6 July 2017. You can view the program and presentation slides below.
sdmxuse: Module to import data from statistical agencies using the SDMX standard
Abstract: SDMX, which stands for Statistical Data and Metadata eXchange, is an ISO standard developed by seven international organizations (BIS, ECB, Eurostat, IMF, OECD, the United Nations, and the World Bank) to facilitate the exchange of statistical data. The package sdmxuse allows Stata users to download and import SDMX data directly within their favorite software. The program builds and sends a query to the statistical agency (using RESTful web services), then imports and formats the downloaded dataset in XML format. The complex structure of the datasets (so-called cube) is reviewed to show how users can send specific queries and import only the required time series. sdmxuse might prove useful for researchers who need frequently updated time series and wish to automate the downloading and formatting process.
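To make the idea of a "specific query" against a data cube concrete, here is a minimal Python sketch of how an SDMX RESTful data query URL is typically assembled: one position per dimension separated by dots, several codes for one dimension joined by "+", and an empty position acting as a wildcard. This is not the code of sdmxuse. The function name, the endpoint, the dataflow identifier, and the dimension codes are all assumptions chosen for illustration.

```python
from urllib.parse import urlencode


def build_sdmx_data_url(base_url, dataflow, key_parts,
                        start_period=None, end_period=None):
    """Assemble an SDMX REST data query URL (illustrative helper).

    key_parts holds one list of codes per dimension, in the order the
    dataflow defines them. An empty list leaves that dimension unrestricted.
    """
    # Codes for one dimension are joined by "+", dimensions by ".".
    key = ".".join("+".join(codes) for codes in key_parts)
    url = f"{base_url.rstrip('/')}/data/{dataflow}/{key}"

    # Optional time filters are passed as query parameters.
    params = {}
    if start_period:
        params["startPeriod"] = start_period
    if end_period:
        params["endPeriod"] = end_period
    if params:
        url += "?" + urlencode(params)
    return url


# Hypothetical endpoint, dataflow, and codes, for illustration only.
example_url = build_sdmx_data_url(
    "https://example.org/sdmx/rest",
    "EXR",
    [["M"], ["USD", "GBP"], ["EUR"], [], ["A"]],
    start_period="2015-01",
)
print(example_url)
```

Restricting the key in this way is what lets a client download only the required time series instead of the whole cube.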
Download sdmxuse from SSC
Université Catholique de Louvain
Abadie's semiparametric difference-in-differences estimator
Abstract: The difference-in-differences estimator measures the effect of a treatment or policy intervention by comparing the change over time of the outcome variable across treatment groups. To interpret the estimate as a causal effect, this strategy requires that, in the absence of the treatment, the outcome variable would have followed the same trend in treated and untreated groups. This assumption may be implausible if selection for treatment is correlated with characteristics that affect the dynamics of the outcome variable. In this presentation, I describe the command absdid, which implements the semiparametric difference-in-differences (SDID) estimator of Abadie (2005, Review of Economic Studies 72: 1-19). The SDID is a reweighting technique that addresses the imbalance of characteristics between treated and untreated groups. Hence, it makes the parallel trend assumption more credible. In addition, the SDID estimator allows the use of covariates to describe how the average effect of the treatment varies for different groups of the treated population.
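The reweighting idea can be illustrated with a small Python sketch. It is not the absdid implementation: it assumes a single discrete covariate and estimates the propensity score by the share of treated units in each covariate cell, which is a simplification made here for clarity. Each unit's outcome change is weighted by (D - p(X)) / ((1 - p(X)) P(D = 1)), following the weighting formula in Abadie (2005). All names and the toy data are hypothetical.

```python
from collections import defaultdict


def plain_did(delta, treated):
    """Unweighted difference-in-differences on outcome changes."""
    t = [d for d, g in zip(delta, treated) if g == 1]
    c = [d for d, g in zip(delta, treated) if g == 0]
    return sum(t) / len(t) - sum(c) / len(c)


def reweighted_did(delta, treated, x):
    """Reweighted DID with cell-frequency propensity scores.

    Assumption: every covariate cell contains both treated and
    untreated units, so that 0 < p(x) < 1.
    """
    n = len(delta)
    p_treated = sum(treated) / n

    # Propensity score per cell: share of treated units in that cell.
    counts = defaultdict(lambda: [0, 0])  # cell -> [treated, total]
    for g, cell in zip(treated, x):
        counts[cell][0] += g
        counts[cell][1] += 1

    total = 0.0
    for d, g, cell in zip(delta, treated, x):
        p = counts[cell][0] / counts[cell][1]
        total += d / p_treated * (g - p) / (1 - p)
    return total / n


# Toy data: the trend is 1 when x = 0 and 5 when x = 1; the true effect is 2.
x = [0, 0, 0, 0, 1, 1, 1, 1]
treated = [1, 0, 0, 0, 1, 1, 1, 0]
delta = [3, 1, 1, 1, 7, 7, 7, 5]  # outcome after minus outcome before

naive_estimate = plain_did(delta, treated)
sdid_estimate = reweighted_did(delta, treated, x)
print(naive_estimate, sdid_estimate)
```

In this toy dataset, treated units are concentrated in the cell with the steeper trend, so the unweighted comparison overstates the effect, while the reweighted estimate recovers the effect built into the data.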
Download difference-in-differences estimator from SSC
Paris School of Economics
validscale: A Stata module to validate subjective measurement scales
Abstract: Subjective measurement scales consist of questionnaires aiming at measuring non-observable respondent characteristics, such as quality of life, pain, or intelligence. The questionnaires can be unidimensional (they measure one concept) or multidimensional (they measure several concepts), so they can lead to one or several scores supposedly measuring the concepts of interest. In classical test theory (CTT), the scores are a combination (sum or mean) of responses to one or several items. To be useful, a questionnaire must provide psychometric properties showing that the instrument correctly measures what it intends to measure. The two main properties we want to assess are validity and reliability. Validity and reliability are assessed by checking their respective facets: content validity, construct validity, and criterion validity for validity; internal consistency, test-retest reliability, and scalability for reliability. Most of these properties can be assessed using statistical analyses (factor analysis, intraclass correlation coefficients, etc.). However, there is currently no statistical software package to easily perform all of these tests. We developed validscale, a Stata module that performs the recommended analyses to validate a subjective measurement scale using CTT. A dialog box was also developed to use the module in a user-friendly manner.
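As a small illustration of two notions used above, the following Python sketch computes CTT sum scores and Cronbach's alpha, a common index of internal consistency. It is not the validscale code. The function names and the response data are made up for the example.

```python
from statistics import variance


def sum_scores(responses):
    """CTT score of each respondent: the sum of their item responses."""
    return [sum(row) for row in responses]


def cronbach_alpha(responses):
    """Cronbach's alpha for a respondents-by-items table of responses.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of scores)
    """
    k = len(responses[0])
    item_vars = [variance([row[j] for row in responses]) for j in range(k)]
    total_var = variance(sum_scores(responses))
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 5 respondents answering 3 items on a 1-5 scale.
responses = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [3, 3, 4],
    [5, 4, 5],
]

scores = sum_scores(responses)
alpha = cronbach_alpha(responses)
print(scores, round(alpha, 3))
```

Values of alpha close to 1 indicate that the items vary together, which is what one expects when they measure the same concept.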
Université de Nantes, Université de Tours, INSERM, SPHERE U1246
Introduction to Bayesian analysis using Stata
Abstract: Bayesian analysis has become a popular tool for many statistical applications. Yet many statisticians have little training in the theory of Bayesian analysis and in the software used to fit Bayesian models. This talk will provide an intuitive introduction to the concepts of Bayesian analysis and demonstrate how to fit Bayesian models using Stata. No prior knowledge of Bayesian analysis is necessary. Specific topics will include the relationship between likelihood functions, prior distributions, and posterior distributions; Markov chain Monte Carlo (MCMC) using the Metropolis–Hastings algorithm; and how to use Stata's graphical user interface and command syntax to fit Bayesian models.
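The link between likelihood, prior, and posterior, and the role of the Metropolis–Hastings algorithm, can be sketched in a few lines of Python. This is a generic random-walk sampler for a binomial proportion with a Beta prior, written for illustration; it is not Stata's implementation, and the data, step size, and seed are arbitrary assumptions. The example is chosen because the posterior is known in closed form (a Beta distribution), so the simulated mean can be checked against the exact one.

```python
import math
import random


def log_posterior(theta, successes, failures, a=1.0, b=1.0):
    """Log of likelihood times Beta(a, b) prior, up to a constant."""
    if not 0.0 < theta < 1.0:
        return -math.inf  # zero posterior density outside (0, 1)
    return ((successes + a - 1) * math.log(theta)
            + (failures + b - 1) * math.log(1 - theta))


def metropolis_hastings(successes, failures, n_draws=20000,
                        burn_in=2000, step=0.2, seed=12345):
    """Random-walk Metropolis-Hastings draws from the posterior of theta."""
    rng = random.Random(seed)
    theta = 0.5
    current = log_posterior(theta, successes, failures)
    draws = []
    for i in range(n_draws + burn_in):
        # Symmetric normal proposal centered at the current value.
        proposal = theta + rng.gauss(0.0, step)
        candidate = log_posterior(proposal, successes, failures)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        if i >= burn_in:
            draws.append(theta)
    return draws


# Hypothetical data: 7 successes and 3 failures, uniform Beta(1, 1) prior.
draws = metropolis_hastings(successes=7, failures=3)
mcmc_mean = sum(draws) / len(draws)
exact_mean = (7 + 1) / (7 + 3 + 2)  # mean of the Beta(8, 4) posterior
print(round(mcmc_mean, 3), round(exact_mean, 3))
```

Because the proposal is symmetric, the acceptance probability depends only on the ratio of posterior densities, so the normalizing constant of the posterior is never needed.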
repest, an ado-file dedicated to the international skills surveys produced by the OECD such as PISA, PIAAC, and TALIS
Abstract: These surveys have complex sampling designs and use multiply imputed variables. Both characteristics need to be taken into account to obtain correct standard errors, but users often overlook them because of the complexity of the design. repest was designed to incorporate them easily into any e-class Stata command. repest also includes a set of tools to facilitate the exploitation of international surveys. The ado-file is available on the IDEAS website, along with a detailed help file. Please note that repest is also compatible with other surveys such as TIMSS or IALS.
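To show why multiply imputed variables affect standard errors, here is a minimal Python sketch of the standard combining rule for multiple imputation (Rubin's rule): the total variance is the average sampling variance plus an inflated between-imputation variance. It is not the repest code. It assumes that the sampling variance of each imputation-specific estimate has already been computed, for example from the survey's replicate weights, a step that depends on the survey design and is not shown. The numbers are invented.

```python
import math
from statistics import mean, variance


def combine_imputations(estimates, sampling_variances):
    """Combine one estimate per imputed variable into a point estimate and SE.

    estimates: the statistic computed once per imputation.
    sampling_variances: the sampling variance of each of those estimates.
    """
    m = len(estimates)
    point = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # variance across imputations
    total = within + (1 + 1 / m) * between
    return point, math.sqrt(total)


# Hypothetical mean scores from 3 imputations, each with sampling variance 4.
point, std_err = combine_imputations([500, 502, 498], [4, 4, 4])
print(point, round(std_err, 3))
```

Using a single imputation and ignoring the between-imputation term would give a standard error of 2 here, which understates the uncertainty.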
Download the repest ado-file from SSC
mixmcm: a Stata command for estimating mixtures of Markov chain models using ML and the EM algorithm
Abstract: Markov chain and mixture models have been widely applied in various strands of the academic literature. Several studies have combined both modeling approaches to account for unobserved heterogeneity within a population when analyzing dynamic processes. For instance, a restricted form of this combined approach, the so-called mover-stayer model (MSM), has been used to investigate agents' mobility in sociology, economics, and the medical sciences. This paper describes mixmcm, a user-written Stata command that allows estimating the general class of mixed Markov chain models (MMCM). To account for the possibility of incomplete information within the data, the model is estimated with maximum likelihood (ML) using the expectation-maximization (EM) algorithm. The proposed command enables users to estimate the MMCM parametrically, semiparametrically, or nonparametrically, depending on the chosen specifications for the transition probabilities and the mixing distribution. The MSM is obtained from this general setting by imposing relevant restrictions on the transition probability matrices. Dealing with the general model, mixmcm also enables one to endogenously identify the optimal number of homogeneous chains. A postestimation command is also provided for further inspection and analysis of results. The usefulness of the proposed command is illustrated with an application in the field of agricultural economics to analyze farm-size dynamics.
SMART, Agrocampus Ouest, INRA
Cluster analysis utilities for Stata
Abstract: Stata has a set of built-in commands for cluster analysis. While they are solid and effective, they have some limitations. I present several utilities that extend Stata's cluster analysis capability, particularly, but not exclusively, when working from matrices of pairwise distances rather than variables (that is, when using clustermat rather than cluster).
permtab and ari compare cluster solutions, respectively, by permuting categories to maximize Cohen's kappa and calculating the adjusted Rand Index.
calinski and dudahart implement the Calinski–Harabasz and Duda–Hart stopping rules for clustering from pairwise distance matrices (official Stata calculates these for clustering from variables only). Calculating these indices from the distance matrix means they can be applied to measures other than squared Euclidean distance, even when clustering from variables. Studer's discrepancy measure is closely related and links these measures to ANOVA-like procedures on distance matrices.
silhouette and dendrohmap provide graphical summaries, respectively, the silhouette plot (which captures the relative distinctness of clusters) and a heat map representation of the pairwise distances (ordered by the cluster dendrogram).
I also present a command, pam, to do partitioning around medoids using distance matrices (similar to cluster kmedians when working from variables).
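As an illustration of one of the comparison measures mentioned in this abstract, the following Python sketch computes the adjusted Rand index between two cluster solutions from their contingency counts. It is a generic implementation of the usual formula, not the code of the ari command, and the label vectors are invented.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two partitions of the same units.

    Undefined (division by zero) when both partitions put every unit
    in a single cluster, or every unit in its own cluster.
    """
    n = len(labels_a)
    # Pairs of units placed together in both solutions.
    pairs_joint = sum(comb(c, 2)
                      for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs placed together within each solution taken separately.
    pairs_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    pairs_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = pairs_a * pairs_b / comb(n, 2)  # agreement expected by chance
    maximum = (pairs_a + pairs_b) / 2
    return (pairs_joint - expected) / (maximum - expected)


# Same grouping with different cluster labels: perfect agreement.
same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])
# Groupings that cut across each other: worse than chance.
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])
print(same, crossed)
```

The index does not depend on how clusters are labeled, which is why no permutation of categories is needed before computing it.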
University of Limerick
Quantile plots: New planks in an old campaign
Abstract: Quantile plots show ordered values (raw data, estimates, residuals, etc.) against rank or cumulative probability or a one-to-one function of the same. Even in a strict sense, they are almost 200 years old. In Stata, quantile, qqplot, and qnorm go back to 1985 and 1986. So why any fuss?
The presentation is built on a long-considered view that quantile plots are the best single plot for univariate distributions. No other kind of plot shows so many features so well across a range of sample sizes with so few arbitrary decisions. Both official and user-written programs appear in a review that includes side-by-side and superimposed comparisons of quantiles for different groups and comparable variables. Emphasis is on newer work, with focus on the compatibility of quantiles with transformations; fitting and testing of brand-name distributions; quantile-box plots as proposed by Emanuel Parzen (1929–2016); equivalents for ordinal categorical data; and the question of which graphics best support paired and two-sample t and other tests.
Commands mentioned include distplot, multqplot, and qplot (Stata Journal) and mylabels, stripplot, hdquantile, and lvalues (SSC).
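The basic construction behind a quantile plot, ordered values against a cumulative probability, can be sketched in Python as follows. The plotting position (i - 0.5) / n used here is one common choice among several, and the function name is hypothetical; this is not the code of any of the commands listed above.

```python
def quantile_plot_points(values):
    """Return (cumulative probability, ordered value) pairs for plotting."""
    ordered = sorted(values)
    n = len(ordered)
    # The i-th smallest value is plotted at probability (i - 0.5) / n.
    return [((i - 0.5) / n, v) for i, v in enumerate(ordered, start=1)]


points = quantile_plot_points([3, 1, 2])
for probability, value in points:
    print(round(probability, 3), value)
```

Every observation is shown individually and no binning or bandwidth has to be chosen, which is the sense in which such plots involve few arbitrary decisions.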
Nicholas J. Cox
Nonlinear mixed-effects regression
Abstract: In many applications, such as biological and agricultural growth processes and pharmacokinetics, the time course of a continuous response for a subject may be characterized by a nonlinear function. Parameters in these subject-specific nonlinear functions often have natural physical interpretations, and observations within the same subject are correlated. Subjects may be nested within higher-level groups, giving rise to nonlinear multilevel models, also known as nonlinear mixed-effects or hierarchical models. The new Stata 15 command menl allows you to fit nonlinear mixed-effects models, in which fixed and random effects may enter the model nonlinearly at different levels of hierarchy. In this talk, I will show you how to fit nonlinear mixed-effects models that contain random intercepts and slopes at different grouping levels with different covariance structures for both the random effects and within-subject errors. I will also discuss parameter interpretation and highlight postestimation capabilities.
Wishes and grumbles
The logistics organizer for the 2017 French Stata Users Group meeting is
the distributor of Stata in Belgium, France, and Switzerland.
View the proceedings of previous Stata Users Group meetings.