Last updated: 20 December 2014

2014 Italian Stata Users Group meeting

13 November 2014


NH Milano Touring Hotel
Via Ugo Tarchetti, 2
20121 Milano


margins in Stata

Enrique Pinzón

The margins command in Stata allows us to get a wide array of results using coefficient estimates. I will illustrate the use of margins in some commonly used models. I will then present a new result: how to use margins after fixed-effects panel-data estimation, in a way that incorporates the effect of the unobserved time-invariant component, to obtain average marginal effects and average treatment effects.

datanet: A Stata procedure to facilitate dataset organization for network analysis

Giovanni Cerulli
CNR-CERIS National Research Council of Italy, Unit of Rome
Antonio Zinilli
CNR-CERIS National Research Council of Italy, Unit of Rome

In recent years, much interest has focused on network analysis. This study presents and applies to real data a new user-written Stata command called datanet, which facilitates dataset organization for network analysis. Given a fixed number of units (or nodes) that belong to the same group (a variable denotes group membership) and are possibly connected to each other, this routine creates a new dataset containing all their possible couplings, which can be easily exploited with Stata network analysis commands. To our knowledge, no routine has been developed in Stata so far that executes this type of procedure. This presentation will also review how to perform some basic network analysis in Stata and discuss further network analysis applications that we deem worth developing in Stata in the near future.
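
The pairing step described above can be sketched outside Stata. The following Python example is only an illustration of the idea, not the datanet implementation: the function name `pair_dataset`, the output field names, and the choice of unordered pairs are all assumptions made for this sketch.

```python
from itertools import combinations


def pair_dataset(units):
    """Return one row per unordered pair of units that share a group.

    `units` is a list of (unit_id, group_id) tuples. The names used here
    are hypothetical and do not come from datanet.
    """
    # Collect the unit identifiers that belong to each group.
    by_group = {}
    for unit_id, group_id in units:
        by_group.setdefault(group_id, []).append(unit_id)

    # Build every within-group coupling exactly once.
    rows = []
    for group_id in sorted(by_group):
        for a, b in combinations(sorted(by_group[group_id]), 2):
            rows.append({"group": group_id, "unit_i": a, "unit_j": b})
    return rows


units = [("A", 1), ("B", 1), ("C", 1), ("D", 2), ("E", 2)]
pairs = pair_dataset(units)
for row in pairs:
    print(row)
```

A group of k units yields k(k-1)/2 rows, so the paired dataset grows quadratically with group size.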

Approximate Bayesian logistic regression via penalized likelihood estimation with data augmentation

Andrea Discacciati
Karolinska Institute, Stockholm

Data augmentation is a technique for conducting approximate Bayesian regression analysis. This technique is a form of penalized likelihood estimation where prior information, represented by one or more specific prior data records, generates a penalty function that imposes the desired priors on the regression coefficients. We present a new command, penlogit, that fits penalized logistic regression via data augmentation. We illustrate the command through an example using data from an epidemiological study.
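
As a rough illustration of the idea, the Python sketch below fits a logistic model to a small grouped dataset with and without one added prior data record. It is not the penlogit implementation: the data are made up, the helper `fit_logit` is hypothetical, and the prior record (A successes in 2A trials, constant set to 0, covariate set to 1) is assumed here to act approximately like a prior centered at 0 on the log odds-ratio.

```python
import math


def fit_logit(records, max_iter=50, tol=1e-12):
    """Newton-Raphson for a grouped-binomial logistic model with two coefficients.

    Each record is (const, x, successes, trials). A hypothetical helper
    written for this sketch only.
    """
    b0, b1 = 0.0, 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for c, x, y, n in records:
            p = 1.0 / (1.0 + math.exp(-(b0 * c + b1 * x)))
            resid = y - n * p           # score contribution
            w = n * p * (1.0 - p)       # information weight
            g0 += resid * c
            g1 += resid * x
            h00 += w * c * c
            h01 += w * c * x
            h11 += w * x * x
        # Solve the 2x2 Newton system (information matrix times step = score).
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1


# Made-up data: 30/50 cases among the exposed, 10/50 among the unexposed.
observed = [(1, 1, 30, 50), (1, 0, 10, 50)]

# Assumed prior record: A successes in 2A trials, no constant, x = 1.
A = 4
prior_record = (0, 1, A, 2 * A)

_, b_mle = fit_logit(observed)
_, b_aug = fit_logit(observed + [prior_record])
print("log odds-ratio, observed data only:", round(b_mle, 3))
print("log odds-ratio, with prior record: ", round(b_aug, 3))
```

Because the prior record contributes a penalty that peaks at a coefficient of 0, the augmented estimate is pulled from the maximum likelihood estimate toward 0; a larger A corresponds to a tighter prior.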

Efficient and effective management of big databases in Stata

Giovanni Marin

This presentation describes methods for handling large datasets in Stata effectively and efficiently. It is based on general considerations about managing large datasets and on practical examples using specific datasets. We will cover the following topics:

  • Basic information about the sizes of variables in Stata
  • How to import datasets in formats other than .dta
  • Reshaping of datasets in Stata
  • Basics of relational databases
  • Methods for combining multiple datasets
  • Suggestions to optimize the storage of large datasets
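
To make the first and last topics concrete, here is a small Python sketch of how storage type drives dataset size. The ranges and byte sizes listed are those documented for Stata's integer storage types; the function name `smallest_integer_type` and the example figures are assumptions for illustration and are not taken from the presentation.

```python
# Documented ranges and sizes (in bytes) of Stata's integer storage types.
INTEGER_TYPES = [
    ("byte", -127, 100, 1),
    ("int", -32767, 32740, 2),
    ("long", -2147483647, 2147483620, 4),
]


def smallest_integer_type(values):
    """Return (type name, bytes per observation) for a list of integers.

    Hypothetical helper: picks the smallest type whose range holds all
    values, falling back to an 8-byte double for larger integers.
    """
    lo, hi = min(values), max(values)
    for name, tmin, tmax, size in INTEGER_TYPES:
        if tmin <= lo and hi <= tmax:
            return name, size
    return "double", 8


n_obs = 10_000_000
name, size = smallest_integer_type([0, 100])
print(name, "needs", n_obs * size // 10**6, "MB for", n_obs, "observations")
print("double needs", n_obs * 8 // 10**6, "MB for the same variable")
```

In Stata itself, the compress command performs this kind of demotion to the smallest storage type that loses no information.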

sftfe: A Stata command for fixed-effects stochastic frontier models estimation

Federico Belotti
CEIS University of Rome Tor Vergata
Giuseppe Ilardi
Bank of Italy

The classical stochastic frontier panel-data models provide no mechanism to disentangle individual time-invariant unobserved heterogeneity from inefficiency. Greene (2005a,b) proposed the so-called true fixed-effects specification that distinguishes these two latent components and allows for time-varying inefficiency. However, because of the incidental parameters problem, the maximum likelihood estimator proposed by Greene leads to biased variance estimates in short panels. sftfe allows the estimation of this model via three alternative estimators (Belotti and Ilardi 2012; Chen et al. 2014), which by relying on data transformation, achieve consistency for n → ∞ with fixed T. Of special note is that sftfe allows the underlying mean and variance of the inefficiency to be expressed as functions of exogenous covariates. Furthermore, the new command allows the estimation of a "true" fixed-effects model in which the inefficiency is assumed to follow a first-order autoregressive process. These features can be considered relevant from the methodological point of view because both model parameters and inefficiency estimates may be adversely affected when inefficiency heterogeneity, heteroskedasticity, and serial correlation are neglected. They are also important empirically because they allow for testing specific hypotheses of interest and policy implications and avoid biased two-step procedures.

ntreatreg: A Stata module for estimation of treatment effects in the presence of neighborhood interactions

Giovanni Cerulli
CNR-CERIS National Research Council of Italy, Institute for Economic Research on Firms and Growth

This presentation introduces a parametric counterfactual model identifying average treatment effects (ATEs) by conditional mean independence when externality (or neighborhood) effects are incorporated within the traditional Rubin potential-outcome model. As such, it tries to generalize the usual control-function regression, widely used in program evaluation and epidemiology, when the stable unit treatment value assumption (SUTVA) is relaxed. As a by-product, the paper also presents ntreatreg, a user-written Stata command for estimating ATEs when social interaction may be present. Finally, we show an instructional application of the model and of its Stata implementation (using ntreatreg) through two examples (the first on the effect of housing location on crime; the second on the effect of education on fertility) and compare the results with a no-interaction setting.

Dynamic documents in Stata: MarkDoc, Ketchup, and Weaver

E. F. Haghish
University of Freiburg, Germany

For Stata users who know LaTeX, writing a document that includes text, graphs, and Stata syntax and output has been a tedious and unreproducible manual process. To ease the process of creating dynamic documents in Stata, many Stata users have wished to see two additional features in Stata: literate programming and combining graphs with log files in a single document. MarkDoc, Ketchup, and Weaver are three user-written Stata packages that allow you to create a dynamic document that includes graphs, text, and Stata code and output and to export it in a variety of file formats, including PDF, Docx, HTML, LaTeX, OpenOffice/LibreOffice, and EPUB. I will also discuss the special features of these packages and their potential applications.

The A to Z of how to create thematic maps of Italy using spmap

Maurizio Pisati
University of Milano–Bicocca

This presentation offers a step-by-step tutorial on how to draw thematic maps of Italy using the user-written Stata command spmap and spatially referenced data freely available on the Internet.

Reproducible research in Stata

Bill Rising

Writing a document that contains statistical results in its narrative, including inline results, can take too much effort. Typically, users have a separate series of do-files whose results must then be pulled into the document. Reproducible research greatly lessens document-maintenance chores by putting code and results directly into the document; this means that only one document is used; thus it remains consistent and is easily maintained. This session illustrates how to place Stata code directly into a LaTeX or HTML document and run it through a preprocessor to create the document containing results.
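
The preprocessor idea can be illustrated with a toy example. The Python sketch below is not the tool used in the session: the `{{ ... }}` marker syntax, the function name `weave`, and the result values are all hypothetical, chosen only to show how inline results can be filled in from a single source document.

```python
import re


def weave(source, namespace):
    """Replace each {{ expression }} marker with its evaluated value.

    Hypothetical toy preprocessor: expressions may only refer to names
    supplied in `namespace`.
    """
    def substitute(match):
        # Evaluate with no builtins so only names from `namespace` are visible.
        value = eval(match.group(1), {"__builtins__": {}}, namespace)
        # Floats are rounded to two decimals for display; other values print as-is.
        return format(value, ".2f") if isinstance(value, float) else str(value)

    return re.sub(r"\{\{\s*(.+?)\s*\}\}", substitute, source)


# Made-up results that would normally come from running the analysis code.
results = {"n": 74, "mean_mpg": 21.2973}
text = "The sample has {{ n }} cars with a mean of {{ mean_mpg }} mpg."
woven = weave(text, results)
print(woven)
```

Because the numbers are inserted at build time, rerunning the analysis and the preprocessor keeps the narrative and the results in step.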

Average partial effects in multivariate probit models with latent heterogeneity: Monte Carlo experiments and an application to immigrants' ethnic identity and economic performance

Giovanni Bruno
Università Commerciale Luigi Bocconi, Milan
Orietta Dessy
Ca' Foscari University of Venice

We extend the univariate results in Wooldridge (2005) to multivariate probit models, proving the following. 1) Average partial effects (APEs) based on joint probabilities are consistently estimated by conventional multivariate probit models under general forms of conditionally independent latent heterogeneity (LH) as long as the only constraints beyond normalization, if any, are within-equation homogeneous restrictions. The normalization of choice is not neutral to consistency in models with cross-equation parameter restrictions beyond normalization, such as those implemented by Stata's asmprobit command or in the panel probit model: if the normalization is through an error covariance matrix in correlation form, consistency breaks down unless the LH components are truly homoskedastic. This is substantial because an error covariance matrix in correlation form is the only normalization permitted by Stata's biprobit and mvprobit commands or Limdep's BIVARIATE PROBIT and MPROBIT. Covariance restrictions beyond normalizations generally conflict with an arbitrary covariance matrix for the LH components. The multinomial probit model with i.i.d. errors, implemented by Stata's mprobit, is a case in point. 2) Conditional independence of the LH components is not generally sufficient for consistent estimation of APEs on conditional probabilities. Consistency is restored by maintaining an additional independence assumption. This holds true whether or not the response variables are used as regressors. 3) The dimensionality benefit observed by Mullahy (2011) in the estimation of partial effects extends to APEs. We exploit this feature in the design of a simple procedure estimating APEs, which is both faster and more accurate than simulation-based codes, such as Stata's mvprobit and cmp. To demonstrate the finite-sample implications of our results, we carry out extensive Monte Carlo experiments with bivariate and trivariate probit models.
Finally, we apply our procedure in (3) to Italian survey data of immigrants in order to estimate the APEs of a trivariate probit model of ethnic identity formation and economic performance.


Wooldridge, J. M. 2005. Unobserved heterogeneity and estimation of average partial effects. In Identification and Inference for Econometric Models: Essays in Honor of Thomas Rothenberg, ed. D. W. K. Andrews and J. H. Stock, 21–55. Cambridge: Cambridge University Press.
Mullahy, J. 2011. Marginal effects in multivariate probit and kindred discrete and count outcome models, with applications in health economics. No. w17588. National Bureau of Economic Research.

A review of propensity score: Principles, methods, and application in Stata

Alessandra Grotta
Karolinska Institute, Stockholm
Rino Bellocco
University of Milan–Bicocca, Karolinska Institute, Stockholm

This talk introduces the principles of propensity-score theory and reviews available programs to implement propensity-score methods in Stata, with particular focus on psmatch2 and teffects psmatch. An application on real data will be shown.

Social mobility and mortality in southern Sweden (1815–1910)

Paolo Emilio Cardone
Sapienza University of Rome

The aim of this research project is to see how intra-social group mobility affected mortality patterns in Sweden; the project covers the transition from a preindustrial to an industrial society. According to previous studies (see Bengtsson 2010; Bengtsson and Van Poppel 2011; Bengtsson and Dribe 2011; Dribe, Helgertz, and Van de Putte 2013), socioeconomic status (SES) had no substantial positive effect on life expectancy in the Swedish population; rather, other variables, such as public health measures or education, were key factors. However, a new question has emerged for us: Is it possible that other socioeconomic factors, such as intergenerational mobility, affected life expectancy? To answer this, we use a dataset covering 1815 to 1910 from the Scanian Economic-Demographic Database (SEDD). The database is based on local population registers for five rural Scanian coast parishes (Hög, Kävlinge, Halmstad, Sireköpinge, and Kågeröd). The analysis is divided into three periods defined by historical criteria (the preindustrial period: 1815–1869; the early industrial period: 1870–1894; and the first part of the breakthrough of industrialization: 1895–1910).

In our study, we define intra-social mobility as the chance that an individual between ages 30 and 49 experiences a change in SES under the SOCPO classification. SOCPO is a five-category classification scheme. Our main reason for using it is that while it focuses on social power, it is also highly correlated with education and income. In addition, this classification can be used for both rural and industrial societies. We will therefore apply a Cox proportional hazards model to estimate the influence of social mobility, controlling for age and other possible determinant variables. We will estimate a model for each SOCPO category. This model includes social mobility status (an indicator variable equal to 1 when the individual experiences upward mobility and 0 otherwise), age, sex, year of birth, parish of residence, and position in the household. We expect these analyses to show a significant and positive relationship between socioeconomic mobility and mortality.

Wishes and grumbles

Bill Rising and Enrique Pinzón

Bill Rising and Enrique Pinzón will be happy to receive wishes for developments in Stata and almost as happy to receive grumbles about the software.

Scientific organizers

Una-Louise Bell, TStat S.r.l.

Rino Bellocco, Karolinska Institutet

Giovanni Capelli, Università degli Studi di Cassino

Marcello Pagano, Harvard School of Public Health

Maurizio Pisati, Università degli Studi di Milano–Bicocca

Logistics organizers

TStat S.r.l., the official distributor of Stata in Italy.