2015 Portuguese Stata Users Group meeting

18 September 2015


Nova School of Business and Economics
Campus de Campolide
1099-032 Lisboa


Big Data in Stata

Paulo Guimarães
Bank of Portugal
Datasets are becoming increasingly large, and their use poses new challenges. In this presentation, I draw on my experience with managing and analyzing large datasets to offer some advice for Stata users. Besides providing some practical tips, I also discuss several recent user-written commands that are particularly suited for dealing with large datasets. Finally, I talk about issues regarding the estimation of high-dimensional models.

Impact of credit ratings in crisis-hit countries: An application with Markov chains

Nicoletta Rosati
University of Lisbon
Vasco Oliveira
University of Lisbon
Credit ratings have been widely discussed in recent years, primarily because of their possible impact on the economy. After the financial crisis of 2008, and with no autonomy to pursue an expansionary monetary policy, crisis-hit countries such as Portugal and Spain are still struggling to control their public debt and revive the economy while trying to obtain upgrades to their sovereign credit ratings. In this presentation, we propose a different approach to analyzing the impact of changes in sovereign credit ratings on stock markets. We study the evolution of a segmented form of the stock market index for several crisis-hit countries, including both European and Asian markets. This evolution is initially modeled by a homogeneous Markov chain, where the transition probabilities from one starting level of the index to a new (lower or higher) level in the next period depend, through an ordered probit model, on explanatory variables that include the country's rating, GDP, and interest rate. We then inspect the model's reaction to changes in credit ratings at different percentiles of their distribution. Finally, we suggest some possible extensions of the research and further applications.
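
To make the modeling idea concrete, the following is a minimal Python sketch of how ordered-probit probabilities can fill the rows of a Markov transition matrix. It is not the authors' specification: the abstract does not say how the starting level enters the model, so the state-specific intercepts (`state_effects`), the coefficient vector, the cutpoints, and all numbers below are assumptions chosen only for illustration.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(index, cutpoints):
    """Probabilities of each ordered level for a latent index.

    With increasing cutpoints c_1 < ... < c_m there are m + 1 levels, and
    P(level = j) = Phi(c_{j+1} - index) - Phi(c_j - index).
    """
    cdf = [normal_cdf(c - index) for c in cutpoints]
    bounds = [0.0] + cdf + [1.0]
    return [bounds[j + 1] - bounds[j] for j in range(len(bounds) - 1)]


def transition_matrix(state_effects, x, beta, cutpoints):
    """One row per starting level; each row is an ordered-probit distribution.

    Assumption: the starting level shifts the latent index through its own
    intercept, and the covariates x (e.g., rating, GDP, interest rate) enter
    linearly through beta.
    """
    xb = sum(b * v for b, v in zip(beta, x))
    return [ordered_probit_probs(a + xb, cutpoints) for a in state_effects]


# Hypothetical values: three index levels, two covariates.
P = transition_matrix(
    state_effects=[-0.5, 0.0, 0.5],
    x=[1.0, 2.0],
    beta=[0.3, -0.1],
    cutpoints=[-0.4, 0.6],
)
for row in P:
    print([round(p, 3) for p in row])
```

Each row sums to one by construction, and raising the latent index (for example, through a rating upgrade with a positive coefficient) moves probability mass toward higher levels of the index.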

eurouse: A Stata command to import data from the Eurostat bulk facility

David Leite Neves
University of Lisbon
Isabel Proença
University of Lisbon
The Eurostat bulk facility contains about 5,800 datasets from more than 30 European countries. Some datasets also include the United States and Japan. The datasets are reported to Eurostat by the national statistical offices and include monetary and financial statistics, national accounts, labor market statistics, social statistics, etc. Eurostat updates the datasets twice a day. In this presentation, I will present a command that I developed to automatically download and import these datasets into Stata. The user only needs to type the dataset code in the command line, and eurouse will automatically build a panel with the latest records from all the countries that report to Eurostat. The motivating example comes from the need to build a panel dataset for European Union countries while being able to efficiently (1) identify the data, (2) access their description and meta-information, and (3) feed the database with the latest updates. The command eurouse does all of this automatically.

Using ODBC with Stata

Rita Sousa
Bank of Portugal
Open DataBase Connectivity (ODBC) is a standardized set of function calls that can be used to access data stored in database management systems. Stata's odbc command allows us to load, write, and view data from ODBC sources. My presentation will cover general considerations related to the management of large datasets, illustrated with practical examples using ODBC.

Two powerful tools: gsem and margins

Isabel Canette
gsem is a versatile command that fits generalized structural equation models, and it can be used to fit customized models without the need for programming. I will introduce the different aspects of generalized structural equation models: family and link, latent variables, and random effects. These elements can be combined to build complex models that might not otherwise be available as a stand-alone command. Another useful tool is margins, which allows us to compute marginal means and marginal effects, among other statistics. We will discuss how to use these features to interpret a nonlinear model, and we will also discuss a feature introduced in Stata 14: marginal predictions on the random effects for random-effects models.

Stata in the everyday life of health economists

Pedro Pita Barros
Nova School of Business & Economics, Universidade Nova de Lisboa
I am a health economist, and my activities with data and Stata cover data management (small and large datasets), simple estimation, production of graphs and figures, estimation of standard and nonstandard models, and writing both scientific papers and a blog. I will cover how I use the features of Stata for these activities, highlighting both the commands I find most useful and a wish list of things for which I would like someone to build commands.

Lerman: A Stata module to decompose inequality using sampling weights

Bruno Damásio
University of Lisbon
David Leite Neves
University of Lisbon
The Gini index is the most widely used measure of income inequality. Lerman and Yitzhaki (1985) proposed a method to decompose the Gini index by income source and to compute the marginal impact of each source on it. López-Feldman (2006) presented a Stata module to operationalize Lerman and Yitzhaki's method; however, it does not allow the use of sampling weights, which considerably narrows its application to household surveys. In this presentation, we will present lerman, a user-written command that incorporates sampling weights in the Lerman and Yitzhaki (1985) methodology. To illustrate the usefulness of the command in income inequality studies, we will provide an empirical application to the USA, using data from the Panel Study of Income Dynamics.
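
As a rough illustration of the decomposition, the Python sketch below computes a weighted version of the Lerman and Yitzhaki identity G = Σ_k S_k G_k R_k, where S_k is the share of source k in total income, G_k is the Gini index of source k, and R_k is the Gini correlation between source k and total income. This is not the lerman command or its code: the function names, the midpoint rule for weighted ranks, and the toy data are assumptions made for this example, and every source is assumed to have a nonzero mean and to vary across observations.

```python
def wmean(x, w):
    """Weighted mean."""
    return sum(xi * wi for xi, wi in zip(x, w)) / sum(w)


def wcov(x, y, w):
    """Weighted covariance (weights normalized to sum to one)."""
    mx, my = wmean(x, w), wmean(y, w)
    return sum(wi * (xi - mx) * (yi - my) for xi, yi, wi in zip(x, y, w)) / sum(w)


def weighted_rank(values, weights):
    """Weighted fractional rank in (0, 1); tied values share one midpoint rank."""
    total = sum(weights)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    cum, pos = 0.0, 0
    while pos < len(order):
        end, group_w = pos, 0.0
        while end < len(order) and values[order[end]] == values[order[pos]]:
            group_w += weights[order[end]]
            end += 1
        for j in range(pos, end):
            ranks[order[j]] = (cum + group_w / 2.0) / total
        cum += group_w
        pos = end
    return ranks


def gini_decomposition(sources, weights):
    """Return the overall Gini and, per source, S, G, R, contribution, marginal.

    'marginal' is the relative change in the Gini from a small proportional
    increase in that source: S*G*R / Gini - S.
    """
    n = len(weights)
    total = [sum(src[i] for src in sources.values()) for i in range(n)]
    mu = wmean(total, weights)
    f_total = weighted_rank(total, weights)
    gini = 2.0 * wcov(total, f_total, weights) / mu
    parts = {}
    for name, y_k in sources.items():
        mu_k = wmean(y_k, weights)
        cov_own = wcov(y_k, weighted_rank(y_k, weights), weights)
        s_k = mu_k / mu
        g_k = 2.0 * cov_own / mu_k
        r_k = wcov(y_k, f_total, weights) / cov_own
        contribution = s_k * g_k * r_k
        parts[name] = {"S": s_k, "G": g_k, "R": r_k,
                       "contribution": contribution,
                       "marginal": contribution / gini - s_k}
    return gini, parts


# Hypothetical data: four households, two income sources, sampling weights.
gini, parts = gini_decomposition(
    {"wages": [10.0, 20.0, 30.0, 40.0], "transfers": [5.0, 4.0, 1.0, 0.0]},
    weights=[1.0, 2.0, 1.0, 1.0],
)
print(round(gini, 4), {k: round(v["marginal"], 4) for k, v in parts.items()})
```

The source contributions add up to the overall Gini, and the marginal effects add up to zero, which gives a quick consistency check on any implementation of the method.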

Technology, skills, and job duration

Hugo Castro Silva
University of Lisbon
Francisco Lima
University of Lisbon
We study technology-skill complementarities in manufacturing and their influence on job duration by analyzing hazard functions for different levels of technology intensity. Using a Portuguese matched employer-employee longitudinal dataset and a robust identification strategy of displaced workers, we estimate discrete-time duration models allowing for unobserved heterogeneity. We find that the accumulation of specific human capital plays a stronger role in reducing the hazard of job separation in more technology-intensive sectors. Also, the returns to firm-specific skills and to general human capital increase with technology intensity. Our results suggest that technology-skill complementarity is observable in terms of job duration.

Stata in health research: From everyday questions to major studies

Sofia Baptista
Porto University
Stata has several advantages over its direct competitors: it is better oriented toward health sciences research, and it is robust and versatile software. Stata is easy to use and has the added advantage of allowing user-written commands. The price is competitive, and documentation and help are easy to access. Stata has been used in major clinical and experimental studies, as shown before. However, my point today is that Stata can also be a powerful tool for everyday clinical questions, for those doctors who are not researchers but aim to understand statistics to improve their practice and to understand trends in their patients' diseases and treatments. The truth is that doctors nowadays have, a click away, the most important thing needed to start research: large databases.

Using pointers and structures in Stata to estimate panel-data models with attrition

Pierre Hoonhout
University of Lisbon
Panel datasets usually have missing data: some of the units that are approached in the first wave fail to respond in later waves. It is well known that this panel-data attrition leads to unreliable inferences. Hoonhout and Ridder (2016) show that the sequential additively nonignorable (SAN) attrition model nonparametrically just-identifies the population distribution if refreshment samples are available. Hoonhout (2016) proposes a weighted GMM estimator for this problem. The estimator corrects for the potentially biasing effects of nonignorable attrition. This presentation will focus on the implementation of this estimator in Stata. In particular, it will use this context to highlight the potential benefits of using structures and pointers in Mata.

Wishes and grumbles

Bill Rising & Isabel Canette
StataCorp staff will be happy to receive wishes for developments in Stata and almost as happy to receive grumbles about the software.

Scientific organizers

Pedro Pita Barros, Universidade Nova de Lisboa

João Cerejeira, Universidade do Minho

Anabela Carneiro, Universidade do Porto

Miguel Portela, Universidade do Minho

Paulo Guimarães, Bank of Portugal

Pierre Hoonhout, Universidade de Lisboa

Nicoletta Rosati, Universidade de Lisboa

Logistics organizers

Timberlake Consultores, the official distributor of Stata in Portugal.