Last updated: 21 September 2009

2009 Nordic and Baltic Stata Users Group meeting

17–18 September 2009

Karolinska Institutet
Department of Medical Epidemiology and Biostatistics
Wargentine Lecture Hall
Nobels väg 12A
SE-171 77 Stockholm


Multiple-imputation analysis using Stata's new mi command

Yulia Marchenko

Multiple imputation is a popular simulation-based method for handling missing data. It replaces missing values with multiple sets of simulated values from an imputation model, applies primary analyses of interest to each imputed dataset, and obtains parameter estimates adjusted for missing-data uncertainty.

Stata 11's mi command for multiple-imputation analysis performs imputation, data management, and estimation. mi impute provides five univariate and two multivariate imputation methods. mi estimate combines the estimation and pooling steps of the multiple-imputation procedure into one easy step. mi also provides extensive facilities for managing multiply imputed data.

The presentation will cover all aspects of using Stata 11's mi command to perform multiple-imputation analysis from imputation to data management to estimation.
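The pooling step mentioned in the abstract can be illustrated outside Stata. The following is a minimal Python sketch of the standard combining rules (Rubin's rules) for a single parameter, assuming the analysis has already been run on each imputed dataset. The function name pool_rubin and the example numbers are hypothetical, and this is not a description of how mi estimate is implemented.

```python
import math


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets with Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled estimate, total variance, pooled standard error).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need at least two imputations and matching lengths")

    q_bar = sum(estimates) / m                                 # pooled point estimate
    within = sum(variances) / m                                # average within-imputation variance
    between = sum((q - q_bar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = within + (1 + 1 / m) * between                     # total variance
    return q_bar, total, math.sqrt(total)


# Hypothetical estimates and variances from three imputed datasets.
pooled, total_var, pooled_se = pool_rubin([1.0, 1.2, 0.8], [0.04, 0.05, 0.03])
print(pooled, total_var, pooled_se)
```

The between-imputation term is what carries the missing-data uncertainty: if every imputed dataset gave the same estimate, the total variance would reduce to the average within-imputation variance.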

Imputing missing values in a food frequency questionnaire to improve the relation between energy intake and expenditure

Kirsten Mehlig
Gothenburg University

Background: Underreporting is a common problem in dietary surveys and is particularly problematic for the obese. Underreporting in association with obesity may be further exacerbated by the assumption of standard portion sizes and by the assumption that missing data indicates that food is not eaten. Multiple imputation of missing data has been shown to be superior to single imputation assuming zero consumption or other plausible values. Use of portion size pictures may also reduce bias by capturing more individual variation associated with obesity. This study describes how multiple imputation as well as the use of a self-reported generalized portion size measure can improve the agreement between reported energy intake and expenditure and reduce obesity-related bias.

Method: InterGene is a population-based survey in which 1380 men and 1511 women completed a validated food frequency questionnaire (FFQ) with a supplementary 9-level scale describing portion size, based on photographs of a typical meal. Energy intake (EI) calculations were based on 92 food frequencies together with age- and sex-specific standard servings. Participants also underwent body composition measurement and reported on their physical activity levels, making it possible to estimate usual energy expenditure (EE).

Results: Obese participants had higher energy expenditure and reported higher portion sizes, but not higher energy intake than the non-obese, assuming zero intake for missing frequencies as well as standard portions. The amount of missing data was similar among normal, overweight, and obese participants.

The gaps between EE and EI were significantly smaller with the imputed data and were reduced further after adjusting for portion size propensity. The improved agreement is not simply the result of an overall increase in EI; agreement also improved at the individual level. In all three BMI categories, the correlation coefficient between EE and EI tended to increase after imputation and adjustment for portion size propensity. However, there is still no significant upward trend in energy intake across BMI categories, even though the improvement is more obvious in the overweight and obese groups.

Conclusions: Missing data imputation and portion size propensity can significantly improve energy estimates from self-reported FFQs. However, these methods cannot fully correct for the large underreporting among overweight and obese people. Future work will examine whether these adjustment procedures can be used to obtain more valid values at the nutrient level.

Tabulate and plot measures of association after restricted cubic spline models

Nicola Orsini
Institute of Environmental Medicine, Karolinska Institutet

Restricted cubic splines are a flexible tool for modeling the relationship between a continuous exposure and the response variable. Categorical models of the exposure remain popular for presenting measures of association in tabular form, whereas restricted cubic splines are mainly used for graphical presentation of results. This talk presents a new postestimation command, xbrcspline, that greatly facilitates the tabular presentation of exposure-disease associations estimated from restricted cubic spline models. I illustrate the command using the Whitehall I data on the relationship between systolic blood pressure and all-cause mortality.
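To make the idea of tabulating a spline-based association concrete, here is a minimal Python sketch. It builds a restricted cubic spline basis using the usual truncated-power construction and evaluates a log-scale contrast between an exposure value and a reference value. The function names, the knot locations, and the coefficients are all hypothetical, the scaling by the squared knot range is assumed as a common convention, and nothing here is taken from the xbrcspline source.

```python
import math


def rcs_basis(x, knots):
    """Return [x, s_1(x), ..., s_{k-2}(x)] for a restricted cubic spline.

    The spline is cubic between knots and constrained to be linear
    before the first knot and after the last knot.
    """
    k = len(knots)
    if k < 3 or any(a >= b for a, b in zip(knots, knots[1:])):
        raise ValueError("need at least 3 strictly increasing knots")

    t_last, t_prev = knots[-1], knots[-2]
    span = t_last - t_prev
    scale = (t_last - knots[0]) ** 2   # assumed normalization convention

    def cube_plus(u):
        # Truncated cubic: u^3 if u is positive, otherwise 0.
        return u ** 3 if u > 0 else 0.0

    terms = [float(x)]
    for t_j in knots[:-2]:
        s = (cube_plus(x - t_j)
             - cube_plus(x - t_prev) * (t_last - t_j) / span
             + cube_plus(x - t_last) * (t_prev - t_j) / span)
        terms.append(s / scale)
    return terms


def log_contrast(betas, knots, x, ref):
    """Linear-predictor difference between exposure x and reference ref."""
    bx, br = rcs_basis(x, knots), rcs_basis(ref, knots)
    return sum(b * (u - v) for b, u, v in zip(betas, bx, br))


# Hypothetical knots (mmHg) and hypothetical fitted coefficients.
knots = [110.0, 130.0, 150.0, 170.0]
betas = [0.01, 0.02, -0.05]
for sbp in (120.0, 140.0, 160.0):
    # Exponentiated contrast versus a reference of 140, one table row per value.
    print(sbp, math.exp(log_contrast(betas, knots, sbp, ref=140.0)))
```

Each printed row pairs an exposure value with the exponentiated contrast against the reference, which is the kind of quantity a tabular presentation would report. Confidence limits would also require the coefficient covariance matrix, which this sketch omits.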

Meta-analysis using Stata: Reflections by user

Anna Sidorshuk
Institute of Environmental Medicine, Karolinska Institutet

Meta-analysis is a systematic approach to identifying, appraising, synthesizing, and, if appropriate, combining the results of relevant studies on a specific topic. As part of a systematic review, meta-analysis provides useful information to guide clinical practice and to design future research. Stata offers a comprehensive collection of statistical tools for conducting meta-analysis, ranging from classic analysis (metan) through cumulative meta-analysis (metacum) and meta-regression (metareg) to graphical options for forest plots and funnel plots, analytic tools for detecting bias (metabias), and influence analysis (metainf). These tools are distinctive in that they are not part of the official Stata documentation but are contributed and documented by researchers.
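As a small illustration of what combining study results means in the simplest case, the following Python sketch pools study-level effects with inverse-variance (fixed-effect) weights. It is a textbook calculation under the assumption of a common true effect. The function name and the example inputs are hypothetical, and the code does not reproduce the options or output of metan.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance fixed-effect pooling of study-level estimates.

    effects: study effect estimates on an additive scale (e.g. log odds ratios)
    std_errors: their standard errors
    Returns (pooled effect, pooled standard error).
    """
    if not effects or len(effects) != len(std_errors):
        raise ValueError("need matching, non-empty inputs")

    weights = [1.0 / se ** 2 for se in std_errors]   # more precise studies weigh more
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, effects)) / total
    return pooled, math.sqrt(1.0 / total)


# Hypothetical log odds ratios and standard errors from three studies.
pooled, se = pool_fixed_effect([0.20, 0.35, 0.10], [0.10, 0.15, 0.20])
low, high = pooled - 1.96 * se, pooled + 1.96 * se   # approximate 95% interval
print(math.exp(pooled), math.exp(low), math.exp(high))
```

A random-effects analysis would add a between-study variance component to each study's variance before weighting; that step is omitted here.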

Scientific organizers

Yvonne Åberg, Metrika Consulting and Stockholm University

Nicola Orsini, Karolinska Institutet

Paul Dickman, Karolinska Institutet

Logistics organizers

Metrika Consulting, the official distributor of Stata in the Nordic and Baltic regions, and the Karolinska Institutet.