Epidemiological time-series regression with Stata

Speakers: Aurelio Tobías and Mike J. Campbell

Introduction

Time-series regression models are especially suitable in epidemiology for evaluating the short-term effects of time-varying exposures. In epidemiological time-series studies, a single population is assessed with reference to changes over time in the rate of a health outcome and the corresponding changes in exposure factors during the same period. Time-series regression has been applied in a wide range of situations. Examples include the study of the short-term effects of air pollution on health [1, 2]; sudden infant death syndrome and environmental temperature [3]; and infectious gastrointestinal illness related to drinking water [4].

Stata manuals are ordered alphabetically by command name rather than by topic, so reviews that group commands by topic can be useful for users. However, specific commands for time-series regression of counts are not available in Stata by default. Ado-files usually come from three sources: official Stata commands, the Stata Technical Bulletin, and the Boston College Department of Economics website. We present a review of useful commands developed by different users to deal with this topic. These commands can be divided into four categories: data management, graphics, statistical analysis, and model fit.

Data management

The first step is to check for duplicate records (dups). In a time-series analysis we should also generate sinusoidal terms (gensin), lags, and moving summary statistics of variables (movsumm). Once the analysis is done, we can express the results of regression models as the increase in relative risk for k units of the x variable (getrr), and also keep the parameter estimates in a new data set.
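The user-written commands named above are not reproduced here, since their syntax is not given in this abstract. As a rough sketch of the same data-management steps using only official Stata commands, the following assumes a recent Stata release (for rnormal() and rpoisson()) and a simulated daily data set; the variable names date, pm10, and deaths are hypothetical.

```stata
* Simulated daily series; all variable names are hypothetical
clear
set obs 730
set seed 12345
generate date = mdy(1, 1, 1995) + _n - 1
format date %td
generate pm10 = 40 + 10*rnormal()
generate deaths = rpoisson(exp(3 + 0.001*pm10))

* Check that each day appears only once, then declare the time variable
duplicates report date
tsset date

* Annual sine and cosine terms to model seasonality
generate double sin1 = sin(2*_pi*date/365.25)
generate double cos1 = cos(2*_pi*date/365.25)

* Exposure lagged by one and two days
generate pm10_l1 = L1.pm10
generate pm10_l2 = L2.pm10

* Moving average of the current day and the two previous days
tssmooth ma pm10_ma3 = pm10, window(2 1 0)
```

The lag operators and tssmooth require the data to be tsset first, which is why the duplicate check comes before everything else: tsset fails if a date is repeated.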

Graphics

The graph command forms the core of Stata graphics. To produce scatterplots of y versus multiple x variables, or of multiple y versus multiple x variables, the muxplot and muxyplot commands are available. We can study the distribution of a variable over time with tsplot, or with the more powerful sssplot. The cross-correlation plot (xcorr) can be used to study lag structures.
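For orientation, here is a minimal sketch of comparable plots using official commands from current Stata releases (tsline, scatter, xcorr) rather than the user-written ones listed above. The simulated data and the variable names date, pm10, and deaths are assumptions made only for illustration.

```stata
* Simulated daily series; all variable names are hypothetical
clear
set obs 730
set seed 12345
generate date = mdy(1, 1, 1995) + _n - 1
format date %td
tsset date
generate pm10 = 40 + 10*rnormal()
generate deaths = rpoisson(exp(3 + 0.001*pm10))

* Outcome over time
tsline deaths, name(ts, replace)

* Outcome against exposure
scatter deaths pm10, name(sc, replace)

* Cross-correlation between exposure and outcome at lags and leads up to 10 days
xcorr pm10 deaths, lags(10) name(xc, replace)
```

Naming each graph keeps all three in memory instead of letting each one overwrite the last.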

Statistical analysis

Time-series data usually contain autocorrelation between observations. We must check graphically for residual autocorrelation through the ACF (ac) and PACF (pac) plots. Another problem is overdispersion, which can be tested by calculating the overdispersion parameter from the sum of the chi-square residuals (odp). The solution to both problems proposed in the APHEA project [1] was to include a specification of the autocorrelation in the model. The command arpois fits a log-linear model allowing for autocorrelation and overdispersion using iterative weighted least squares. This ado-file is based on Schwartz's SAS macro. Generalised additive models [5] have been suggested as a better alternative for analysing epidemiological time-series data [6]. The gam command is based on the GAMFIT program [7]. Finally, robust methods can also be used: rglm calculates a Huber (sandwich) estimate of the variance-covariance matrix of the estimates.
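The checks described above can be approximated with official commands, as in the sketch below. It is not arpois: it fits an ordinary Poisson log-linear model with glm and does not model the autocorrelation itself. The data are simulated and the variable names (date, pm10, deaths, sin1, cos1, rp) are hypothetical.

```stata
* Simulated daily series; all variable names are hypothetical
clear
set obs 730
set seed 12345
generate date = mdy(1, 1, 1995) + _n - 1
format date %td
tsset date
generate pm10 = 40 + 10*rnormal()
generate double sin1 = sin(2*_pi*date/365.25)
generate double cos1 = cos(2*_pi*date/365.25)
generate deaths = rpoisson(exp(3 + 0.001*pm10 + 0.1*cos1))

* Poisson log-linear model; e(dispers_p) is the Pearson chi-square
* divided by the residual degrees of freedom
glm deaths pm10 sin1 cos1, family(poisson) link(log)
display "Pearson dispersion = " e(dispers_p)

* Relative risk for a 10-unit increase in pm10
lincom 10*pm10, eform

* Residual autocorrelation: ACF and PACF of the Pearson residuals
predict double rp, pearson
ac rp, lags(30) name(acf, replace)
pac rp, lags(30) name(pacf, replace)

* Two simple adjustments for overdispersion: scale the standard errors
* by the Pearson dispersion, or use Huber (sandwich) standard errors
glm deaths pm10 sin1 cos1, family(poisson) link(log) scale(x2)
glm deaths pm10 sin1 cos1, family(poisson) link(log) vce(robust)
```

A Pearson dispersion well above 1 suggests overdispersion. Because the simulated counts here are genuinely Poisson, the value should come out close to 1.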

Model fit

For nested models, the likelihood-ratio test is preferable (lrtest or lrtest2), whereas for non-nested models Akaike's information criterion is suggested (mlfit). Alternatively, for any maximum likelihood estimation Stata provides the pseudo-R2.
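As an illustration of these comparisons with official commands in current Stata (estimates store, lrtest, estimates stats), the sketch below fits two nested Poisson models to simulated data. The variable and model names are hypothetical, and mlfit and lrtest2 are not used.

```stata
* Simulated daily series; all variable names are hypothetical
clear
set obs 730
set seed 12345
generate date = mdy(1, 1, 1995) + _n - 1
generate pm10 = 40 + 10*rnormal()
generate double sin1 = sin(2*_pi*date/365.25)
generate double cos1 = cos(2*_pi*date/365.25)
generate deaths = rpoisson(exp(3 + 0.001*pm10 + 0.1*cos1))

* Smaller model: exposure only
poisson deaths pm10
estimates store base

* Larger model: exposure plus seasonal terms
poisson deaths pm10 sin1 cos1
display "pseudo-R2 = " e(r2_p)
estimates store seasonal

* Likelihood-ratio test of the nested models
lrtest base seasonal

* AIC and BIC for both models; these also apply to non-nested comparisons
estimates stats base seasonal
```

The likelihood-ratio test is valid only because base is nested within seasonal. The information criteria carry no such requirement, provided both models are fitted to the same observations.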

References

Schwartz, J., C. Spix, G. Touloumi, et al. 1996.
Methodological issues in studies of air pollution and daily counts of deaths or hospital admissions. J Epidemiol Community Health 50 (suppl 1): S3–S11.
Buchdahl, R., A. Parker, T. Stebbings, et al. 1996.
Association between air pollution and acute childhood wheezy episodes: prospective observational study. Br Med J 312: 661–665.
Campbell, M. J. 1994.
Time series regression for counts: an investigation into the relationship between Sudden Infant Death Syndrome and environmental temperature. J Royal Stat Soc A 157: 191–208.
Schwartz, J., R. Levin, and K. Hodge. 1997.
Drinking water turbidity and pediatric hospital use for gastrointestinal illness in Philadelphia. Epidemiology 8: 615–620.
Hastie, T. J., and R. J. Tibshirani. 1990.
Generalized Additive Models. London: Chapman and Hall.
Schwartz, J. 1994.
Non-parametric smoothing in the analysis of air pollution and respiratory illness. Can J Stat 4: 471–487.
Hastie, T. J., and R. J. Tibshirani.
GAMFIT software. (http://lib.stat.cmu.edu/general/).