Time series regression for counts allowing for autocorrelation in air pollution studies using Stata

Speakers:  Aurelio Tobias, Institut Municipal d'Investigacio Medica, and Michael J. Campbell, Northern General Hospital

The analysis of epidemiological time series of counts usually requires Poisson rather than linear regression. Different techniques have been used to study the short-term effects of air pollution on health, some based on time-series methodology and others on linear and nonlinear regression.

A main objective of the APHEA project (Katsouyanni et al. 1995) was to develop and standardize a methodology for the detection of short-term effects of air pollution on health using epidemiological time series. Since the dependent variable, such as daily mortality, is a non-negative count, a Poisson regression was used. The model assumes

$$\log E(Y_t) = X_t \beta$$

where $X_t$ is the matrix of predictor variables on day $t$ with regression coefficients $\beta$, $Y_t$ is the number of deaths on day $t$, and $E$ denotes the expected value. Time series data usually contain autocorrelation between observations. The presence of autocorrelation is often an indication of incomplete or inadequate model specification, since deaths are autocorrelated because they depend on autocorrelated predictor variables. If the model were correct, the residual autocorrelation should be minimal, since one death does not cause another. Thus residual autocorrelation may imply confounding of air pollution associations by unmeasured or mismodeled variables. The solution proposed in the APHEA project was to include a specification of the autocorrelation in the model, which requires modifying standard Poisson regression. Following Schwartz et al. (1996), Poisson regression with autocorrelated residuals is suitable for analyzing time series studies while controlling for autocorrelation. In this model the covariance matrix is defined as follows:

where is the classic Poisson covariance, is the overdispersion parameter estimated from the residual using McCullagh and Nelder's method (1989), R is an autocorrelation matrix, and when k = 0 and 0 otherwise.
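As an illustration only, the log-linear Poisson fit and the Pearson-based overdispersion estimate (the Pearson chi-squared statistic divided by the residual degrees of freedom) can be sketched in plain Python. This is not the arpois.ado code. The function names, the single-predictor setup, and the data values are hypothetical, and the sketch ignores the autocorrelation matrix R entirely.

```python
import math

def fit_poisson(x, y, iterations=25):
    """Fit log E(y_t) = b0 + b1 * x_t by iteratively reweighted least squares."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Working response z_t = eta_t + (y_t - mu_t) / mu_t, with weights mu_t
        z = [b0 + b1 * xi + (yi - mi) / mi for xi, yi, mi in zip(x, y, mu)]
        sw = sum(mu)
        swx = sum(mi * xi for mi, xi in zip(mu, x))
        swxx = sum(mi * xi * xi for mi, xi in zip(mu, x))
        swz = sum(mi * zi for mi, zi in zip(mu, z))
        swxz = sum(mi * xi * zi for mi, xi, zi in zip(mu, x, z))
        # Solve the 2x2 weighted least-squares normal equations
        det = sw * swxx - swx * swx
        b0 = (swxx * swz - swx * swxz) / det
        b1 = (sw * swxz - swx * swz) / det
    return b0, b1

def pearson_dispersion(y, mu, n_params):
    """Pearson chi-squared statistic divided by residual degrees of freedom."""
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Hypothetical data: a pollution index and daily death counts
pollution = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
deaths = [3, 4, 3, 5, 6, 5, 8, 7, 9, 11]

b0, b1 = fit_poisson(pollution, deaths)
mu = [math.exp(b0 + b1 * xi) for xi in pollution]
phi = pearson_dispersion(deaths, mu, n_params=2)
print("slope:", b1, "overdispersion:", phi)
```

A value of the dispersion estimate well above 1 would suggest overdispersion relative to the Poisson variance.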

The code in arpois.ado fits this Poisson autoregressive model. Two types of autoregressive terms may be included in the model: studentized residuals

where , or lagged values of the dependent variable

The order of the autocorrelation can be estimated empirically by examining the autocorrelation function plot. This option has been included in the code using the acplot.ado file developed by Cox (1997).
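The quantity displayed in an autocorrelation function plot can be sketched as follows. This is a minimal illustration in plain Python, not the acplot.ado code. The function names and the residual values are hypothetical, and the bounds of plus or minus 1.96 divided by the square root of n are the usual large-sample white-noise approximation, assumed here for illustration.

```python
import math

def autocorrelation(series, max_lag):
    """Sample autocorrelations of a series at lags 1..max_lag."""
    n = len(series)
    mean = sum(series) / n
    c0 = sum((v - mean) ** 2 for v in series)
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]

def lags_outside_bounds(series, max_lag):
    """Lags whose autocorrelation exceeds the approximate 95% white-noise bounds."""
    bound = 1.96 / math.sqrt(len(series))
    acf = autocorrelation(series, max_lag)
    return [k for k, r in enumerate(acf, start=1) if abs(r) > bound]

# Hypothetical residuals from a fitted Poisson model
residuals = [0.8, 0.5, 0.6, -0.2, -0.4, -0.7, -0.1, 0.3, 0.9, 0.4, -0.5, -0.6]
print(autocorrelation(residuals, 3))
print(lags_outside_bounds(residuals, 3))
```

Lags that fall outside the bounds are candidates for autoregressive terms; the choice of order remains a judgment made from the plot.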

However, it should be recognized that these models may lead to instability of the estimated associations because of the high day-to-day correlation of air pollution exposures. The inclusion of autocorrelation terms in the model is generally felt to produce a conservative estimate of the pollution effect size and standard error (Brunekreef et al. 1995).


Brunekreef, B., et al. 1995.
Epidemiologic studies on short-term effects of low levels of major ambient air pollution components. Environmental Health Perspectives 103 (Suppl 2): 3–13.
Cox, N. 1997.
Stata module to plot the autocorrelogram. http://ideas.repec.org/c/boc/bocode/s320302.html.
Katsouyanni, K., et al. 1995.
Short-term effects of air pollution on health: a European approach using epidemiological time series data. Eur Respir J 8: 1030–1038.
McCullagh, P. and J. A. Nelder. 1989.
Generalized Linear Models. 2d ed. London: Chapman and Hall.
Schwartz, J., et al. 1996.
Methodological issues in air pollution studies and daily counts of deaths or hospital admissions. J Epidemiol Community Health 50 (Suppl 1): S3–S11.