Time series regression for counts allowing for autocorrelation in air
pollution studies using Stata

Speakers
Aurelio Tobias, Institut Municipal d'Investigacio Medica
Michael J. Campbell, Northern General Hospital

The analysis of epidemiological time series consisting of counts usually
requires Poisson rather than linear regression. To study the short-term
effects of air pollution on health, different techniques have been used,
some based on time-series methodology and others on linear and nonlinear
regression.
A main objective of the APHEA project (Katsouyanni et al. 1995) was to
develop and standardize a methodology for the detection of short-term
effects of air pollution on health using epidemiological time series. Since
the dependent variable, such as daily mortality, is a non-negative count, a
Poisson regression was used. The model assumes

$$\log E(Y_t) = X_t \beta$$

where $X_t$ is the matrix of predictor variables on day $t$ with regression
coefficients $\beta$, $Y_t$ is the number of deaths on day $t$, and $E$
denotes the expected value. Time series data usually contain
autocorrelation between the observations. The presence of
autocorrelation is often an indication of incomplete or inadequate model
specification, because the deaths are autocorrelated only insofar as they
depend on autocorrelated predictor variables. If the model were correct,
the residual autocorrelation should be minimal, since one death does not
cause another. Thus residual autocorrelation may imply confounding of air
pollution associations due to unmeasured or mismodeled variables. The
solution proposed in the APHEA project was to include a specification of
the autocorrelation in the model, which requires modifying standard
Poisson regression. Following Schwartz et al. (1996), Poisson regression
with autocorrelated residuals is suitable for analyzing time
series studies while controlling for autocorrelation. In this model the
covariance matrix is defined in terms of the classic Poisson covariance,
the overdispersion parameter estimated from the residuals using McCullagh
and Nelder's method (1989), an autocorrelation matrix R, and a term that
is 0 unless k = 0.
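The log-linear Poisson model and the overdispersion estimate described above can be illustrated with a minimal sketch. The Python below is not the authors' Stata code. It fits log E(Y_t) = b0 + b1 x_t by Newton-Raphson on toy data invented for this example, then divides the Pearson chi-squared statistic by the residual degrees of freedom. That ratio is assumed here to be the McCullagh and Nelder estimator the text refers to. The function names and the data are hypothetical.

```python
import math

def fit_poisson(x, y, iters=50, tol=1e-10):
    """Fit log E(Y_t) = b0 + b1 * x_t by Newton-Raphson (Fisher scoring)."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0  # start from the intercept-only fit
    for _ in range(iters):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector X'(y - mu)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Information matrix X' diag(mu) X, a 2x2 matrix solved in closed form
        h00 = sum(mu)
        h01 = sum(xi * mi for xi, mi in zip(x, mu))
        h11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

def pearson_dispersion(x, y, b0, b1, n_params=2):
    """Pearson chi-squared divided by residual degrees of freedom."""
    mu = [math.exp(b0 + b1 * xi) for xi in x]
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / (len(y) - n_params)

# Toy data (invented): pollutant level in arbitrary units, daily death counts
pollutant = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
deaths = [2, 3, 6, 7, 8, 9, 10, 12]

b0, b1 = fit_poisson(pollutant, deaths)
phi = pearson_dispersion(pollutant, deaths, b0, b1)
print("b0 =", b0, "b1 =", b1, "dispersion =", phi)
```

A dispersion estimate well above 1 would suggest overdispersion relative to the Poisson assumption; with real daily series the design matrix would of course contain many more covariates than this two-parameter sketch.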
The code in arpois.ado fits this Poisson autoregressive model. Two types
of autoregressive terms can be included in the model: lagged studentized
residuals or lagged values of the dependent variable.
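To make the idea of autoregressive terms concrete, the following Python sketch builds lagged copies of a series so that they can enter a regression as extra predictors. It is only an illustration under assumptions: the helper name `add_lags` and the toy values are hypothetical, and it does not reproduce the exact form of the terms used in arpois.ado. The same helper could be applied to a residual series instead of the counts.

```python
def add_lags(y, x, order):
    """Return rows (y_t, x_t, y_{t-1}, ..., y_{t-order}) for t >= order.

    The first `order` days are dropped because their lags are not observed.
    """
    rows = []
    for t in range(order, len(y)):
        lagged = [y[t - k] for k in range(1, order + 1)]
        rows.append((y[t], x[t], *lagged))
    return rows

# Toy data (invented): daily counts and one pollutant measurement per day
counts = [5, 7, 6, 9, 8]
pollutant = [1.0, 1.2, 0.9, 1.5, 1.1]

rows = add_lags(counts, pollutant, order=2)
for row in rows:
    print(row)
```

Each row pairs the outcome on day t with its own covariate and the two preceding outcomes, which is the layout a regression routine would need to estimate second-order autoregressive terms.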
The order of the autocorrelation can be estimated empirically by examining
the autocorrelation function plot. This option has been included in the
code using the acplot.ado file developed by Cox (1997).
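The quantities shown in an autocorrelation function plot can be computed directly. The Python sketch below (an illustration, not acplot.ado) computes the sample autocorrelations of a residual series and, as one crude heuristic assumed here, reports the largest lag whose autocorrelation falls outside the approximate 95% band of ±1.96/sqrt(n). The simulated residuals, the seed, and the function names are hypothetical.

```python
import math
import random

def acf(series, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [v - mean for v in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
            for k in range(max_lag + 1)]

def suggest_order(series, max_lag):
    """Largest lag in 1..max_lag with |r_k| > 1.96 / sqrt(n), or 0 if none."""
    band = 1.96 / math.sqrt(len(series))
    r = acf(series, max_lag)
    significant = [k for k in range(1, max_lag + 1) if abs(r[k]) > band]
    return max(significant) if significant else 0

# Simulated residuals (invented): first-order autoregressive noise
rng = random.Random(1)
resid = [0.0]
for _ in range(199):
    resid.append(0.6 * resid[-1] + rng.gauss(0.0, 1.0))

r = acf(resid, 5)
print("autocorrelations:", [round(v, 3) for v in r])
print("suggested order:", suggest_order(resid, 5))
```

In practice the plot is read by eye, and a partial autocorrelation plot is often consulted as well; the band-crossing rule above is only a convenient stand-in for that judgment.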
However, it should be recognized that these models may lead to
instability of the estimated associations because of the high day-to-day
correlation of air pollution exposures. The inclusion of autocorrelation
terms in the model is generally felt to produce a conservative estimate of
the pollution effect size and standard error (Brunekreef et al. 1995).
References
- Brunekreef, B., et al. 1995. Epidemiologic studies on short-term
  effects of low levels of major ambient air pollution components.
  Environmental Health Perspectives 103 (Suppl 2): 3–13.
- Cox, N. 1997. Stata module to plot the autocorrelogram.
  http://ideas.repec.org/c/boc/bocode/s320302.html.
- Katsouyanni, K., et al. 1995. Short-term effects of air pollution on
  health: a European approach using epidemiological time series data.
  Eur Respir J 8: 1030–1038.
- McCullagh, P. and J. A. Nelder. 1989. Generalized Linear Models.
  2d ed. London: Chapman and Hall.
- Schwartz, J., et al. 1996. Methodological issues in air pollution
  studies and daily counts of deaths or hospital admissions. J Epidemiol
  Community Health 50 (Suppl 1): S3–S11.