Time series regression for counts allowing for autocorrelation in air
pollution studies using Stata
Speakers
Aurelio Tobias, Institut Municipal d'Investigacio Medica
Michael J. Campbell, Northern General Hospital
The analysis of epidemiological time series of counts usually requires
Poisson rather than linear regression. To study the short-term effects of air
pollution on health, different techniques have been used, some based on
time-series methodology and others on linear and nonlinear regression.
A main objective of the APHEA project (Katsouyanni et al. 1995) was to develop
and standardize a methodology for the detection of short-term effects of air
pollution on health using epidemiological time series. Since the dependent
variable, such as daily mortality, is a non-negative count, a Poisson
regression was used. The model assumes
$$\log E(Y_t) = X_t \beta,$$
where $X_t$ is the matrix of predictor variables on day
t with regression coefficients $\beta$, $Y_t$ is the number of deaths on day t, and E
denotes the expected value. Time series data usually contains autocorrelation
between the observations. The presence of autocorrelation often indicates
incomplete or inadequate model specification, since the deaths are
autocorrelated because they are conditional on autocorrelated predictor
variables. If the model were correct, the residual autocorrelation should be
minimal, since one death does not cause another. Thus residual autocorrelation
may imply confounding of air pollution associations by unmeasured or mismodeled
variables. The solution proposed in the APHEA project was to include a
specification of the autocorrelation in the model, which requires modifying
standard Poisson regression.
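As a point of reference for the unmodified model, the sketch below fits a standard log-linear Poisson regression, log E(Y_t) = b0 + b1 x_t, by Newton-Raphson in plain Python. It is only an illustration and is not the arpois.ado code: the function name `fit_poisson`, the single-predictor setup, and the data are assumptions made for the example.

```python
import math

def fit_poisson(x, y, max_iter=50, tol=1e-10):
    """Fit log E(y) = b0 + b1 * x by Newton-Raphson; returns (b0, b1)."""
    b0 = math.log(sum(y) / len(y))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(max_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: first derivatives of the Poisson log-likelihood.
        u0 = sum(yi - mi for yi, mi in zip(y, mu))
        u1 = sum(xi * (yi - mi) for xi, yi, mi in zip(x, y, mu))
        # Fisher information matrix (2 x 2, symmetric).
        i00 = sum(mu)
        i01 = sum(xi * mi for xi, mi in zip(x, mu))
        i11 = sum(xi * xi * mi for xi, mi in zip(x, mu))
        det = i00 * i11 - i01 * i01
        # Newton step: solve information * step = score for the 2 x 2 case.
        d0 = (i11 * u0 - i01 * u1) / det
        d1 = (i00 * u1 - i01 * u0) / det
        b0 += d0
        b1 += d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Hypothetical data: a daily pollutant level and daily death counts.
pollutant = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]
deaths = [8, 11, 9, 14, 13, 17]
b0, b1 = fit_poisson(pollutant, deaths)
print(f"intercept={b0:.4f} slope={b1:.4f}")
```

This plain fit treats the daily counts as independent, which is the assumption the autocorrelation terms are meant to relax.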
Following Schwartz et al. (1996), Poisson regression with autocorrelated
residuals is suitable for analyzing time series studies while controlling for
autocorrelation. In this model the covariance matrix is built from the classic
Poisson covariance, an overdispersion parameter estimated from the residuals
using McCullagh and Nelder's method (1989), an autocorrelation matrix R, and a
term that is nonzero only when the lag k = 0.
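The overdispersion estimate can be illustrated with a short Python sketch. It assumes that "McCullagh and Nelder's method" refers to the Pearson chi-square statistic divided by the residual degrees of freedom; the function name `pearson_dispersion` and the numbers are hypothetical, and this is not the arpois.ado code.

```python
def pearson_dispersion(y, mu, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    y        observed counts
    mu       fitted means from a Poisson regression
    n_params number of estimated regression coefficients
    """
    if len(y) != len(mu):
        raise ValueError("y and mu must have the same length")
    df = len(y) - n_params
    if df <= 0:
        raise ValueError("need more observations than parameters")
    # Each squared Pearson residual is (y - mu)^2 / mu under a Poisson variance.
    chi2 = sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))
    return chi2 / df

# Hypothetical daily death counts and fitted means (assumed, not real data).
deaths = [12, 15, 9, 20, 14, 11, 18, 16]
fitted = [13.0, 14.0, 11.0, 17.0, 14.5, 12.0, 16.0, 15.5]
phi = pearson_dispersion(deaths, fitted, n_params=2)
print(f"estimated overdispersion: {phi:.3f}")
```

A value near 1 is what a pure Poisson model would produce; values well above 1 suggest overdispersion.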
The code in arpois.ado fits this Poisson autoregressive model. Two types
of autoregressive terms can be included in the model: studentized residuals,
or lagged values of the dependent variable.
The order of the autocorrelation can be estimated empirically by examining
the autocorrelation function plot. This option has been included in the code
using the acplot.ado file developed by Cox (1997).
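The quantities behind an autocorrelation function plot can be sketched in plain Python. This is an illustration under stated assumptions, not a reimplementation of acplot.ado: the function name `autocorrelations` and the residual series are hypothetical, and the band of 1.96 divided by the square root of n is the usual large-sample approximation for a white-noise series.

```python
import math

def autocorrelations(x, max_lag):
    """Sample autocorrelations r_0, ..., r_max_lag of the series x."""
    n = len(x)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the series length")
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    if c0 == 0:
        raise ValueError("series is constant")
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]

# Hypothetical residuals from a fitted Poisson model (assumed values).
residuals = [0.8, 0.5, -0.2, -0.9, -0.4, 0.3, 1.1, 0.6, -0.1, -0.7, -0.5, 0.2]
acf = autocorrelations(residuals, max_lag=4)
bound = 1.96 / math.sqrt(len(residuals))  # approximate 95% white-noise band
for k, r in enumerate(acf):
    flag = " *" if k > 0 and abs(r) > bound else ""
    print(f"lag {k}: {r:+.3f}{flag}")
```

Lags flagged with an asterisk fall outside the approximate band and are candidates for autoregressive terms; the final choice of order remains a judgment made from the plot.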
However, it should be recognized that these models may lead to instability
of the estimated associations because of the high day-to-day correlation of
air pollution exposures. The inclusion of autocorrelation terms in the model is
generally felt to produce a conservative estimate of the pollution effect size
and standard error (Brunekreef et al. 1995).
References
- Brunekreef, B., et al. 1995. Epidemiologic studies on short-term effects of low levels of major ambient air pollution components. Environmental Health Perspectives 103 (Suppl 2): 3–13.
- Cox, N. 1997. Stata module to plot the autocorrelogram. http://ideas.repec.org/c/boc/bocode/s320302.html.
- Katsouyanni, K., et al. 1995. Short-term effects of air pollution on health: a European approach using epidemiological time series data. Eur Respir J 8: 1030–1038.
- McCullagh, P. and J. A. Nelder. 1989. Generalized Linear Models. 2d ed. London: Chapman and Hall.
- Schwartz, J., et al. 1996. Methodological issues in air pollution studies and daily counts of deaths or hospital admissions. J Epidemiol Community Health 50 (Suppl 1): S3–S11.