What is the difference between random-effects and population-averaged
estimators?
Title: Comparing RE and PA models
Author: William Sribney, StataCorp
Date: January 1999; minor revision August 2007
Random-effects estimators (or other cluster-specific estimators) fit the
model
Pr(Yij=1 | Xij, ui) = F(Xij b + ui)
whereas population-averaged estimators fit the model
Pr(Yij=1 | Xij) = G(Xij b*)
The subtle point is that b and b* are different population parameters.
Hence, the estimators are estimating different things. In practice,
however, b and b* are often very close.
The population-averaged model does NOT fully specify the distribution of the
population. The cluster-specific model DOES fully specify the distribution
(ui is either given a distribution—i.e., a random-effects
model—or is considered fixed like Xij—i.e., a
fixed-effects model). The population-averaged model specifies only a
marginal distribution. Hence, the term “marginal” is often used
for GEE estimates.
The subtle difference between b and b* is best explained with an example.
An example with logit
Suppose that you are looking at
Outcome Yij: employment/unemployment
Predictor Xij: married/unmarried
Then, under the cluster-specific model
logit Pr(Yij=1 | Xij, ui) = a + Xij b + ui
the odds ratio
         Pr(Yij=1 | Xij=1, ui)/Pr(Yij=0 | Xij=1, ui)
  ORcs = ------------------------------------------- = exp(b)
         Pr(Yij=1 | Xij=0, ui)/Pr(Yij=0 | Xij=0, ui)
represents the odds of the person being employed if married compared with
the odds of the SAME person being employed if not married.
Under the population-averaged model
logit Pr(Yij=1 | Xij) = a + Xij b*
the odds ratio
         Pr(Yij=1 | Xij=1)/Pr(Yij=0 | Xij=1)
  ORpa = ----------------------------------- = exp(b*)
         Pr(Yij=1 | Xij=0)/Pr(Yij=0 | Xij=0)
represents the odds of an AVERAGE married person being employed compared
with the odds of an AVERAGE unmarried person being employed.
Rather than saying “AVERAGE”, sometimes I speak loosely and say
the odds of a married person “picked at random” being employed
compared with the odds of another unmarried person “picked at
random” being employed.
Let me now show that b and b* are, in general, different population
parameters.
Here is my definition of the population DISTRIBUTION. (It is NOT a
dataset.) The total population consists of five subjects:
  subject i    j    Xij     ui     Zij     Prcs     Prpa
  ---------   ---   ----   ----   -----   ------   ------
      1        1     0     -0.2   -0.10   0.4750   0.5249
      1        2     1     -0.2    0.50   0.6225   0.6674
      2        1     0     -0.1    0.00   0.5000   0.5249
      2        2     1     -0.1    0.60   0.6457   0.6674
      3        1     0      0.0    0.10   0.5250   0.5249
      3        2     1      0.0    0.70   0.6682   0.6674
      4        1     0      0.1    0.20   0.5498   0.5249
      4        2     1      0.1    0.80   0.6900   0.6674
      5        1     0      0.2    0.30   0.5744   0.5249
      5        2     1      0.2    0.90   0.7109   0.6674
Here Zij = a + Xij b + ui (b multiplies Xij; this is not the
population-averaged parameter b*), with a = 0.1, b = 0.6, and ui as given.
The cluster-specific probability Prcs is given by
Prcs = exp(Zij)/(1 + exp(Zij))
For this population, the population-averaged probability, Prpa,
is simply the average of Prcs for each Xij. That is,
  Prpa(Xij=1) = (1/5) * (sum of Prcs over the rows with Xij=1)
              = (1/5) * (0.6225 + 0.6457 + 0.6682 + 0.6900 + 0.7109)
              = 0.6674
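The table and the averaging step can be checked numerically. The following is
a minimal Python sketch (Python is not used in the original FAQ, and the
variable names are illustrative). It assumes only a = 0.1, b = 0.6, and the
five ui values listed in the table.

```python
import math

def logistic(z):
    """Inverse logit: exp(z) / (1 + exp(z))."""
    return math.exp(z) / (1.0 + math.exp(z))

a, b = 0.1, 0.6                      # true cluster-specific parameters
u = [-0.2, -0.1, 0.0, 0.1, 0.2]      # ui for subjects 1..5

# Cluster-specific probabilities Prcs for each subject at Xij = 0 and Xij = 1
pr_cs = {x: [logistic(a + b * x + ui) for ui in u] for x in (0, 1)}

# Population-averaged probabilities: average Prcs over the five subjects
pr_pa = {x: sum(pr_cs[x]) / len(u) for x in (0, 1)}

for x in (0, 1):
    print("Xij =", x,
          "Prcs =", [round(p, 4) for p in pr_cs[x]],
          "Prpa =", round(pr_pa[x], 4))
```

Up to rounding, the printed values should match the Prcs and Prpa columns of
the table.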
Cluster-specific odds ratio = exp(b) = exp(0.6) = 1.8221.
This is, of course, the same as the odds ratios computed within subject:
Subject 1: (0.6225/(1 - 0.6225))/(0.4750/(1 - 0.4750)) = 1.8221
Subject 2: (0.6457/(1 - 0.6457))/(0.5000/(1 - 0.5000)) = 1.8221
Subject 3: (0.6682/(1 - 0.6682))/(0.5250/(1 - 0.5250)) = 1.8221
Subject 4: (0.6900/(1 - 0.6900))/(0.5498/(1 - 0.5498)) = 1.8221
Subject 5: (0.7109/(1 - 0.7109))/(0.5744/(1 - 0.5744)) = 1.8221
The population-averaged odds ratio is
exp(b*) = (0.6674/(1 - 0.6674))/(0.5249/(1 - 0.5249)) = 1.8169
Solving for b* gives
b* = 0.5972
so b* is closer to the null, as the theory predicts (see the Neuhaus
papers).
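The two odds ratios can be recomputed directly from the population
definition. This is a small Python sketch under the same assumptions as the
example (a = 0.1, b = 0.6, five equally likely ui values); the helper names
`logistic` and `odds` are illustrative, not part of any Stata command.

```python
import math

def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def odds(p):
    """Odds corresponding to probability p."""
    return p / (1.0 - p)

a, b = 0.1, 0.6
u = [-0.2, -0.1, 0.0, 0.1, 0.2]

# Within-subject (cluster-specific) odds ratios: one per subject
or_cs = [odds(logistic(a + b + ui)) / odds(logistic(a + ui)) for ui in u]

# Population-averaged odds ratio, built from the averaged probabilities
p1 = sum(logistic(a + b + ui) for ui in u) / len(u)   # Prpa(Xij=1)
p0 = sum(logistic(a + ui) for ui in u) / len(u)       # Prpa(Xij=0)
or_pa = odds(p1) / odds(p0)
b_star = math.log(or_pa)

print("ORcs per subject:", [round(r, 4) for r in or_cs])
print("ORpa:", round(or_pa, 4), " b*:", round(b_star, 4))
```

Every within-subject odds ratio equals exp(b) exactly, because ui cancels on
the logit scale; averaging the probabilities first is what pulls b* toward
zero.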
b and b* above are the TRUE population parameters, not estimates.
If we had a dataset consisting of a sample from this population distribution
and we used xtgee on this dataset (with the logit link and binomial
distribution), xtgee would be estimating b*. If we used regular logit, we
would also be estimating b* (one would want to specify the
vce(cluster clustvar) option to correct the standard errors in this case).
If we used clogit on this dataset or a random-effects logit estimator (one
that assumes normally distributed ui), we would be estimating b.
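A small simulation can illustrate which parameter each kind of estimator
targets. The Python sketch below is not Stata code and does not call xtgee,
logit, or clogit. It relies on two closed forms that hold in this simple
design: with one binary predictor, the pooled logit coefficient equals the
log of the sample odds ratio, and when every subject is observed once at
Xij=0 and once at Xij=1, the conditional logit estimate equals the log of the
ratio of discordant subjects. As a loudly stated assumption, the ui values
are multiplied by 10 here so that the gap between b and b* (about 0.41 for
this scaled population) is visible above sampling noise. The seed and sample
size are arbitrary.

```python
import math
import random

def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

random.seed(12345)                        # arbitrary seed, for reproducibility
a, b = 0.1, 0.6
u_values = [-2.0, -1.0, 0.0, 1.0, 2.0]    # ASSUMPTION: the example's ui times 10
n_subjects = 100_000

counts = [[0, 0], [0, 0]]   # counts[x][y]: pooled 2x2 table of y by x
n10 = n01 = 0               # discordant subjects: (y at x=1, y at x=0)

for _ in range(n_subjects):
    ui = random.choice(u_values)
    y0 = int(random.random() < logistic(a + ui))       # observation with Xij = 0
    y1 = int(random.random() < logistic(a + b + ui))   # observation with Xij = 1
    counts[0][y0] += 1
    counts[1][y1] += 1
    if y1 == 1 and y0 == 0:
        n10 += 1
    elif y1 == 0 and y0 == 1:
        n01 += 1

# Pooled log odds ratio (ignores clustering): a stand-in for the marginal fit
b_star_hat = math.log((counts[1][1] / counts[1][0]) /
                      (counts[0][1] / counts[0][0]))
# Within-subject log odds ratio: a stand-in for the conditional fit
b_hat = math.log(n10 / n01)

print("pooled estimate (targets b*):", round(b_star_hat, 3))
print("conditional estimate (targets b):", round(b_hat, 3))
```

The pooled estimate should land near the scaled population's b*, and the
conditional estimate near b = 0.6.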
(Aside: The random-effects logit estimator described in the Neuhaus papers
assumes a distribution for ui different from that of the
random-effects logit estimator implemented in Stata. My theory discussion
here assumes one is using the “correct” distribution of
ui. I do not want to digress on this subject, but
random-effects estimators that assume different distributions for
ui are technically different estimators; hence, there is more
than one “random-effects logit estimator”.)
Here b and b* are almost the same number (b = 0.6 and b* = 0.5972), so
it is easy to obscure the fact that the cluster-specific and
population-averaged estimators are estimating different parameters. In
other cases, the difference can be greater, so it is important to keep
in mind which one you are estimating.
The bottom line for someone thinking about using the GEE estimator is to
think about whether the averaging procedure makes sense for the type of
inference you want to make. If you want to estimate how marriage makes
a person get his act together and get a job (or else leave it to the spouse
to bring home the groceries), then you want to go after b. If you want to
look at employment for the average married person compared with the average
unmarried person, then you want to go after b*.
Sometimes you might argue b* and b should be close, so the distinction
is not worth making. But you had better be sure of your argument.
Zero within-cluster correlation (ui = 0 for every cluster, i.e.,
Var(ui) = 0) makes them the same; a large Var(ui) makes the difference
greater.
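To see how the gap grows with Var(ui), one can rescale the ui values of the
example population and recompute b*. This is an illustrative Python sketch;
the `scale` factor and the function name `b_star` are assumptions introduced
here, with scale = 1 reproducing the example and scale = 0 removing the
cluster effect entirely.

```python
import math

def logistic(z):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-z))

def logit(p):
    """Log odds of probability p."""
    return math.log(p / (1.0 - p))

def b_star(scale, a=0.1, b=0.6, u=(-0.2, -0.1, 0.0, 0.1, 0.2)):
    """Population-averaged log odds ratio when every ui is multiplied by scale."""
    p1 = sum(logistic(a + b + scale * ui) for ui in u) / len(u)
    p0 = sum(logistic(a + scale * ui) for ui in u) / len(u)
    return logit(p1) - logit(p0)

for scale in (0, 1, 5, 10, 20):
    print("scale =", scale, " b* =", round(b_star(scale), 4))
```

With scale = 0 the result equals b; as the scale (and hence Var(ui))
increases, b* moves further toward zero.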
References
- Neuhaus, J. M. 1992. Statistical methods for longitudinal and clustered
  designs with binary responses. Statistical Methods in Medical Research 1:
  249–273.
- Neuhaus, J. M., J. D. Kalbfleisch, and W. W. Hauck. 1991. A comparison of
  cluster-specific and population-averaged approaches for analyzing
  correlated binary data. International Statistical Review 59: 25–35.