Why must weights be constant within panel for xtgee?
Title: Weighted estimation and xtgee
Author: James Hardin, StataCorp
Date: February 1997
The answer to this question is not obvious. Here’s one:
We do not allow the weights to vary because it is too difficult to allow
them to vary. Moreover, in the interesting cases we do not know what it
means for the weights to vary, and how one would implement varying weights
differs according to meaning.
The term “weighted estimation” is too vague. Why are you
weighting? Below we present some cases.
Frequency weights
Frequency weights are the easiest to discuss because their definition is
unambiguous.
Frequency weights are nothing more than shorthand for saying an observation
is duplicated.
However, even this case is difficult to generalize to panel data.
Consider a panel with frequency weight 4. What does that mean? Does it
mean that there are four independent panels, each alike? Or does it mean
there is one panel and that each observation is observed four times?
If there are 2 observations in a panel, each with a different frequency
weight (one weighted 2 and the other weighted 4), in what order do the 6
observations fall when we fit a time-dependent correlation structure?
As there are no easy answers to these questions and we have never seen a
panel dataset reported as frequency weighted, we do not allow them.
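The two readings of a panel-level frequency weight can be made concrete with a small sketch. The Python below is illustrative only: the function names expand_as_panels and expand_within_panel and the data are hypothetical, and nothing here is part of Stata or xtgee.

```python
def expand_as_panels(panel, weight):
    """Reading 1: `weight` independent panels, each a copy of the original."""
    return [list(panel) for _ in range(weight)]


def expand_within_panel(panel, weight):
    """Reading 2: one panel in which every observation is seen `weight` times."""
    return [[obs for obs in panel for _ in range(weight)]]


# A made-up panel of two (time, y) observations carrying frequency weight 4.
panel = [(1, 2.5), (2, 3.0)]

as_panels = expand_as_panels(panel, 4)
within = expand_within_panel(panel, 4)

# Number of panels, then the length of each panel, under each reading.
print(len(as_panels), [len(p) for p in as_panels])  # 4 [2, 2, 2, 2]
print(len(within), [len(p) for p in within])        # 1 [8]
```

Both expansions contain 8 observations, but the number of panels and the within-panel length differ, so an estimator that models within-panel correlation would see two different datasets.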
Weighting to produce homogeneous variances
Researchers weight data to make the variance homogeneous. This use of
weighting is an alternative to transformation. That is, consider a model
$y_{it} = X_{it} b + u_{it}$
where
$\mathrm{Var}(u_{it}) = c / W_{it}$
This model can be rewritten as
$\sqrt{W_{it}}\, y_{it} = \sqrt{W_{it}}\, X_{it} b + \sqrt{W_{it}}\, u_{it}$
or
$y^*_{it} = X^*_{it} b + u^*_{it}$
and now
$\mathrm{Var}(u^*_{it}) = c$
We provided analytic weights that can handle the special case where
$\mathrm{Var}(u_{it}) = c / W_i$
but you will have to handle other cases by variable transformation.
There are lots of ways variances could be heterogeneous in a panel, so no
matter what we did, variable transformation would probably have been
required.
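As a rough numerical check of the transformation argument, the sketch below compares a weighted least-squares slope with the ordinary least-squares slope computed on variables multiplied by the square root of the weight. It assumes a single regressor and no constant to keep the arithmetic short; the data and function names are made up for illustration.

```python
import math


def ols_slope_through_origin(x, y):
    """Least-squares slope b for y = x*b + u (single regressor, no constant)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


def weighted_slope_through_origin(x, y, w):
    """Weighted least-squares slope, with observation i weighted by w[i]."""
    num = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    den = sum(wi * xi * xi for wi, xi in zip(w, x))
    return num / den


# Made-up data; w plays the role of W, so Var(u) = c / w for each observation.
x = [1.0, 2.0, 3.0, 4.0]
y = [1.1, 1.9, 3.2, 3.9]
w = [1.0, 4.0, 0.5, 2.0]

# Transform both sides by sqrt(w), as in the rewritten model.
x_star = [math.sqrt(wi) * xi for wi, xi in zip(w, x)]
y_star = [math.sqrt(wi) * yi for wi, yi in zip(w, y)]

b_transformed = ols_slope_through_origin(x_star, y_star)
b_weighted = weighted_slope_through_origin(x, y, w)

# The two slopes agree up to floating-point rounding.
print(b_transformed, b_weighted)
```

The same idea carries over to any variance pattern for which a suitable multiplier can be written down, which is why transformation remains the general fallback.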
Sampling weights
This, we think, is the common case. You have data on individuals, and the
chance that each individual appears in your sample varies. Here we discuss
standard errors in the robust, replication sense (see [U]
20.15 Obtaining robust variance estimates).
Consider a probability-weighted sample. On day 1, the sample is drawn and
then subsequently followed. In the simple case, a weight is assigned to
each individual and that weight stays constant over time. This is not too
difficult to model, and xtgee allows pweights.
Now consider what happens when the weights vary over time. We must ask why
they vary. There are two possible answers: (1) the underlying
population remains invariant but attrition affects our sample, and (2) our
sample remains whole but the underlying population changes. Both are
complicated issues. We could also combine (1) and (2) into a third
case in which new members are added to our sample at a later date, generally
to offset the attrition effects of (1).
These are hard questions, so let us just take case (2) and illustrate:
Pretend that we draw a sample of banks that we will follow over the next 6
years. Pretend that at some point the underlying distribution of banks
changes; let’s use the banks’ size. Pretend that there are
just two types of banks, small ones and large ones, and that at some point
something changes and 80% of the small banks disappear (merge with large
ones).
We will pretend there are lots of banks and that our sample is so small
relative to the population that none of the banks in our sample are affected
by this.
Consider the following possibilities:
- Scenario A: We select our sample on Monday, mail our first surveys on
  Tuesday, and while the surveys are in the mail, all the mergers happen.
- Scenario B: The mergers occur soon after the surveys are mailed back to us.
- Scenario C: The mergers occur 5 years after the conclusion of our study.
- Scenario D: The mergers occur the day after the conclusion of our study.
- Scenario E: The mergers occur the day before the conclusion of our study.
- Scenario F: The mergers occur 3 years into our study.
The point is that the solution to each of these cases is unlikely to be
plugging some number, w, into the same formula.
Adding weights to the GEE calculation of the panel-data GLM is not easy
because of the form of the equation. Consider the update calculation for beta
in Methods and Formulas of [XT] xtgee (Stata
Longitudinal/Panel Data Reference Manual, p. 131), which is written as
$b_{j+1} = b_j - \left( \sum_{i=1}^{m} D' V^{-1} D \right)^{-1} \left( \sum_{i=1}^{m} D' V^{-1} S \right)$
This equation is analogous to the
$(X'X)^{-1} (X'Y)$
calculation for linear regression. Here is the formula for the
V term (also on page 131):
$V = A^{1/2} R A^{1/2}$
Each of the terms is for a panel that is of size $n_i \times n_i$ (and so
really should be subscripted by i).
So, the question becomes, “Where do the weights fit in the calculation
of V?”
If the panels are weighted (weights are constant within panels), then the
addition of weights is clear, as we can multiply this panel calculation by a
constant, but if the weights are allowed to be subject specific, it is not
clear how they affect the calculation of V. Adding subject-specific
weights is a difficult problem and is unsolved as far as we know.
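The panel-constant case can be illustrated with a minimal sketch; this is not xtgee's implementation. It assumes a linear model with identity link, a single regressor, A equal to the identity (so V reduces to R), and an exchangeable working correlation whose parameter rho is treated as known and fixed. Under those assumptions the estimating equation can be solved directly, and multiplying each panel's contribution by its weight gives the same estimate as duplicating that panel. The function names and data are hypothetical.

```python
import math


def panel_terms(x, y, rho):
    """Return (x' R^-1 x, x' R^-1 y) for one panel.

    R is an exchangeable correlation matrix with parameter rho. Its inverse
    has the closed form (I - k*J) / (1 - rho), where J is the all-ones
    matrix and k = rho / (1 + (n - 1) * rho).
    """
    n = len(x)
    k = rho / (1.0 + (n - 1) * rho)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    return (sxx - k * sx * sx) / (1.0 - rho), (sxy - k * sx * sy) / (1.0 - rho)


def weighted_beta(panels, weights, rho):
    """Solve sum_i w_i x_i' R^-1 (y_i - x_i b) = 0 for the scalar b."""
    lhs = rhs = 0.0
    for (x, y), w in zip(panels, weights):
        dvd, dvy = panel_terms(x, y, rho)
        lhs += w * dvd  # the panel's whole contribution is scaled by w
        rhs += w * dvy
    return rhs / lhs


# Made-up panels of unequal size; rho is assumed known (an assumption).
p1 = ([1.0, 2.0], [1.2, 1.9])
p2 = ([0.5, 1.5, 2.5], [0.4, 1.6, 2.4])
rho = 0.3

b_weighted = weighted_beta([p1, p2], [3, 1], rho)
b_duplicated = weighted_beta([p1, p1, p1, p2], [1, 1, 1, 1], rho)
print(b_weighted, b_duplicated)
```

With a weight that varies within a panel there is no single constant to multiply by, and the sketch offers no obvious place to put it inside R, which is the difficulty described above.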