The use of panel-data models has exploded in the past ten years as analysts
more often need to analyze richer data structures. Some examples of panel
data are cross-sectional datasets that present various observations for
different types of experimental units. An example might be counties (the
replication) in various states (the panel identifier). Other examples of
panel data are longitudinal, having multiple observations (the replication)
on the same experimental unit (the panel identifier) over time. The
xtgee command allows either type of panel data.
Stata estimates extensions to generalized linear models in which you can
model the structure of the within-panel correlation. This extension allows
users to fit GLM-type models to panel data.
The xtgee command offers a rich collection of models for analysts.
These models correspond to population-averaged (or marginal)
models in the panel-data literature.
What makes xtgee useful is the number of statistical models that it
generalizes for use with panel data, the richer correlation structures it
allows for models that are also available in other commands, and the
availability of robust standard errors, which the equivalent command does
not always provide.
In this example, we consider a probit model in which we wish to model
whether a worker belongs to the union based on the person's age and whether
they are living outside of an SMSA. The people in the study appear multiple
times in the dataset (this type of panel dataset is commonly referred to as
a longitudinal dataset), and we assume that the observations on a given
person are more correlated than those between different persons.
    . webuse nlswork
    (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

    . xtset idcode
           panel variable:  idcode (unbalanced)

    . xtgee union age not_smsa, family(binomial) link(probit) corr(exchangeable)

    Iteration 1: tolerance = .05859927
    Iteration 2: tolerance = .00346479
    Iteration 3: tolerance = .0001277
    Iteration 4: tolerance = 4.486e-06
    Iteration 5: tolerance = 1.548e-07

    GEE population-averaged model                   Number of obs      =     19226
    Group variable:                     idcode      Number of groups   =      4150
    Link:                               probit      Obs per group: min =         1
    Family:                           binomial                     avg =       4.6
    Correlation:                  exchangeable                     max =        12
                                                    Wald chi2(2)       =     30.23
    Scale parameter:                         1      Prob > chi2        =    0.0000

    ------------------------------------------------------------------------------
           union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0045624   .0013959     3.27   0.001     .0018264    .0072984
        not_smsa |  -.1440246   .0318838    -4.52   0.000    -.2065156   -.0815336
           _cons |  -.8770284   .0479603   -18.29   0.000    -.9710288   -.7830279
    ------------------------------------------------------------------------------
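After a fit like the one above, a natural next step is to look at the working correlation that was estimated and to see how the standard errors change under a robust variance estimator. The following is only a sketch of that workflow, assuming the same nlswork data and variables as in the example; no output is reproduced here, and results may differ across versions of the dataset.

```stata
* Sketch only: reuses the dataset and variables from the example above
webuse nlswork, clear
xtset idcode

* Population-averaged probit with an exchangeable working correlation
xtgee union age not_smsa, family(binomial) link(probit) corr(exchangeable)

* Display the estimated within-panel (working) correlation matrix
estat wcorrelation

* Same model with robust (sandwich) standard errors, which remain valid
* even if the exchangeable structure is not the true correlation
xtgee union age not_smsa, family(binomial) link(probit) corr(exchangeable) vce(robust)
```

Comparing the two sets of standard errors gives an informal indication of how much the inference depends on the assumed correlation structure.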
xtgee options
The xtgee command allows these options:
Families
- Bernoulli/binomial
- gamma
- Gaussian
- inverse Gaussian
- negative binomial
- Poisson
Links
- cloglog
- identity
- log
- logit
- negative binomial
- odds power
- power
- probit
- reciprocal
Correlation structures
- independent
- exchangeable
- autoregressive
- stationary
- nonstationary
- unstructured
- user-specified
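The three lists above are combined through the family(), link(), and corr() options, as in the probit example earlier. The sketch below, which assumes the same nlswork variables, simply swaps the link and the correlation structure to show the pattern; which combination suits a given dataset is a modeling decision that this illustration does not settle.

```stata
* Sketch only: same data and variables as the probit example
webuse nlswork, clear
xtset idcode

* Binomial family, logit link, exchangeable working correlation
xtgee union age not_smsa, family(binomial) link(logit) corr(exchangeable)

* Binomial family, complementary log-log link, independent correlation
xtgee union age not_smsa, family(binomial) link(cloglog) corr(independent)

* Gaussian family, identity link (a linear probability model for union)
xtgee union age not_smsa, family(gaussian) link(identity) corr(exchangeable)

* Note: correlation structures other than independent and exchangeable
* also need a time variable, declared for example with: xtset idcode year
```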
If you assume an independent correlation structure, which ignores the panel
structure of the data, xtgee will produce answers already provided by
Stata’s nonpanel estimation commands. The table below, together with its
notes, lists situations in which xtgee provides the same answers as an
existing command.
Note 1
These methods produce the same results only for balanced panels.

Note 2
For cloglog estimation, xtgee with corr(independent) and cloglog will
produce the same coefficients, but the standard errors will be only
asymptotically equivalent because cloglog is not the canonical link for the
binomial family.

Note 3
For probit estimation, xtgee with corr(independent) and probit will produce
the same coefficients, but the standard errors will be only asymptotically
equivalent because probit is not the canonical link for the binomial
family. If the binomial denominator is not 1, the equivalent
maximum-likelihood command is bprobit.

Note 4
Fitting a negative binomial model using xtgee (or glm) will yield results
conditional on the specified value of alpha. nbreg, however, estimates that
parameter and provides unconditional estimates.

Note 5
xtgee with corr(independent) can be used to fit exponential regressions, but
this requires specifying scale(1). As with probit, the xtgee-reported
standard errors will be only asymptotically equivalent to those produced by
streg, dist(exp) nohr because log is not the canonical link for the gamma
family. xtgee cannot be used to fit exponential regressions on censored
data.
Using the independent correlation structure, the xtgee command will fit the
same model as the glm, irls command if the family–link combination is the
same.

Note 6
If the xtgee command is equivalent to another command, using
corr(independent) and the vce(robust) option with xtgee corresponds to using
the vce(cluster clustvar) option in the equivalent command, where clustvar
is the panel variable.
| Family    | Link      | Correlation  | Equivalent Stata command           |
|-----------|-----------|--------------|------------------------------------|
| gaussian  | identity  | independent  | regress                            |
| gaussian  | identity  | exchangeable | xtreg, re (see note 1)             |
| gaussian  | identity  | exchangeable | xtreg, pa                          |
| binomial  | cloglog   | independent  | cloglog (see note 2)               |
| binomial  | cloglog   | exchangeable | xtcloglog, pa                      |
| binomial  | logit     | independent  | logit or logistic                  |
| binomial  | logit     | exchangeable | xtlogit, pa                        |
| binomial  | probit    | independent  | probit (see note 3)                |
| binomial  | probit    | exchangeable | xtprobit, pa                       |
| nbinomial | nbinomial | independent  | nbreg (see note 4)                 |
| poisson   | log       | independent  | poisson                            |
| poisson   | log       | exchangeable | xtpoisson, pa                      |
| gamma     | log       | independent  | streg, dist(exp) nohr (see note 5) |
| family    | link      | independent  | glm, irls (see note 6)             |
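Two rows of the table can be checked directly. The sketch below assumes the nlswork data and the variables used earlier on this page: it fits the binomial/logit/exchangeable row with both xtgee and xtlogit, pa, and then the binomial/logit/independent row with the robust and cluster variance options described in note 6. The stored-estimate names (gee_exch, xtlogit_pa, gee_ind, logit_cl) are arbitrary labels chosen for this illustration. The coefficients are expected to agree within each pair; the standard errors in the second pair should be close, but small finite-sample differences are possible.

```stata
* Sketch only: same data and variables as the earlier examples
webuse nlswork, clear
xtset idcode

* Row: binomial / logit / exchangeable  <->  xtlogit, pa
xtgee union age not_smsa, family(binomial) link(logit) corr(exchangeable)
estimates store gee_exch
xtlogit union age not_smsa, pa
estimates store xtlogit_pa

* Row: binomial / logit / independent, with the variance options of note 6
xtgee union age not_smsa, family(binomial) link(logit) corr(independent) vce(robust)
estimates store gee_ind
logit union age not_smsa, vce(cluster idcode)
estimates store logit_cl

* Coefficients and standard errors side by side
estimates table gee_exch xtlogit_pa gee_ind logit_cl, b(%9.6f) se(%9.6f)
```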
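If you choose to model the intracluster correlation as an identity matrix
(by specifying the name of an existing identity matrix in the option
corr), GEE estimation reduces to a generalized linear model, and the
coefficient estimates will be identical to those from glm. The comparison
below uses corr(indep), which imposes the same structure. The standard
errors agree only to about four significant digits because the two commands
report slightly different scale parameters: .1717383 for glm, which divides
the Pearson statistic by the residual degrees of freedom (26197), and
.1717187 for xtgee, which divides it by the number of observations (26200).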
    . glm union age not_smsa, family(gauss) link(identity)

    Iteration 0:   log likelihood = -14095.328

    Generalized linear models                          No. of obs      =     26200
    Optimization     : ML                              Residual df     =     26197
                                                       Scale parameter =  .1717383
    Deviance         =  4499.028831                    (1/df) Deviance =  .1717383
    Pearson          =  4499.028831                    (1/df) Pearson  =  .1717383

    Variance function: V(u) = 1                        [Gaussian]
    Link function    : g(u) = u                        [Identity]

                                                       AIC             =  1.076208
    Log likelihood   = -14095.3277                     BIC             = -262016.5

    ------------------------------------------------------------------------------
                 |                 OIM
           union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0021565   .0003948     5.46   0.000     .0013828    .0029303
        not_smsa |  -.0591923   .0056826   -10.42   0.000    -.0703301   -.0480546
           _cons |   .1729596   .0123365    14.02   0.000     .1487806    .1971386
    ------------------------------------------------------------------------------

    . xtgee union age not_smsa, family(gauss) link(identity) corr(indep)

    Iteration 1: tolerance = 9.080e-15

    GEE population-averaged model                   Number of obs      =     26200
    Group variable:                     idcode      Number of groups   =      4434
    Link:                             identity      Obs per group: min =         1
    Family:                           Gaussian                     avg =       5.9
    Correlation:                   independent                     max =        12
                                                    Wald chi2(2)       =    134.68
    Scale parameter:                  .1717187      Prob > chi2        =    0.0000

    Pearson chi2(26200):               4499.03      Deviance           =   4499.03
    Dispersion (Pearson):             .1717187      Dispersion         =  .1717187

    ------------------------------------------------------------------------------
           union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
             age |   .0021565   .0003948     5.46   0.000     .0013828    .0029302
        not_smsa |  -.0591923   .0056823   -10.42   0.000    -.0703295   -.0480552
           _cons |   .1729596   .0123357    14.02   0.000      .148782    .1971372
    ------------------------------------------------------------------------------
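In the two outputs above the coefficients match, while the scale parameters (.1717383 versus .1717187) and hence the standard errors differ in the trailing digits. One plausible way to reconcile them, sketched here under the assumption that the same data are in memory, is xtgee's nmp option, which uses the divisor N - P rather than N when computing the scale parameter. The estimate names glm_fit and gee_nmp are arbitrary labels for this illustration, and no output is reproduced.

```stata
* Sketch only: same data and variables as the comparison above
webuse nlswork, clear
xtset idcode

* Reference fit: generalized linear model, Gaussian family, identity link
glm union age not_smsa, family(gaussian) link(identity)
estimates store glm_fit

* GEE with independent correlation; nmp divides by N - P instead of N,
* which should bring the scale parameter in line with glm's
xtgee union age not_smsa, family(gaussian) link(identity) corr(independent) nmp
estimates store gee_nmp

* Coefficients and standard errors side by side
estimates table glm_fit gee_nmp, b(%9.7f) se(%9.7f)
```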
We could fill up lots of space demonstrating other ways that the
xtgee command is equivalent to other commands, but the real power is
in using it for its intended use and modeling the correlation that exists in
the panels.