Generalized estimating equations: xtgee
The use of panel-data models has exploded in the past ten years as analysts
more often need to analyze richer data structures. Some panel data are
nested datasets that contain observations of smaller units nested within
larger units. An example might be counties (the replication) in various
states (the panel identifier). Other panel data are longitudinal, having
multiple observations (the replication) on the same experimental unit (the
panel identifier) over time. The xtgee command handles either type of panel
data.
Stata estimates extensions to generalized linear models in which you can
model the structure of the within-panel correlation. This extension allows
users to fit GLM-type models to panel data.
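For orientation, the idea of "modeling the within-panel correlation" can be written down in the standard form used in the GEE literature. This is a sketch in notation introduced here, not taken from Stata's output: i indexes panels, y_i and \mu_i are the observed and fitted mean vectors for panel i, A_i is the diagonal matrix of variance-function values, R(\alpha) is the working correlation matrix, and \phi is the scale parameter.

```latex
\sum_{i=1}^{m} D_i^{\top} V_i^{-1} \bigl\{ y_i - \mu_i(\beta) \bigr\} = 0,
\qquad
D_i = \frac{\partial \mu_i}{\partial \beta^{\top}},
\qquad
V_i = \phi \, A_i^{1/2} \, R(\alpha) \, A_i^{1/2}
```

When R(\alpha) is the identity matrix, V_i is diagonal and the equations reduce to the usual GLM score equations, which is why an independent correlation structure reproduces nonpanel estimates.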
The xtgee command offers a rich collection of models for analysts.
These models correspond to population-averaged (or marginal)
models in the panel-data literature.
What makes xtgee useful is the number of statistical models that it
generalizes for use with panel data, the richer correlation structures it
offers for models available in other commands, and the availability of
robust standard errors, which do not always exist in the equivalent command.
In this example, we consider a probit model in which we wish to model
whether a worker belongs to the union based on the person's age and whether
they are living outside of an SMSA (standard metropolitan statistical area).
The people in the study appear multiple times in the dataset (this type of
panel dataset is commonly referred to as a longitudinal dataset), and we
assume that the observations on a given person are more correlated than
those between different persons.
. webuse nlswork
(National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

. xtset idcode
       panel variable:  idcode (unbalanced)

. xtgee union age not_smsa, family(binomial) link(probit) corr(exchangeable)

Iteration 1: tolerance = .05859927
Iteration 2: tolerance = .00346479
Iteration 3: tolerance = .0001277
Iteration 4: tolerance = 4.486e-06
Iteration 5: tolerance = 1.548e-07

GEE population-averaged model                   Number of obs      =     19226
Group variable:                     idcode      Number of groups   =      4150
Link:                               probit      Obs per group: min =         1
Family:                           binomial                     avg =       4.6
Correlation:                  exchangeable                     max =        12
                                                Wald chi2(2)       =     30.23
Scale parameter:                         1      Prob > chi2        =    0.0000

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0045624   .0013959     3.27   0.001     .0018264    .0072984
    not_smsa |  -.1440246   .0318838    -4.52   0.000    -.2065156   -.0815336
       _cons |  -.8770284   .0479603   -18.29   0.000    -.9710288   -.7830279
------------------------------------------------------------------------------
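A natural follow-up to this fit is to request robust standard errors and to look at the within-panel correlation that was estimated. The lines below are a minimal sketch, assuming the same nlswork data and that the estat wcorrelation postestimation command is available after xtgee in your version of Stata. No output is shown because these commands were not part of the original example.

```stata
* Sketch only: refit the probit GEE with robust (sandwich) standard errors
webuse nlswork, clear
xtset idcode
xtgee union age not_smsa, family(binomial) link(probit) corr(exchangeable) vce(robust)

* Display the estimated within-panel (working) correlation matrix
estat wcorrelation
```

With an exchangeable structure, every off-diagonal entry of the displayed matrix should be the same single estimated correlation.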

The xtgee command allows these options:
Families
 Bernoulli/binomial
 gamma
 Gaussian
 inverse Gaussian
 negative binomial
 Poisson

Links
 cloglog
 identity
 log
 logit
 negative binomial
 odds power
 power
 probit
 reciprocal

Correlation structures
 independent
 exchangeable
 autoregressive
 stationary
 nonstationary
 unstructured
 user-specified
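To make the correlation structures concrete, the off-diagonal entries of the working correlation matrix for some of them can be written as follows. This is a sketch in notation introduced here (R for the working correlation matrix, j and k for two observations within a panel, \rho for a correlation parameter). Only the first-order autoregressive case is shown, and the stationary, nonstationary, and user-specified structures are omitted.

```latex
\begin{aligned}
\text{independent:}             \quad & R_{jk} = 0, \\
\text{exchangeable:}            \quad & R_{jk} = \rho, \\
\text{autoregressive, order 1:} \quad & R_{jk} = \rho^{|j-k|}, \\
\text{unstructured:}            \quad & R_{jk} = \rho_{jk},
\end{aligned}
\qquad j \neq k, \quad R_{jj} = 1 .
```

The structures differ mainly in how many correlation parameters must be estimated: none for independent, one for exchangeable and first-order autoregressive, and one per pair of within-panel positions for unstructured.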

Suppose you specify an independent correlation structure, which ignores the
panel structure of the data. Under this assumption, xtgee will produce
answers already provided by Stata’s nonpanel estimation commands.
Examples of situations in which xtgee provides the same answers as an
existing command are given in the table below.
Note 1
These methods produce the same results only for balanced panels.

Note 2
For cloglog estimation, xtgee with corr(independent) and cloglog will
produce the same coefficients, but the standard errors will be only
asymptotically equivalent because cloglog is not the canonical link for the
binomial family.

Note 3
For probit estimation, xtgee with corr(independent) and probit will produce
the same coefficients, but the standard errors will be only asymptotically
equivalent because probit is not the canonical link for the binomial
family. If the binomial denominator is not 1, the equivalent
maximum-likelihood command is bprobit.

Note 4
Fitting a negative binomial model using xtgee (or glm) will yield results
conditional on the specified value of alpha. nbreg, however, estimates that
parameter and provides unconditional estimates.

Note 5
xtgee with corr(independent) can be used to fit exponential regressions,
but this requires specifying scale(1). As with probit, the xtgee-reported
standard errors will be only asymptotically equivalent to those produced by
streg, dist(exp) nohr because log is not the canonical link for the gamma
family. xtgee cannot be used to fit exponential regressions on censored
data.

Using the independent correlation structure, the xtgee command will fit the
same model as the glm, irls command if the family–link combination is the
same.

Note 6
If the xtgee command is equivalent to another command, using
corr(independent) and the vce(robust) option with xtgee corresponds to
using the vce(cluster clustvar) option in the equivalent command, where
clustvar corresponds to the panel variable.
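The correspondence described in note 6 can be checked directly. The lines below are a minimal sketch, assuming the nlswork data used elsewhere on this page and the logit link (the canonical link for the binomial family); the scalar names are hypothetical and introduced only for the comparison. The two sets of standard errors should correspond closely, although small finite-sample adjustments may make them differ in later digits.

```stata
* Sketch only: compare cluster-robust inference across two equivalent fits
webuse nlswork, clear
xtset idcode

* Ordinary logit with standard errors clustered on the panel variable
logit union age not_smsa, vce(cluster idcode)
scalar b_logit  = _b[age]
scalar se_logit = _se[age]

* GEE with an independent working correlation and robust standard errors
xtgee union age not_smsa, family(binomial) link(logit) corr(independent) vce(robust)
scalar b_gee  = _b[age]
scalar se_gee = _se[age]

* Show both sets of numbers side by side
display "coef: " b_logit "   " b_gee
display "se:   " se_logit "   " se_gee
```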


Family      Link        Correlation     Equivalent Stata command
----------------------------------------------------------------------
gaussian    identity    independent     regress
gaussian    identity    exchangeable    xtreg, re (see note 1)
gaussian    identity    exchangeable    xtreg, pa
binomial    cloglog     independent     cloglog (see note 2)
binomial    cloglog     exchangeable    xtcloglog, pa
binomial    logit       independent     logit or logistic
binomial    logit       exchangeable    xtlogit, pa
binomial    probit      independent     probit (see note 3)
binomial    probit      exchangeable    xtprobit, pa
nbinomial   nbinomial   independent     nbreg (see note 4)
poisson     log         independent     poisson
poisson     log         exchangeable    xtpoisson, pa
gamma       log         independent     streg, dist(exp) nohr (see note 5)
family      link        independent     glm, irls (see note 6)
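As an illustration of one row of the table, the binomial/logit/exchangeable combination can be fit both ways and the coefficients compared. This is a minimal sketch, assuming the nlswork data used elsewhere on this page and that xtlogit, pa uses an exchangeable correlation structure by default; the scalar names are hypothetical.

```stata
* Sketch only: the same population-averaged logit model fit two ways
webuse nlswork, clear
xtset idcode

* GEE specification
xtgee union age not_smsa, family(binomial) link(logit) corr(exchangeable)
scalar b_gee = _b[not_smsa]

* Equivalent population-averaged panel command
xtlogit union age not_smsa, pa
scalar b_pa = _b[not_smsa]

* The two coefficients should agree
display b_gee "   " b_pa
```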

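If you choose to model the intracluster correlation as an identity matrix
(by specifying the name of an existing identity matrix in the option
corr, or equivalently by specifying corr(independent)), GEE estimation
reduces to a generalized linear model, and the coefficients will be
identical to those estimated by glm. In the comparison below, the standard
errors differ in the later decimal places only because the two commands
report slightly different scale parameters (.1784791 for glm versus
.1784513 for xtgee).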
. glm union age not_smsa, family(gauss) link(identity)

Iteration 0:   log likelihood = -10713.086

Generalized linear models                          No. of obs      =     19226
Optimization     : ML                              Residual df     =     19223
                                                   Scale parameter =  .1784791
Deviance         =  3430.904127                    (1/df) Deviance =  .1784791
Pearson          =  3430.904127                    (1/df) Pearson  =  .1784791

Variance function: V(u) = 1                        [Gaussian]
Link function    : g(u) = u                        [Identity]

                                                   AIC             =  1.114749
Log likelihood   = -10713.08631                    BIC             = -186185.1

------------------------------------------------------------------------------
             |                 OIM
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0018369   .0004926     3.73   0.000     .0008714    .0028024
    not_smsa |  -.0648492   .0067672    -9.58   0.000    -.0781126   -.0515858
       _cons |   .1950571   .0158061    12.34   0.000     .1640777    .2260365
------------------------------------------------------------------------------

. xtgee union age not_smsa, family(gauss) link(identity) corr(indep)

Iteration 1: tolerance = 6.230e-15

GEE population-averaged model                   Number of obs      =     19226
Group variable:                     idcode      Number of groups   =      4150
Link:                             identity      Obs per group: min =         1
Family:                           Gaussian                     avg =       4.6
Correlation:                   independent                     max =        12
                                                Wald chi2(2)       =    103.63
Scale parameter:                  .1784513      Prob > chi2        =    0.0000

Pearson chi2(19226):               3430.90      Deviance           =   3430.90
Dispersion (Pearson):             .1784513      Dispersion         =  .1784513

------------------------------------------------------------------------------
       union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   .0018369   .0004926     3.73   0.000     .0008715    .0028023
    not_smsa |  -.0648492   .0067666    -9.58   0.000    -.0781116   -.0515869
       _cons |   .1950571   .0158049    12.34   0.000     .1640801    .2260341
------------------------------------------------------------------------------
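The small differences between the two sets of standard errors trace back to the scale parameter: the glm output reports the deviance divided by the residual degrees of freedom (19223), whereas the xtgee output reports it divided by the number of observations (19226). The lines below are a minimal sketch of that arithmetic, using only numbers from the output above. The final command assumes that xtgee's nmp option (which requests the N - P divisor) is available in your version of Stata; it was not part of the original example.

```stata
* Deviance over residual df, the divisor behind glm's scale parameter
display %9.7f 3430.904127/19223

* Deviance over number of observations, the divisor behind xtgee's default
display %9.7f 3430.904127/19226

* Assumption: nmp makes xtgee use the N - P divisor instead of N
webuse nlswork, clear
xtset idcode
xtgee union age not_smsa, family(gauss) link(identity) corr(independent) nmp
```

The two displayed values should round to .1784791 and .1784513, matching the scale parameters reported by glm and xtgee, respectively.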

We could fill up lots of space demonstrating other ways that the
xtgee command is equivalent to other commands, but the real power is
in using it for its intended use and modeling the correlation that exists in
the panels.