Stata

Generalized estimating equations: xtgee

The use of panel-data models has exploded in the past ten years as analysts more often need to analyze richer data structures. Some panel datasets are cross-sectional: they contain multiple observations within each of several groups of experimental units. An example might be counties (the replication) in various states (the panel identifier). Other panel datasets are longitudinal, having multiple observations (the replication) on the same experimental unit (the panel identifier) over time. The xtgee command allows either type of panel data.

Stata fits extensions to generalized linear models in which you can model the structure of the within-panel correlation. This extension allows users to fit GLM-type models to panel data.

The xtgee command offers a rich collection of models for analysts. These models correspond to population-averaged (or marginal) models in the panel-data literature.
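
As a brief sketch of what "modeling the within-panel correlation" means, the population-averaged estimates can be written as the solution of the generalized estimating equation below. The notation is the conventional one from the GEE literature and is an assumption here, not taken from this page: y_i and \mu_i are the observed and fitted responses for panel i, D_i is the derivative of \mu_i with respect to the coefficients \beta, A_i is the diagonal matrix of variance-function values, \phi is the scale parameter, and R(\alpha) is the working correlation matrix selected with corr().

  \[
    \sum_{i=1}^{m} D_i^{\top} V_i^{-1} (y_i - \mu_i) = 0,
    \qquad
    V_i = \phi \, A_i^{1/2} \, R(\alpha) \, A_i^{1/2}
  \]

With corr(independent), R(\alpha) is the identity matrix; with corr(exchangeable), every off-diagonal element of R(\alpha) equals a single parameter \alpha.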

What makes xtgee useful is the number of statistical models that it generalizes for use with panel data, the richer set of correlation structures it offers compared with the models available in other commands, and the availability of robust standard errors, which are not always available in the equivalent command.

In this example, we consider a probit model in which we wish to model whether a worker belongs to a union based on the person's age and whether they are living outside of a standard metropolitan statistical area (SMSA). The people in the study appear multiple times in the dataset (this type of panel dataset is commonly referred to as a longitudinal dataset), and we assume that the observations on a given person are more correlated than those between different persons.

  . webuse nlswork
  (National Longitudinal Survey.  Young Women 14-26 years of age in 1968)

  . iis idcode

  . xtgee union age not_smsa, fam(binomial) link(probit) corr(exchangeable)

  Iteration 1: tolerance = .05859927
  Iteration 2: tolerance = .00346479
  Iteration 3: tolerance = .0001277
  Iteration 4: tolerance = 4.486e-06
  Iteration 5: tolerance = 1.548e-07

  GEE population-averaged model                   Number of obs      =     19226
  Group variable:                     idcode      Number of groups   =      4150
  Link:                               probit      Obs per group: min =         1
  Family:                           binomial                     avg =       4.6
  Correlation:                  exchangeable                     max =        12
                                                  Wald chi2(2)       =     30.23
  Scale parameter:                         1      Prob > chi2        =    0.0000

  ------------------------------------------------------------------------------
         union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
           age |   .0045624   .0013959     3.27   0.001     .0018264    .0072984
      not_smsa |  -.1440246   .0318838    -4.52   0.000    -.2065156   -.0815336
         _cons |  -.8770284   .0479603   -18.29   0.000    -.9710288   -.7830279
  ------------------------------------------------------------------------------
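
The same fit can be sketched with the xtset command, which more recent releases document for declaring the panel variable in place of iis. After fitting, the estimated working correlation matrix can be displayed with estat wcorrelation (older releases provide xtcorr for the same purpose). Output is omitted here; which of these commands is available depends on the release you are running.

  . webuse nlswork

  . xtset idcode

  . xtgee union age not_smsa, family(binomial) link(probit) corr(exchangeable)

  . estat wcorrelation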

xtgee options

The xtgee command allows these options:

Families
  • Bernoulli/binomial
  • gamma
  • Gaussian
  • inverse Gaussian
  • negative binomial
  • Poisson
Links
  • cloglog
  • identity
  • log
  • logit
  • negative binomial
  • odds power
  • power
  • probit
  • reciprocal
Correlation structures
  • independent
  • exchangeable
  • autoregressive
  • stationary
  • nonstationary
  • unstructured
  • user-specified
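
As an illustration of how these options combine, the union example above could be refit with a different link or correlation structure simply by changing the corresponding option. The two commands below are a sketch, with output omitted: the first uses the logit link with an exchangeable correlation and requests exponentiated coefficients (odds ratios) with eform; the second uses the complementary log-log link with an independent correlation. Structures that depend on the ordering of observations, such as autoregressive, also require a time variable to be declared, which is not shown here.

  . xtgee union age not_smsa, family(binomial) link(logit) corr(exchangeable) eform

  . xtgee union age not_smsa, family(binomial) link(cloglog) corr(independent)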

If you assume an independent correlation structure, xtgee ignores the panel structure of the data and produces answers already provided by Stata’s nonpanel estimation commands. Examples of situations in which xtgee provides the same answers as an existing command are given in the table that follows the notes below; each note qualifies the table row that cites it.

Note 1

These methods produce the same results only in the case of balanced panels.

Note 2

For cloglog estimation, xtgee with corr(independent) and the cloglog command will produce the same coefficients, but the standard errors will be only asymptotically equivalent because cloglog is not the canonical link for the binomial family.

Note 3

For probit estimation, xtgee with corr(independent) and the probit command will produce the same coefficients, but the standard errors will be only asymptotically equivalent because probit is not the canonical link for the binomial family. If the binomial denominator is not 1, the equivalent maximum-likelihood command is bprobit.

Note 4

Fitting a negative binomial model using xtgee (or glm) will yield results conditional on the specified value of alpha. nbreg, however, estimates that parameter and provides unconditional estimates.

Note 5

xtgee with corr(independent) can be used to fit exponential regressions, but this requires specifying scale(1). As with probit, the xtgee-reported standard errors will be only asymptotically equivalent to those produced by streg, dist(exp) nohr because log is not the canonical link for the gamma family. xtgee cannot be used to fit exponential regressions on censored data.

Note 6

Using the independent correlation structure, the xtgee command will fit the same model as the glm, irls command if the family–link combination is the same.

Note 7

If the xtgee command is equivalent to another command, using corr(independent) and the robust option with xtgee corresponds to using both the robust option and the cluster(varname) option in the equivalent command, where varname corresponds to the i() group variable.
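
The correspondence between the robust option and clustering on the group variable can be tried on the union data used earlier. The pair of commands below is a sketch with output omitted; it assumes idcode has already been declared as the group variable, as in the first example.

  . xtgee union age not_smsa, family(binomial) link(logit) corr(independent) robust

  . logit union age not_smsa, robust cluster(idcode)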

  Family     Link       Correlation    Equivalent Stata command
  -------------------------------------------------------------------
  Gaussian   identity   independent    regress
  Gaussian   identity   exchangeable   xtreg, re (see note 1)
  Gaussian   identity   exchangeable   xtreg, pa
  binomial   cloglog    independent    cloglog (see note 2)
  binomial   cloglog    exchangeable   xtcloglog, pa
  binomial   logit      independent    logit or logistic
  binomial   logit      exchangeable   xtlogit, pa
  binomial   probit     independent    probit (see note 3)
  binomial   probit     exchangeable   xtprobit, pa
  nbinomial  nbinomial  independent    nbreg (see note 4)
  Poisson    log        independent    poisson
  Poisson    log        exchangeable   xtpoisson, pa
  gamma      log        independent    streg, dist(exp) nohr (see note 5)
  family     link       independent    glm, irls (see note 6)
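
One row of the table can be checked directly against the probit example fitted earlier. The sketch below, with output omitted, fits the population-averaged probit model both ways; it assumes idcode is still declared as the group variable.

  . xtgee union age not_smsa, family(binomial) link(probit) corr(exchangeable)

  . xtprobit union age not_smsa, pa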

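If you model the intracluster correlation as an identity matrix (by specifying corr(independent), or by specifying the name of an existing identity matrix in the corr() option), GEE estimation reduces to a generalized linear model, and the coefficients will be identical to those from glm. In the comparison below, the standard errors differ in the last reported digits only because the two commands scale the variance differently: glm divides the Pearson statistic by the residual degrees of freedom (4499.03/26197 = .1717383), whereas xtgee by default divides by the number of observations (4499.03/26200 = .1717187).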

  . glm union age not_smsa, link(identity) family(gauss)

  Iteration 0:   log likelihood = -14095.328

  Generalized linear models                          No. of obs      =     26200
  Optimization     : ML                              Residual df     =     26197
                                                     Scale parameter =  .1717383
  Deviance         =  4499.028831                    (1/df) Deviance =  .1717383
  Pearson          =  4499.028831                    (1/df) Pearson  =  .1717383
  
  Variance function: V(u) = 1                        [Gaussian]
  Link function    : g(u) = u                        [Identity]
  
                                                     AIC             =  1.076208
  Log likelihood   =  -14095.3277                    BIC             = -262016.5
  
  ------------------------------------------------------------------------------
               |                 OIM
         union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
           age |   .0021565   .0003948     5.46   0.000     .0013828    .0029303
      not_smsa |  -.0591923   .0056826   -10.42   0.000    -.0703301   -.0480546
         _cons |   .1729596   .0123365    14.02   0.000     .1487806    .1971386
  ------------------------------------------------------------------------------
  
  . xtgee union age not_smsa, link(identity) family(gauss) corr(indep)
  
  Iteration 1: tolerance = 9.080e-15
  
  GEE population-averaged model                   Number of obs      =     26200
  Group variable:                     idcode      Number of groups   =      4434
  Link:                             identity      Obs per group: min =         1
  Family:                           Gaussian                     avg =       5.9
  Correlation:                   independent                     max =        12
                                                  Wald chi2(2)       =    134.68
  Scale parameter:                  .1717187      Prob > chi2        =    0.0000
  
  Pearson chi2(26200):               4499.03      Deviance           =   4499.03
  Dispersion (Pearson):             .1717187      Dispersion         =  .1717187
  
  ------------------------------------------------------------------------------
         union |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
  -------------+----------------------------------------------------------------
           age |   .0021565   .0003948     5.46   0.000     .0013828    .0029302
      not_smsa |  -.0591923   .0056823   -10.42   0.000    -.0703295   -.0480552
         _cons |   .1729596   .0123357    14.02   0.000      .148782    .1971372
  ------------------------------------------------------------------------------
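
The two scale parameters reported above (.1717383 from glm and .1717187 from xtgee) differ only in the divisor applied to the Pearson statistic. As a sketch, xtgee's nmp option requests the divisor N - P instead of the default N, which should bring the scale parameter and standard errors in line with the glm results; output is omitted here.

  . xtgee union age not_smsa, link(identity) family(gauss) corr(indep) nmp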

We could fill up lots of space demonstrating other ways that the xtgee command is equivalent to other commands, but the real power is in using it for its intended purpose: modeling the correlation that exists within the panels.

© Copyright 1996–2008 StataCorp LP