**[XT] xtgee** -- Fit population-averaged panel-data models by using GEE

__Syntax__

**xtgee** *depvar* [*indepvars*] [*if*] [*in*] [*weight*] [**,** *options*]

*options* Description
-------------------------------------------------------------------------
Model
__f__**amily(***family***)** distribution of *depvar*
__l__**ink(***link***)** link function

Model 2
__exp__**osure(***varname***)** include ln(*varname*) in model with coefficient
constrained to 1
__off__**set(***varname***)** include *varname* in model with coefficient
constrained to 1
__nocons__**tant** suppress constant term
**asis** retain perfect predictor variables
**force** estimate even if observations unequally spaced in
time

Correlation
__c__**orr(***correlation***)** within-group correlation structure

SE/Robust
**vce(***vcetype***)** *vcetype* may be **conventional**, __r__**obust**, __boot__**strap**, or
__jack__**knife**
**nmp** use divisor N-P instead of the default N
**rgf** multiply the robust variance estimate by
(N-1)/(N-P)
__s__**cale(***parm***)** override the default scale parameter; *parm* may be
**x2**, **dev**, **phi**, or *#*

Reporting
__le__**vel(***#***)** set confidence level; default is **level(95)**
__ef__**orm** report exponentiated coefficients
*display_options* control columns and column formats, row spacing,
line width, display of omitted variables and
base and empty cells, and factor-variable
labeling

Optimization
*optimize_options* control the optimization process; seldom used

__nodis__**play** suppress display of header and coefficients
__coefl__**egend** display legend instead of statistics
-------------------------------------------------------------------------
A panel variable must be specified. Correlation structures other than
**exchangeable** and **independent** require that a time variable also be
specified. Use **xtset** to declare both variables.
*indepvars* may contain factor variables; see fvvarlist.
*depvar* and *indepvars* may contain time-series operators; see tsvarlist.
**by**, **mfp**, **mi estimate**, and **statsby** are allowed; see prefix.
**vce(bootstrap)** and **vce(jackknife)** are not allowed with the **mi estimate**
prefix.
**iweight**s, **fweight**s, and **pweight**s are allowed; see weight. Weights must
be constant within panel.
**nodisplay** and **coeflegend** do not appear in the dialog box.
See **[XT] xtgee postestimation** for features available after estimation.

*family* Description
-------------------------------------------------------------------------
__gau__**ssian** Gaussian (normal); **family(normal)** is a synonym
__ig__**aussian** inverse Gaussian
__b__**inomial**[*#*|*varname*] Bernoulli/binomial
__p__**oisson** Poisson
__nb__**inomial**[*#*] negative binomial
__gam__**ma** gamma
-------------------------------------------------------------------------

*link* Description
-------------------------------------------------------------------------
__i__**dentity** identity; y=y
**log** log; ln(y)
__logi__**t** logit; ln{y/(1-y)}, natural log of the odds
__p__**robit** probit; inverse of the Gaussian (normal) cumulative distribution
__cl__**oglog** clog-log; ln{-ln(1-y)}
__pow__**er**[*#*] power; y^k with k=#; #=1 if not specified
__opo__**wer**[*#*] odds power; [{y/(1-y)}^k - 1]/k with k=#; #=1 if
not specified
__nb__**inomial** negative binomial
__rec__**iprocal** reciprocal; 1/y
-------------------------------------------------------------------------

*correlation* Description
-------------------------------------------------------------------------
__exc__**hangeable** exchangeable
__ind__**ependent** independent
__uns__**tructured** unstructured
__fix__**ed** *matname* user-specified
**ar** *#* autoregressive of order *#*
__sta__**tionary** *#* stationary of order *#*
__non__**stationary** *#* nonstationary of order *#*
-------------------------------------------------------------------------

__Menu__

**Statistics > Longitudinal/panel data >** **Generalized estimating equations**
**(GEE) >** **Generalized estimating equations (GEE)**

__Description__

**xtgee** fits population-averaged panel-data models. In particular, **xtgee**
fits generalized linear models and allows you to specify the within-group
correlation structure for the panels.

See logistic estimation commands and **[R] regress** for lists of related
estimation commands.

__Options__

+-------+
----+ Model +------------------------------------------------------------

**family(***family***)** specifies the distribution of *depvar*; **family(gaussian)** is
the default.

**link(***link***)** specifies the link function; the default is the canonical link
for the **family()** specified (except for **family(nbinomial)**).

+---------+
----+ Model 2 +----------------------------------------------------------

**exposure(***varname***)** and **offset(***varname***)** are different ways of specifying
the same thing. **exposure()** specifies a variable that reflects the
amount of exposure over which the *depvar* events were observed for
each observation; ln(*varname*) with coefficient constrained to be 1 is
entered into the regression equation. **offset()** specifies a variable
that is to be entered directly into the log-link function with its
coefficient constrained to be 1; thus, exposure is assumed to be
e^varname. If you were fitting a Poisson regression model,
**family(poisson) link(log)**, for instance, you would account for
exposure time by specifying **offset()** containing the log of exposure
time.
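
As a sketch of the equivalence described above, the two fits below should produce the same coefficients. The ships dataset and its variable names (ship, accident, service, and the period indicators) are assumptions borrowed from the **[XT] xtpoisson** examples, not from this entry; lnservice is a hypothetical name introduced here.

```stata
* Assumed example data: ship accident counts, with service as the exposure
webuse ships, clear
xtset ship

* exposure() enters ln(service) with its coefficient constrained to 1
xtgee accident op_75_79 co_65_69 co_70_74 co_75_79, family(poisson) link(log) exposure(service)

* offset() expects a variable that is already on the log scale
generate double lnservice = ln(service)
xtgee accident op_75_79 co_65_69 co_70_74 co_75_79, family(poisson) link(log) offset(lnservice)
```

Observations with zero exposure have a missing log and are excluded under either specification.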

**noconstant** specifies that the linear predictor has no intercept term,
thus forcing it through the origin on the scale defined by the link
function.

**asis** forces retention of perfect predictor variables and their
associated, perfectly predicted observations and may produce
instabilities in maximization; see **[R] probit**. This option is only
allowed with option **family(binomial)** with a denominator of 1.

**force** specifies that estimation be forced even though the time variable
is not equally spaced. This is relevant only for correlation
structures that require knowledge of the time variable. These
correlation structures require that observations be equally spaced so
that calculations based on lags correspond to a constant time change.
If you specify a time variable indicating that observations are not
equally spaced, the (time dependent) model will not be fit. If you
also specify **force**, the model will be fit, and it will be assumed
that the lags based on the data ordered by the time variable are
appropriate.
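
A minimal sketch of **force** with a lag-based structure, reusing the union data from the Examples section. Whether these data actually need **force** depends on the spacing of year, which this entry does not state; the covariates are illustrative only.

```stata
webuse union, clear
xtset id year

* With force, consecutive observations within a panel are treated as one lag
* apart, whatever the actual difference in the time variable
xtgee union age grade not_smsa south, family(binomial) link(probit) corr(ar 1) force
```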

+-------------+
----+ Correlation +------------------------------------------------------

**corr(***correlation***)** specifies the within-group correlation structure; the
default corresponds to the equal-correlation model,
**corr(exchangeable)**.

When you specify a correlation structure that requires a lag, you
indicate the lag after the structure's name with or without a blank;
for example, **corr(ar 1)** or **corr(ar1)**.

If you specify the fixed correlation structure, you specify the name
of the matrix containing the assumed correlations following the word
**fixed**, for example, **corr(fixed myr)**.

+-----------+
----+ SE/Robust +--------------------------------------------------------

**vce(***vcetype***)** specifies the type of standard error reported, which
includes types that are derived from asymptotic theory
(**conventional**), that are robust to some kinds of misspecification
(**robust**), and that use bootstrap or jackknife methods (**bootstrap**,
**jackknife**); see **[XT] ***vce_options*.

**vce(conventional)**, the default, uses the conventionally derived
variance estimator for generalized least-squares regression.

**vce(robust)** specifies that the Huber/White/sandwich estimator of
variance is to be used in place of the default conventional variance
estimator (see *Methods and formulas* in **[XT] xtgee**). Use of this
option causes **xtgee** to produce valid standard errors even if the
correlations within group are not as hypothesized by the specified
correlation structure. Under a noncanonical link, it does, however,
require that the model correctly specifies the mean. The resulting
standard errors are thus labeled "semirobust" instead of "robust" in
this case. Although there is no **vce(cluster** *clustvar***)** option,
results are as if this option were included and you specified
clustering on the panel variable.
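
A minimal sketch of requesting the sandwich estimator, reusing the union data from the Examples section; the covariates and the choice of **corr(exchangeable)** are illustrative only.

```stata
webuse union, clear
xtset id year

* Robust standard errors, clustered on the panel variable
xtgee union age grade not_smsa south, family(binomial) link(logit) corr(exchangeable) vce(robust)

* Model-based variance from the same fit, for comparison with e(V)
matrix list e(V_modelbased)
```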

**nmp**; see **[XT] ***vce_options*.

**rgf** specifies that the robust variance estimate is multiplied by
(N-1)/(N-P), where N is the total number of observations and P is the
number of coefficients estimated. This option can be used only with
**family(gaussian)** when **vce(robust)** is either specified or implied by
the use of **pweight**s. Using this option implies that the robust
variance estimate is not invariant to the scale of any weights used.

**scale(x2**|**dev**|**phi**|*#***)**; see **[XT] ***vce_options*.

+-----------+
----+ Reporting +--------------------------------------------------------

**level(***#***)**; see **[R] estimation options**.

**eform** displays the exponentiated coefficients and corresponding standard
errors and confidence intervals as described in **[R] maximize**. For
**family(binomial) link(logit)** (that is, logistic regression),
exponentiation results in odds ratios; for **family(poisson) link(log)**
(that is, Poisson regression), exponentiated coefficients are
incidence-rate ratios.
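
For instance, assuming the union data used in the Examples section, a population-averaged logit fit can be reported as odds ratios:

```stata
webuse union, clear
xtset id year

* eform exponentiates the coefficients; with binomial/logit these are odds ratios
xtgee union age grade not_smsa south, family(binomial) link(logit) eform
```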

*display_options*: **noci**, __nopv__**alues**, __noomit__**ted**, **vsquish**, __noempty__**cells**,
__base__**levels**, __allbase__**levels**, __nofvlab__**el**, **fvwrap(***#***)**, **fvwrapon(***style***)**,
**cformat(%***fmt***)**, **pformat(%***fmt***)**, **sformat(%***fmt***)**, and **nolstretch**; see **[R]**
**estimation options**.

+--------------+
----+ Optimization +-----------------------------------------------------

*optimize_options* control the iterative optimization process. These
options are seldom used.

__iter__**ate(***#***)** specifies the maximum number of iterations. When the
number of iterations equals #, the optimization stops and presents
the current results, even if the convergence tolerance has not been
reached. The default is **iterate(100)**.

__tol__**erance(***#***)** specifies the tolerance for the coefficient vector.
When the relative change in the coefficient vector from one iteration
to the next is less than or equal to #, the optimization process is
stopped. **tolerance(1e-6)** is the default.

**nolog** suppresses the display of the iteration log.

__tr__**ace** specifies that the current estimates be printed at each
iteration.

The following options are available with **xtgee** but are not shown in the
dialog box:

**nodisplay** is for programmers. It suppresses the display of the header
and coefficients.

**coeflegend**; see **[R] estimation options**.

__Examples__

Setup
**. webuse union**
**. xtset id year**

Fit a logit model
**. xtgee union age grade not_smsa south, family(binomial) link(logit)**

Fit a probit model with AR(1) correlation
**. xtgee union age grade not_smsa south, family(binomial) link(probit) corr(ar1)**

__Correlation structures and the allowed spacing of observations within panel__

                   ---characteristics allowed---
                                 Unequal
Correlation        Unbalanced    spacing    Gaps
-------------------------------------------------
independent        yes           yes        yes
exchangeable       yes           yes        yes
ar k               yes (*)       no         no
stationary k       yes (*)       no         no
nonstationary k    yes (*)       no         no
unstructured       yes           yes        yes
fixed              yes           yes        yes
-------------------------------------------------
(*) All panels must have at least k+1 obs.

Definitions:

1. Panels are balanced if each has the same number of observations.

2. Panels are equally spaced if the interval between observations is
constant.

3. Panels have gaps if some observations are missing.
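
One way to check these characteristics before choosing a correlation structure is to inspect the panel layout. The sketch below assumes the union data from the Examples section; **xtdescribe** is a separate command documented in its own entry.

```stata
webuse union, clear

* xtset reports the panel variable, the time variable, and the time delta
xtset id year

* xtdescribe tabulates participation patterns, which shows unbalanced panels and gaps
xtdescribe
```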

__Stored results__

**xtgee** stores the following in **e()**:

Scalars
**e(N)** number of observations
**e(N_g)** number of groups
**e(df_m)** model degrees of freedom
**e(chi2)** chi-squared
**e(p)** p-value for model test
**e(df_pear)** degrees of freedom for Pearson chi-squared
**e(chi2_dev)** chi-squared test of deviance
**e(chi2_dis)** chi-squared test of deviance dispersion
**e(deviance)** deviance
**e(dispers)** deviance dispersion
**e(phi)** scale parameter
**e(g_min)** smallest group size
**e(g_avg)** average group size
**e(g_max)** largest group size
**e(tol)** target tolerance
**e(dif)** achieved tolerance
**e(rank)** rank of **e(V)**
**e(rc)** return code

Macros
**e(cmd)** **xtgee**
**e(cmdline)** command as typed
**e(depvar)** name of dependent variable
**e(ivar)** variable denoting groups
**e(tvar)** variable denoting time within groups
**e(model)** **pa**
**e(family)** distribution family
**e(link)** link function
**e(corr)** correlation structure
**e(scale)** **x2**, **dev**, **phi**, or *#*; scale parameter
**e(wtype)** weight type
**e(wexp)** weight expression
**e(offset)** linear offset variable
**e(chi2type)** **Wald**; type of model chi-squared test
**e(vce)** *vcetype* specified in **vce()**
**e(vcetype)** title used to label Std. Err.
**e(nmp)** **nmp**, if specified
**e(properties)** **b V**
**e(estat_cmd)** program used to implement **estat**
**e(predict)** program used to implement **predict**
**e(marginsnotok)** predictions disallowed by **margins**
**e(asbalanced)** factor variables **fvset** as **asbalanced**
**e(asobserved)** factor variables **fvset** as **asobserved**

Matrices
**e(b)** coefficient vector
**e(R)** estimated working correlation matrix
**e(V)** variance-covariance matrix of the estimators
**e(V_modelbased)** model-based variance

Functions
**e(sample)** marks estimation sample
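
A short sketch of reading some of these results after estimation, assuming the union data and model from the Examples section:

```stata
webuse union, clear
xtset id year
quietly xtgee union age grade not_smsa south, family(binomial) link(logit)

* Scalars and macros
display "groups: " e(N_g) "  obs per group: " e(g_min) " to " e(g_max)
display "family: `e(family)'  link: `e(link)'  corr: `e(corr)'"

* Estimated working correlation matrix
matrix list e(R)

* Everything stored in e()
ereturn list
```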