Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# st: very different estimates with xtgee vs. xtlogit

 From Alessandro Marcon <[email protected]> To [email protected] Subject st: very different estimates with xtgee vs. xtlogit Date Wed, 12 Jun 2013 14:31:58 +0200

```Dear all,

```
briefly, my query is: is it reasonable to have VERY DIFFERENT results when running "xtlogit, re" and when running "xtgee" (or equivalent "xtlogit, pa")?
```
This is the whole story, very detailed:

```
I have data (in the "long" form) of patients (personal identifier = id_pz) relating to 4 years (year = 2009 to 2012). The outcome of the analysis (cad_) is binary (1=patient was seen by specialist in year X, 0=otherwise). Adjustment variables are sex and mean-centered age (age_c), and my interest is in computing prevalences of the outcome in years 2009 to 2012. Most of the patients have 4 observations, but some enter only one or some of the 4 years of observation. In other words, the panel data are not balanced:
```
. xtset
panel variable: id_pz (unbalanced)
time variable: year, 2009 to 2012, but with gaps
delta: 1 unit

. xtdescribe

id_pz: 1, 2, ..., 10459 n = 10459
year: 2009, 2010, ..., 2012 T = 4
Delta(year) = 1 unit
Span(year) = 4 periods
(id_pz*year uniquely identifies each observation)

Distribution of T_i: min 5% 25% 50% 75% 95% max
1 1 2 4 4 4 4

Freq. Percent Cum. | Pattern
---------------------------+---------
6597 63.07 63.07 | 1111
911 8.71 71.79 | ...1
656 6.27 78.06 | ..11
635 6.07 84.13 | .111
448 4.28 88.41 | 1...
346 3.31 91.72 | 11..
260 2.49 94.21 | ..1.
251 2.40 96.61 | 111.
197 1.88 98.49 | .1..
158 1.51 100.00 | (other patterns)
---------------------------+---------
10459 100.00 | XXXX

These are the crude prevalences of the outcome per year:

. tabstat cad_, by(year) f(%9.3f) not

by categories of: year

year | mean
---------+----------
2009 | 0.251
2010 | 0.255
2011 | 0.349
2012 | 0.413
--------------------

Since I am interested in marginal estimates, I have chosen GEEs (xtgee).

```
. xtgee cad_ i.year sex age_c, i(id_pz) family(binom) link(logit) corr(exch) nolog eform
```
GEE population-averaged model Number of obs = 33220
Group variable: id_pz Number of groups = 10459
Link: logit Obs per group: min = 1
Family: binomial avg = 3.2
Correlation: exchangeable max = 4
Wald chi2(5) = 1759.46
Scale parameter: 1 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
cad_ | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
year |
2010 | 1.054325 .024332 2.29 0.022 1.007698 1.10311
2011 | 1.696386 .0380047 23.59 0.000 1.62351 1.772534
2012 | 2.185553 .0491377 34.78 0.000 2.091336 2.284014
|
sex | 1.061284 .0398101 1.59 0.113 .9860566 1.14225
age_c | .9806823 .0013765 -13.90 0.000 .977988 .983384
_cons | .3045342 .0095296 -38.00 0.000 .2864177 .3237966
------------------------------------------------------------------------------

. margins, at(sex=0.5 age=0) by(year) noatlegend

Predictive margins Number of obs = 33220
Model VCE : Conventional

Expression : Pr(cad_ != 0), predict()
over : year

------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
year |
2009 | .2388068 .0046137 51.76 0.000 .2297642 .2478494
2010 | .2485555 .0045971 54.07 0.000 .2395453 .2575658
2011 | .3473446 .0049971 69.51 0.000 .3375504 .3571387
2012 | .4067629 .0050812 80.05 0.000 .3968039 .416722
------------------------------------------------------------------------------

```
Prevalences predicted after GEEs are similar to crude prevalences (above). This sounds reasonable because sex and age are not confounders (very similar distribution across the 4 years). The GEEs simply corrected for the correlation structure.
```

```
Because I also wanted to investigate variance at the patient-level, I tried to run a mixed model (xtlogit, re: with random intercept at the patient's level).
```
. xtlogit cad_ i.year sex age_c, i(id_pz) nolog or

Random-effects logistic regression Number of obs = 33220
Group variable: id_pz Number of groups = 10459

Random effects u_i ~ Gaussian Obs per group: min = 1
avg = 3.2
max = 4

Wald chi2(5) = 1409.10
Log likelihood = -14991.227 Prob > chi2 = 0.0000

------------------------------------------------------------------------------
cad_ | OR Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
year |
2010 | 1.150102 .0713165 2.26 0.024 1.018484 1.298728
2011 | 4.224007 .2668206 22.81 0.000 3.732126 4.780717
2012 | 8.414356 .5562698 32.22 0.000 7.391768 9.578412
|
sex | 1.126777 .1123615 1.20 0.231 .9267378 1.369996
age_c | .9480372 .0036815 -13.74 0.000 .940849 .9552802
_cons | .0361745 .0032931 -36.46 0.000 .0302631 .0432406
-------------+----------------------------------------------------------------
/lnsig2u | 2.90168 .0398126 2.823649 2.979712
-------------+----------------------------------------------------------------
sigma_u | 4.266698 .0849342 4.103435 4.436456
rho | .8469443 .0051609 .836553 .856788
------------------------------------------------------------------------------
```
Likelihood-ratio test of rho=0: chibar2(01) = 1.1e+04 Prob >= chibar2 = 0.000
```
. margins, at(sex=0.5 age=0) by(year) noatlegend predict(pu0)

Predictive margins Number of obs = 33220
Model VCE : OIM

Expression : Pr(cad_=1 assuming u_i=0), predict(pu0)
over : year

------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
year |
2009 | .0369792 .0027249 13.57 0.000 .0316386 .0423198
2010 | .0422951 .0030205 14.00 0.000 .036375 .0482151
2011 | .1395616 .0076728 18.19 0.000 .1245233 .1546
2012 | .2442016 .0112626 21.68 0.000 .2221272 .2662759
------------------------------------------------------------------------------

```
xtlogit provided very different prevalences (as well as ORs). The adjusted prevalences are somewhat unreasonable if compared to the crude ones.
```

```
My query is: what is wrong here? I have read somewhere that mixed models with a random intercept are equivalent to GEEs with exchangeable correlation…
```
Any comment, suggestion etc will be very much appreciated.

Best regards,
Alessandro Marcon

--

Alessandro Marcon, PhD
Unit of Epidemiology &   Medical Statistics
Department of Public Health and Community Medicine
University of Verona
Strada Le Grazie 8, 37134 Verona, Italy
tel. +39 045 8027668 fax +39 045 8027154

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
```