Statalist The Stata Listserver



st: Unobserved heterogeneity in logistic regression


From   "daniel waxman" <[email protected]>
To   <[email protected]>
Subject   st: Unobserved heterogeneity in logistic regression
Date   Mon, 30 Jan 2006 10:43:43 -0500

Maarten Buis directed me to a short paper of his, "Unobserved heterogeneity
in logistic regression":

http://home.fsw.vu.nl/m.buis/

The concept makes sense--the question is what to do about it.

I am using in-hospital mortality as the outcome in a multivariable logistic
model, focusing on a particular laboratory test (troponin I) as a predictor
(either with a simple log transformation, or using -mfp-). I test
independence by fitting nested logistic models with every other mortality
predictor that I can find (some continuous, some dichotomous), and the odds
ratio for the test of interest remains stable (and Hosmer-Lemeshow
goodness-of-fit statistics do not reject the models). My sample sizes are on
the order of 10,000-30,000 observations per data set.

The overall mortality is 3.3% and the predictor of interest is strongly
skewed to the left (see below).

My questions are these:

There are of course many unobserved causes of in-hospital mortality, but
insofar as this particular model seems to work, do I need to deal with this?
If one does try to deal with it in a situation such as mine, is it a matter
of using a method other than simple logistic regression to fit the model, or
is it more a matter of assessing goodness of fit?

In either case, can anybody point me in the right direction (reference-wise)
toward (1) assessing the degree of unobserved heterogeneity, (2) fitting a
model which deals with it if it exists, and (3) testing the model?

(I am one of those dangerous physician researchers who have more computing
power than formal statistical training, although I am trying.)

Some output follows:

Note that zlog is the log-10 transformation of the predictor of interest
(troponin). Observations with troponin==0 are flagged by the dummy variable
zero (zero==1), and zlog is replaced with zero for those observations.
Don't get too caught up in this part--it works.
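
For anyone who wants to reproduce this coding, here is a minimal sketch. It
assumes the raw assay value sits in a variable called troponin (a
hypothetical name; only zlog and zero appear in the output) and that it is
never negative.

```stata
* Assumed raw variable name: troponin (not shown in the output below)
* zero flags the observations where the assay value is exactly 0
generate byte zero = (troponin == 0) if !missing(troponin)

* log10 is undefined at 0, so take it only for positive values
generate double zlog = log10(troponin) if troponin > 0 & !missing(troponin)

* set zlog to 0 where troponin is 0, so that the zero dummy
* carries the whole effect for those observations
replace zlog = 0 if zero == 1
```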

************************

*Univariate (treating zlog and zero together as one predictor)

   . logistic is_dead zlog zero if instudy==1 & bimc==1

   Logistic regression               Number of obs   =      13207
                                     LR chi2(2)      =     146.66
                                     Prob > chi2     =     0.0000
   Log likelihood = -1767.6459       Pseudo R2       =     0.0398

   --------------------------------------------------------------------
    is_dead | Odds Ratio   Std. Err.      z   P>|z|  [95% Conf. Interval]
   ---------+----------------------------------------------------------
       zlog |   1.943987   .1429218    9.04   0.000   1.683113   2.245296
       zero |   .2050451   .0267263  -12.16   0.000   .1588184   .2647268
   --------------------------------------------------------------------

   . sum zlog, detail

                               zlog
   ------------------------------------------------------------
         Percentiles      Smallest
    1%           -2       -2.69897
    5%           -2       -2.69897
   10%     -1.69897      -2.522879      Obs               47062
   25%     -1.30103       -2.39794      Sum of Wgt.       47062

   50%      -.30103                     Mean          -.6325129
                           Largest      Std. Dev.      .7529837
   75%            0       1.942504
   90%            0       1.960471      Variance       .5669845
   95%            0        1.96708      Skewness      -.4340483
   99%     .6532125       1.975891      Kurtosis        2.02339



   . estat gof if e(sample),group(10) table

   Logistic model for is_dead, goodness-of-fit test

     (Table collapsed on quantiles of estimated probabilities)
     (There are only 7 distinct quantiles because of ties)
     +---------------------------------------------------------+
     | Group |   Prob | Obs_1 | Exp_1 | Obs_0 |  Exp_0 | Total |
     |-------+--------+-------+-------+-------+--------+-------|
     |     4 | 0.0176 |   105 | 105.0 |  5862 | 5862.0 |  5967 |
     |     5 | 0.0226 |    33 |  29.8 |  1287 | 1290.2 |  1320 |
     |     6 | 0.0275 |    23 |  24.4 |   866 |  864.6 |   889 |
     |     7 | 0.0355 |    43 |  45.6 |  1348 | 1345.4 |  1391 |
     |     8 | 0.0430 |    60 |  50.3 |  1181 | 1190.7 |  1241 |
     |-------+--------+-------+-------+-------+--------+-------|
     |     9 | 0.0553 |    42 |  53.4 |  1042 | 1030.6 |  1084 |
     |    10 | 0.2452 |   108 | 105.5 |  1207 | 1209.5 |  1315 |
     +---------------------------------------------------------+

          number of observations =     13207
                number of groups =         7
         Hosmer-Lemeshow chi2(5) =         5.20
                     Prob > chi2 =         0.3924

*Now, adding age (untransformed) to the previous model:

     . logistic is_dead zlog zero age if instudy==1 & bimc==1

     Logistic regression                  Number of obs   =      13207
                                          LR chi2(3)      =     254.38
                                          Prob > chi2     =     0.0000
     Log likelihood = -1713.7849          Pseudo R2       =     0.0691

     ----------------------------------------------------------------------
      is_dead | Odds Ratio   Std. Err.       z   P>|z|  [95% Conf. Interval]
     ---------+------------------------------------------------------------
         zlog |   1.905635   .1413964     8.69   0.000   1.647712   2.203932
         zero |   .2252449   .0295986   -11.34   0.000   .1741012   .2914125
          age |   1.035355    .003652     9.85   0.000   1.028222   1.042538
     ----------------------------------------------------------------------



   . estat gof if e(sample),group(10) table

   Logistic model for is_dead, goodness-of-fit test

     (Table collapsed on quantiles of estimated probabilities)
     +---------------------------------------------------------+
     | Group |   Prob | Obs_1 | Exp_1 | Obs_0 |  Exp_0 | Total |
     |-------+--------+-------+-------+-------+--------+-------|
     |     1 | 0.0090 |     8 |   9.3 |  1314 | 1312.7 |  1322 |
     |     2 | 0.0121 |    16 |  13.9 |  1304 | 1306.1 |  1320 |
     |     3 | 0.0155 |    22 |  18.2 |  1299 | 1302.8 |  1321 |
     |     4 | 0.0192 |    26 |  22.8 |  1294 | 1297.2 |  1320 |
     |     5 | 0.0232 |    24 |  28.0 |  1297 | 1293.0 |  1321 |
     |-------+--------+-------+-------+-------+--------+-------|
     |     6 | 0.0281 |    40 |  33.9 |  1281 | 1287.1 |  1321 |
     |     7 | 0.0340 |    35 |  40.8 |  1285 | 1279.2 |  1320 |
     |     8 | 0.0434 |    45 |  50.4 |  1276 | 1270.6 |  1321 |
     |     9 | 0.0641 |    70 |  69.1 |  1251 | 1251.9 |  1321 |
     |    10 | 0.4050 |   128 | 127.5 |  1192 | 1192.5 |  1320 |
     +---------------------------------------------------------+

          number of observations =     13207
                number of groups =        10
         Hosmer-Lemeshow chi2(8) =         4.91
                     Prob > chi2 =         0.7672
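
The two fits above could also be compared side by side rather than by eye.
This is only a sketch of one way to do that: the stored names base and
plus_age are made up for illustration, and the likelihood-ratio test is
valid here only because both models use the same 13207 observations.

```stata
* Refit and store the model without age (name "base" is arbitrary)
logistic is_dead zlog zero if instudy==1 & bimc==1
estimates store base

* Refit and store the model with age (name "plus_age" is arbitrary)
logistic is_dead zlog zero age if instudy==1 & bimc==1
estimates store plus_age

* Likelihood-ratio test of the nested pair
lrtest base plus_age

* Odds ratios from both models in one table, to see how far zlog moves
estimates table base plus_age, eform b(%9.4f)
```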

