
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: QUERY: unrealistic results with stcox (Multiple-record-per-subject survival data)


From   Steve Samuels <[email protected]>
To   [email protected]
Subject   Re: st: QUERY: unrealistic results with stcox (Multiple-record-per-subject survival data)
Date   Wed, 17 Jul 2013 19:19:30 -0500

See
http://www.stata.com/support/faqs/statistics/stcox-producing-missing-standard-errors/

"4) Covariate does not vary within death event risk sets."

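When you subset to those entering in 2009, year does not vary within
risk sets: analysis time starts at each patient's entry, so at any given
failure time every patient still at risk is in the same calendar year.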


Steve
[email protected]
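
The FAQ condition can be illustrated with a toy example. The following is a minimal sketch in Python (not Stata, and not part of the original reply) that builds risk sets by hand and lists which values of year appear in each one. All patients, times, and helper names are invented for illustration; the only things taken from the question are the one-record-per-year layout and an analysis time that starts at 0 on each patient's own entry.

```python
from collections import namedtuple

# One record per patient per calendar year, at risk on the interval (t0, t].
Record = namedtuple("Record", "pid year t0 t died")

def make_records(pid, entry_year, n_years, death_frac=None):
    """Build yearly records with analysis time 0 at the patient's own entry.
    If death_frac is given, the patient dies that fraction into the last year."""
    records = []
    for k in range(n_years):
        died = (k == n_years - 1) and death_frac is not None
        t_end = k + death_frac if died else k + 1.0
        records.append(Record(pid, entry_year + k, float(k), t_end, died))
    return records

def years_in_risk_sets(records):
    """Map each failure time to the set of year values among records at risk."""
    out = {}
    for f in records:
        if f.died:
            out[f.t] = {r.year for r in records if r.t0 < f.t <= r.t}
    return out

# Hypothetical cohort: three patients entering in 2009, one entering in 2011.
baseline = (make_records(1, 2009, 4)
            + make_records(2, 2009, 2, death_frac=0.5)
            + make_records(3, 2009, 4, death_frac=0.4))
full = baseline + make_records(4, 2011, 2)

baseline_sets = years_in_risk_sets(baseline)
full_sets = years_in_risk_sets(full)
print("2009 entrants only:", baseline_sets)
print("with a late entrant:", full_sets)
```

In this toy data every risk set for the 2009 entrants contains a single year, so a partial likelihood has nothing to compare within a risk set. Adding the 2011 entrant puts two different years into the first risk set, which is what makes the year effect estimable in the full cohort.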


On Jul 17, 2013, at 4:31 AM, Alessandro Marcon wrote:



Dear All,

I have panel data with one record per patient per year: "id_pz" is the
patient's unique id, "year" ranges from 2009 to 2012, and the event of
interest is "decesso_", which stands for death.
The times of entering and exiting the study are "t_enter" and "t_exit",
respectively.

>       +--------------------------------------------+
>       | id_pz   year   decesso_   t_enter   t_exit |
>       |--------------------------------------------|
> 1249. |   388   2009          0     17898    18262 |
> 1250. |   388   2010          0     18263    18627 |
> 1251. |   388   2011          0     18628    18992 |
> 1252. |   388   2012          1     18993    19152 |
>       |--------------------------------------------|
> 1253. |   389   2009          0     17898    18262 |
> 1254. |   389   2010          1     18263    18546 |
>       |--------------------------------------------|
> 1255. |   390   2012          0     18993    19358 |
>       |--------------------------------------------|
> 1256. |   391   2009          0     17898    18262 |
> 1257. |   391   2010          0     18263    18627 |
> 1258. |   391   2011          0     18628    18992 |
> 1259. |   391   2012          0     18993    19358 |
>       |--------------------------------------------|
> 1260. |   392   2009          0     17898    18262 |
> 1261. |   392   2010          0     18263    18627 |
> 1262. |   392   2011          0     18628    18992 |
> 1263. |   392   2012          0     18993    19358 |
>       |--------------------------------------------|

Patients can be observed in one or more of the 4 years (2009-2012),
like this:

> . xtdescribe
>
>    id_pz:  1, 2, ..., 10998                                  n =      10998
>     year:  2009, 2010, ..., 2012                             T =          4
>            Delta(year) = 1 unit
>            Span(year)  = 4 periods
>            (id_pz*year uniquely identifies each observation)
>
> Distribution of T_i:   min      5%     25%       50%       75%     95%     max
>                          1       1       2         4         4       4       4
> 
>     Freq.  Percent    Cum. |  Pattern
> ---------------------------+---------
>     6809     61.91   61.91 |  1111
>     1017      9.25   71.16 |  ...1
>      810      7.36   78.52 |  ..11
>      520      4.73   83.25 |  .111
>      432      3.93   87.18 |  111.
>      428      3.89   91.07 |  1...
>      361      3.28   94.35 |  11..
>      296      2.69   97.04 |  ..1.
>      183      1.66   98.71 |  .1..
>      142      1.29  100.00 | (other patterns)
> ---------------------------+---------
>    10998    100.00         |  XXXX


I want to analyse survival of this dynamic cohort. Since I have
"Multiple-record-per-subject survival data", I stset my data like this:

> stset t_exit, id(id_pz) failure(decesso_==1) origin(time t_enter) scale(365)
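
To make the resulting time scale concrete, here is a small Python sketch (not Stata) of the analysis-time arithmetic, (t_exit - origin)/365, applied to patient 388 from the listing above. It assumes that the origin is the patient's first t_enter, which is consistent with the "last observed exit t = 4" that stset reports for this command; the helper name is hypothetical.

```python
# Analysis time on the stset scale: (t_exit - origin) / 365.
# Assumption: origin is the patient's first t_enter (illustrative helper, not a Stata API).

def analysis_times(t_enter, t_exit, scale=365):
    origin = min(t_enter)
    return [(t - origin) / scale for t in t_exit]

# Patient 388 from the listing: four yearly records, death during 2012.
t_enter_388 = [17898, 18263, 18628, 18993]
t_exit_388 = [18262, 18627, 18992, 19152]

times_388 = analysis_times(t_enter_388, t_exit_388)
for year, t in zip(range(2009, 2013), times_388):
    print(year, round(t, 3))
```

Under that assumption the 2009 record ends at about 0.997 years and the death in 2012 falls at about 3.436 years, so for a patient entering in 2009 the k-th year of analysis time coincides with the k-th calendar year.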

This is what I get when computing annual rates and fitting a Cox model:

> . strate year, per(1000)
> 
>         failure _d:  decesso_ == 1
>   analysis time _t:  (t_exit-origin)/365
>             origin:  time t_enter
>                 id:  id_pz
> 
> Estimated rates (per 1000) and lower/upper bounds of 95% confidence intervals
> (34673 records included in the analysis)
> 
>  +------------------------------------------------+
>  | year     D        Y     Rate    Lower    Upper |
>  |------------------------------------------------|
>  | 2009   219   7.9698   27.479   24.070   31.370 |
>  | 2010   240   8.2815   28.980   25.536   32.889 |
>  | 2011   281   8.8487   31.756   28.252   35.695 |
>  | 2012   275   9.1627   30.013   26.667   33.778 |
>  +------------------------------------------------+
> 
> 
> . stcox i.year
> 
>         failure _d:  decesso_ == 1
>   analysis time _t:  (t_exit-origin)/365
>             origin:  time t_enter
>                 id:  id_pz
> 
> Iteration 0:   log likelihood =  -9195.495
> Iteration 1:   log likelihood = -9194.8578
> Iteration 2:   log likelihood = -9194.8575
> Iteration 3:   log likelihood = -9194.8575
> Refining estimates:
> Iteration 0:   log likelihood = -9194.8575
> 
> Cox regression -- Breslow method for ties
> 
> No. of subjects =        10998                     Number of obs   =     34673
> No. of failures =         1015
> Time at risk    =  34262.67945
>                                                    LR chi2(3)      =      1.28
> Log likelihood  =   -9194.8575                     Prob > chi2     =    0.7350
> 
> ------------------------------------------------------------------------------
>           _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         year |
>        2010  |   1.160537   .1907743     0.91   0.365     .8408812    1.601708
>        2011  |   .9716829    .153747    -0.18   0.856     .7125922    1.324976
>        2012  |   1.028494   .1607485     0.18   0.857     .7571171    1.397141
> ------------------------------------------------------------------------------



HOWEVER, when I repeat this analysis including only patients who entered
the study in 2009 (baseline==1), this is what I get:

> 
> . stset t_exit if baseline==1, id(id_pz) failure(decesso_==1) origin(time t_enter) scale(365)
> 
>                id:  id_pz
>     failure event:  decesso_ == 1
> obs. time interval:  (t_exit[_n-1], t_exit]
> exit on or before:  failure
>    t for analysis:  (time-origin)/365
>            origin:  time t_enter
>            if exp:  baseline==1
> 
> ------------------------------------------------------------------------------
>    34673  total obs.
>     4848  ignored at outset because of -if <exp>-
> ------------------------------------------------------------------------------
>    29825  obs. remaining, representing
>     8086  subjects
>      879  failures in single failure-per-subject data
> 29477.81  total analysis time at risk, at risk from t =         0
>                             earliest observed entry t =         0
>                                  last observed exit t =         4
> 
> . strate year , per(1000)
> 
>         failure _d:  decesso_ == 1
>   analysis time _t:  (t_exit-origin)/365
>             origin:  time t_enter
>                 id:  id_pz
> 
> Estimated rates (per 1000) and lower/upper bounds of 95% confidence intervals
> (29825 records included in the analysis)
> 
>  +------------------------------------------------+
>  | year     D        Y     Rate    Lower    Upper |
>  |------------------------------------------------|
>  | 2009   219   7.9698   27.479   24.070   31.370 |
>  | 2010   212   7.5069   28.241   24.684   32.310 |
>  | 2011   242   7.1824   33.694   29.705   38.218 |
>  | 2012   206   6.8188   30.211   26.355   34.631 |
>  +------------------------------------------------+
> 
> 
> . stcox i.year
> 
>         failure _d:  decesso_ == 1
>   analysis time _t:  (t_exit-origin)/365
>             origin:  time t_enter
>                 id:  id_pz
> 
> Iteration 0:   log likelihood = -7824.2973
> Iteration 1:   log likelihood = -7822.9333
> Iteration 2:   log likelihood = -7822.3438
> Iteration 3:   log likelihood = -7822.0976
> Iteration 4:   log likelihood =      -7822
> Iteration 5:   log likelihood = -7821.9629
> Iteration 6:   log likelihood =  -7821.949
> [omitted]
> Iteration 26:  log likelihood = -7821.7751  (backed up)
> Refining estimates:
> Iteration 0:   log likelihood = -7821.9409
> Iteration 1:   log likelihood = -7821.9409
> Iteration 2:   log likelihood = -7821.9408
> [omitted]
> Iteration 19:  log likelihood = -7821.3159  (backed up)
> 
> Cox regression -- Breslow method for ties
> 
> No. of subjects =         8086                     Number of obs   =     29825
> No. of failures =          879
> Time at risk    =   29477.8137
>                                                    LR chi2(2)      =      5.96
> Log likelihood  =   -7821.3159                     Prob > chi2     =    0.0507
> 
> ------------------------------------------------------------------------------
>           _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>         year |
>        2010  |   1.81e+24   1.01e+30     0.00   1.000            0           .
>        2011  |   1.26e+13   4.58e+18     0.00   1.000            0           .
>        2012  |   73.05721          .        .       .            .           .
> ------------------------------------------------------------------------------
> 

The number of events and the person-time in 2009 seem large enough, and
the rates are still reasonable, but the Cox model seems to fail.
Any help would be very welcome!

Best regards,
Alessandro Marcon


-- 

Alessandro Marcon, PhD
Unit of Epidemiology & Medical Statistics
Department of Public Health and Community Medicine
University of Verona
Strada Le Grazie 8, 37134 Verona, Italy
tel. +39 045 8027668 fax +39 045 8027154





*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/



