Statalist The Stata Listserver



st: Set up multiple failure data with interval censoring


From   "Benigno Rodriguez G., MD" <rodriguez.benigno@clevelandactu.org>
To   statalist@hsphsun2.harvard.edu
Subject   st: Set up multiple failure data with interval censoring
Date   Wed, 07 Feb 2007 21:56:57 -0500

Hi, all:

I have a dataset in which subjects were seen at the time of an intervention and 9 times thereafter. The failure event is a cell count greater than 200 after the intervention. All subjects had a cell count below 200 at baseline (i.e., no left censoring). Covariates include baseline cell count (w0 below) and a dichotomous "region" variable. During follow-up, failure can occur once, multiple times, or not at all for each individual. The primary question is whether region is associated with time to failure; the secondary aim is to estimate the overall time spent with a cell count over 200. Time is measured in weeks.

I found the article by Mario Cleves (STB-49, ssa13) incredibly useful, and to my mind the visits in these subjects are spaced closely enough that I would feel comfortable treating time as continuous. But one feature of the data that I think makes it necessary to treat it as interval censored is that an individual is at risk only while having a cell count below 200, and this can happen intermittently during follow-up.

The data look like this:

id  region   w0   w2   w4   w8  w12  w16  w24  w32  w40  w48
 1       2   96  213  211  275  207  295  275  388  452  349
 2       1  113  355  302  251  254  230  167  162  150  108
 3       2  125  138  146  166  113  131  134  146  146  249
 4       1  126  291  282  339  409  330  198  341  260  201
 5       1   88  197  229  186  163    .  257  204  245  308

Replacing the above counts with just a status indicator:

id  region   w0   w2   w4   w8  w12  w16  w24  w32  w40  w48
 1       2   96    1    1    1    1    1    1    1    1    1
 2       1  113    1    1    1    1    1    0    0    0    0
 3       2  125    0    0    0    0    0    0    0    0    1
 4       1  126    1    1    1    1    1    0    1    1    1
 5       1   88    0    1    0    0    .    1    1    1    1
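
The recoding from counts to a status indicator can be sketched as follows. This is only an illustration, written in Python rather than Stata. The names WEEKS, counts, and to_status are hypothetical, and placing id=5's missing value at week 16 is an assumption taken from point (e) below, not from the column positions in the listing.

```python
# Follow-up visit weeks after baseline (week 0 is held separately as w0).
WEEKS = [2, 4, 8, 12, 16, 24, 32, 40, 48]

# Post-baseline cell counts by subject id, copied from the listing above.
# None marks the visit assumed to be missing for id=5 (week 16).
counts = {
    1: [213, 211, 275, 207, 295, 275, 388, 452, 349],
    2: [355, 302, 251, 254, 230, 167, 162, 150, 108],
    3: [138, 146, 166, 113, 131, 134, 146, 146, 249],
    4: [291, 282, 339, 409, 330, 198, 341, 260, 201],
    5: [197, 229, 186, 163, None, 257, 204, 245, 308],
}


def to_status(values, threshold=200):
    """Return 1 where the count exceeds the threshold, 0 otherwise.

    Missing counts stay missing (None) so that they are never
    mistaken for an event or for a non-event.
    """
    return [None if v is None else int(v > threshold) for v in values]


status = {pid: to_status(vals) for pid, vals in counts.items()}

for pid, row in status.items():
    print(pid, dict(zip(WEEKS, row)))
```

Keeping the missing visit as missing matters here: in software where a missing value compares as larger than any number, a bare "count > 200" test would silently code it as a failure.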

Several features of the data make it unclear how to set up the dataset: (a) id=1 has the event at each time point and is therefore not at risk after week 2; (b) id=2 only becomes at risk at week 24; (c) id=3 only fails on the day of the last observation; (d) id=4 becomes at risk at week 24, but then again is no longer at risk at week 32 AND fails at the end of follow-up; (e) id=5 has a missing observation at week 16.
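
To make these cases concrete, here is one possible way to turn the status indicators into (start, stop, event) risk intervals, sketched in Python rather than Stata. It rests on assumptions that are not settled in this post: a subject is taken to be at risk over an interval only if the status at the start of that interval was 0; the event is dated at the visit where a count above 200 is first seen; and a missing visit is skipped, so the interval runs to the next observed visit. The function name risk_intervals is hypothetical.

```python
# Visit weeks, including baseline (week 0), where every subject is below 200.
WEEKS = [0, 2, 4, 8, 12, 16, 24, 32, 40, 48]

# Status by subject id: 1 = count above 200, 0 = not, None = visit missing.
# The leading 0 is the baseline status; id=5's week-16 visit is assumed missing.
status = {
    1: [0, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    2: [0, 1, 1, 1, 1, 1, 0, 0, 0, 0],
    3: [0, 0, 0, 0, 0, 0, 0, 0, 0, 1],
    4: [0, 1, 1, 1, 1, 1, 0, 1, 1, 1],
    5: [0, 0, 1, 0, 0, None, 1, 1, 1, 1],
}


def risk_intervals(weeks, states):
    """Return (start, stop, event) rows for the intervals spent at risk."""
    # Drop missing visits so that intervals span observed visits only.
    observed = [(w, s) for w, s in zip(weeks, states) if s is not None]
    rows = []
    for (t0, s0), (t1, s1) in zip(observed, observed[1:]):
        if s0 == 0:  # below 200 at the start of the interval: at risk
            rows.append((t0, t1, s1))
    return rows


for pid, states in status.items():
    print(pid, risk_intervals(WEEKS, states))
```

Under these assumptions id=1 contributes a single interval ending in failure at week 2, id=2 and id=4 re-enter the risk set at week 24, and id=5's interval starting at week 12 extends to week 24. Rows of this shape resemble the counting-process layout with gaps, which in Stata would typically be declared through stset with its id(), time0(), and failure() options. Whether that layout suits interval-censored data of this kind is exactly the open question.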

My questions are: 1) Which of the approaches nicely reviewed by Cleves would be recommended here, if any? (and if none, can you suggest an alternative approach); and 2) Could anybody suggest how to set up the data to account for the above peculiarities of these records?

Thanks,



BENIGNO RODRIGUEZ G., MD
Assistant Professor of Medicine
Case Western Reserve University