I'm exploring some survival data.

In this study, all subjects enter at time zero and are
potentially followed for 30 weeks, being examined once per week.

Some subjects drop out and are censored. For example, a subject
may appear and be examined in the 8th week but not appear
for examination in the 9th (and subsequent) weeks. I would mark
this subject as censored in the 9th week, as this is when we have
no knowledge of the subject's "state". In Stata -stset- 
terminology, the record for this subject reads:

    _t0 = 0, _t = 9, _d = 0, _st = 1

That is, the subject entered the study at time _t0 = 0 and exited 
the study at time _t = 9 with status _d = 0 (indicating censored).
(The last -stset- value, _st = 1, means the record is to be used.)

If the subject experiences the event of interest (a "failure" in
survival terminology), _t is set to the week the failure was observed
and _d is set to 1, indicating a failure; the subject is no longer
followed. So if the subject is observed as a failure in week 11, 
the -stset- record reads:

    _t0 = 0, _t = 11, _d = 1, _st = 1

where _d = 1 indicates failure.

The last possible record type is a special case of the first one...

If a subject appears for all 30 weeks and is not observed to fail,
the subject must be censored; my question is whether this censoring
should be marked in the 30th week or in the (unobserved) 31st week.

Is the record:

    _t0 = 0, _t = 30, _d = 0, _st = 1

or

    _t0 = 0, _t = 31, _d = 0, _st = 1

This choice affects the calculated statistics, so it matters.

[Clearly, if I ask this question, I also must ask whether I'm marking
censoring during the study correctly. Was the first example (above)
censored at week 9 or week 8?]

These questions arose, in part, because I used -stci- to compute
the (restricted) mean survival time (via -stci , rmean-) for a 
treatment group in which no failures occurred. I had set
_t = 31 for subjects censored at the end of the study and observed 
that the (restricted) mean survival time was 31 weeks... I had 
expected 30 weeks.

Changing to _t = 30 gets a mean survival time of 30 weeks. So, which
is correct?
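
To make the arithmetic concrete, here is a minimal Python sketch (not
Stata's implementation; the function names km_curve and restricted_mean
are hypothetical). It assumes the restricted mean is the area under the
Kaplan-Meier step function from time 0 to the largest observed _t, which
is consistent with the 31 versus 30 result described above.

```python
def km_curve(times, events):
    """Return [(t, S(t))] at each distinct failure time (product-limit estimate)."""
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        # Subjects still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        # Failures (event flag 1) recorded exactly at time t
        failures = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        if failures:
            surv *= 1.0 - failures / at_risk
            curve.append((t, surv))
    return curve


def restricted_mean(times, events):
    """Area under the KM step function from 0 to the largest observed time.

    Assumption: the restriction point is max(times), censored or not.
    """
    t_max = max(times)
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in km_curve(times, events):
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    area += prev_s * (t_max - prev_t)
    return area


# Five subjects, no failures, all censored at the end of the study
no_failures_31 = restricted_mean([31] * 5, [0] * 5)
no_failures_30 = restricted_mean([30] * 5, [0] * 5)
print(no_failures_31, no_failures_30)
```

With no failures the estimated survival stays at 1 throughout, so under
this assumption the restricted mean simply equals the largest _t in the
data. Whatever value is chosen for end-of-study censoring is returned
unchanged.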

As a related note, the median survival time (via -stci-) was
undefined. My initial thought is that since all subjects survived
the full 30 weeks, the median should have been 30. I suspect the
undefined result occurs because the Kaplan-Meier product-limit
estimate of the survival function (the underlying source for these 
percentiles) is undefined... But I'm not convinced it should be. 
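
One way to look at the median is the following minimal Python sketch
(again not Stata's code; km_median is a hypothetical name). It assumes
the median is defined as the smallest time at which the Kaplan-Meier
estimate is at or below 0.5, and returns None when no such time exists.

```python
def km_median(times, events):
    """Smallest t with KM survival <= 0.5, or None if the curve never gets there."""
    surv = 1.0
    for t in sorted(set(times)):
        # Subjects still under observation just before time t
        at_risk = sum(1 for u in times if u >= t)
        # Failures (event flag 1) recorded exactly at time t
        failures = sum(1 for u, d in zip(times, events) if u == t and d == 1)
        if failures:
            surv *= 1.0 - failures / at_risk
        if surv <= 0.5:
            return t
    return None


# No failures: survival stays at 1, so no time satisfies the definition
median_no_failures = km_median([30] * 5, [0] * 5)

# One failure at week 8, two at week 11, one subject censored at week 30
median_with_failures = km_median([8, 11, 11, 30], [1, 1, 1, 0])
print(median_no_failures, median_with_failures)
```

Under that definition, the estimate itself is well defined when nobody
fails (it is 1 at every time); it is the median that cannot be read off,
because the curve never drops to 0.5.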

Any and all opinions on these topics will be appreciated.


Thomas J. Steichen

