## In the spotlight: Interval-censored survival data—model fitting and beyond

### What are interval-censored data?

Survival data often contain censored observations for which time to an event of interest is not observed exactly. Censored observations can be right-censored, left-censored, or interval-censored. An observation is right-censored if we know that the event of interest happened after the observed time. It is left-censored if we know that the event happened before the observed time. It is interval-censored if we know only that the event happened within some observed time interval. The term interval-censored data is used in general to refer to data that might be right-censored, left-censored, or interval-censored.
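In likelihood terms, the kinds of censoring differ only in which piece of the survivor function each observation contributes. As a sketch in notation introduced here rather than taken from the software documentation, write $S(t)$ for the survivor function, $f(t)$ for the density, and $(l_i, r_i]$ for the interval known to contain the event time of subject $i$. Assuming the censoring mechanism is noninformative, the likelihood is

$$
L \;=\; \prod_{i \in \text{exact}} f(t_i)\;
        \prod_{i \in \text{right}} S(l_i)\;
        \prod_{i \in \text{left}} \{1 - S(r_i)\}\;
        \prod_{i \in \text{interval}} \{S(l_i) - S(r_i)\}
$$

A right-censored observation contributes the probability of surviving past its lower bound, a left-censored observation the probability of failing by its upper bound, and an interval-censored observation the probability of failing between the two bounds.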

Interval-censored survival data arise in many areas, including medical, epidemiological, financial, and sociological studies. A common example is a clinical trial where patients are tested or measured periodically to evaluate if the event of interest has happened. We may not observe the exact time of the event, but we know that it happened before an evaluation, after an evaluation, or between two evaluations. The same applies to many other examples, such as unemployment duration in economic data, time of weaning in demographic data, or time to obesity in epidemiological data. Ignoring interval-censoring may lead to biased estimates.

In Stata, we can fit parametric models to interval-censored survival-time data using
the **stintreg** command. **stintreg** supports different
distributions and parameterizations, as well as the modeling of ancillary parameters
and stratification. The command can analyze data that include all types of
censoring, and it can also analyze current status data in which the
event of interest is known to occur only before or after an
observed time.

### Fit a model

We want to study the effect of two breast cancer treatments
(**treat**) on breast retraction, which is a cosmetic deterioration
for some breast cancer patients. Those patients were treated with
either radiotherapy alone or radiotherapy plus adjuvant chemotherapy.
The breast retraction was measured at each follow-up visit to the
doctor, which occurred at different times for different patients.
The exact times of breast retraction are not observed, but
they are known to fall in time intervals with the left and right
bounds recorded in variables **ltime** and **rtime**.
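Before fitting a model, it can help to see how the two bounds encode each type of censoring. The sketch below assumes the coding convention documented for **stintreg**: equal bounds for an exactly observed time, a missing (or zero) lower bound for left-censoring, and a missing upper bound for right-censoring. The variable name `ctype` is hypothetical.

```stata
* Sketch: label each observation by censoring type from its interval bounds.
* Assumes stintreg's documented coding of the lower and upper bounds;
* ctype is a hypothetical variable name.
generate ctype = "interval-censored"
replace  ctype = "uncensored"     if ltime == rtime & !missing(ltime)
replace  ctype = "left-censored"  if (missing(ltime) | ltime == 0) & !missing(rtime)
replace  ctype = "right-censored" if !missing(ltime) & missing(rtime)
replace  ctype = ""               if missing(ltime) & missing(rtime)  // unusable observation
tabulate ctype
```

If the coding assumption holds for these data, the tabulated counts should agree with the censoring counts reported in the header of the estimation output.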

We fit a Weibull model of time to breast retraction as a function of treatment using **stintreg**.

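    . stintreg i.treat, interval(ltime rtime) distribution(weibull)

    Weibull PH regression                      Number of obs     =         94
                                                      Uncensored =          0
                                                   Left-censored =          5
                                                  Right-censored =         38
                                                  Interval-cens. =         51

                                               LR chi2(1)        =      10.93
    Log likelihood = -143.19228                Prob > chi2       =     0.0009

    ------------------------------------------------------------------------------
                 | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           treat |
     Radio+Chemo |   2.498526   .7069467     3.24   0.001     1.434961    4.350383
           _cons |   .0018503   .0013452    -8.66   0.000      .000445     .007693
    -------------+----------------------------------------------------------------
           /ln_p |   .4785787   .1198973     3.99   0.000     .2435843     .713573
    -------------+----------------------------------------------------------------
               p |   1.613779   .1934877                      1.275814    2.041272
             1/p |   .6196635    .074296                      .4898907    .7838134
    ------------------------------------------------------------------------------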

We find that the hazard of breast retraction for patients treated with radiotherapy plus chemotherapy is about 2.5 times the hazard for patients treated with radiotherapy alone. We can now evaluate whether this model fits well and further explore the results.
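The hazard ratio and the ancillary parameter reported above can be read through the Weibull proportional-hazards form. The following is a sketch that assumes **stintreg** uses the usual Stata Weibull PH parameterization; the symbols $\mathbf{x}$, $\boldsymbol{\beta}$, and $\lambda$ are notation introduced here.

$$
h(t \mid \mathbf{x}) = p\,\lambda\,t^{\,p-1}, \qquad
S(t \mid \mathbf{x}) = \exp\!\left(-\lambda\,t^{\,p}\right), \qquad
\lambda = \exp(\mathbf{x}\boldsymbol{\beta})
$$

Under this form, the ratio of the two treatment hazards equals $\exp(\beta_{\text{treat}})$ at every time $t$, which is the reported 2.498526. The shape parameter is $p = \exp(0.4785787) \approx 1.614$, and because $p > 1$, the estimated hazard increases with time in both groups.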

### Use diagnostic tools

**stintreg** provides two types of residuals to visually assess the
appropriateness of the fitted models.

We can use **predict** with option **mgale** to obtain
martingale-like residuals. To check visually whether the
patient’s age (**age**) should be included in our model, we
produce a scatterplot of the martingale-like residuals versus
**age**.

    . predict mg, mgale
    . scatter mg age

The scatterplot does not show any systematic trend, indicating that
**age** is not needed in the model. We can produce scatterplots of
**mg** against other variables of interest to identify potential
omitted predictors.
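A trend in such a scatterplot can be hard to judge by eye. One common aid, shown here as a sketch rather than as part of the original analysis, is to overlay a lowess smoother on the residuals; a roughly flat smooth near zero is consistent with leaving the variable out.

```stata
* Sketch: overlay a lowess smooth on the martingale-like residuals.
* Assumes mg was created by -predict mg, mgale- after stintreg.
twoway (scatter mg age) (lowess mg age), ///
    ytitle("Martingale-like residuals") xtitle("Age") legend(off)
```

The same command can be repeated with other candidate covariates in place of **age**.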

To assess the goodness of fit of the model visually, we use
the **estat gofplot** command, which plots the Cox–Snell residuals
versus the estimated cumulative hazard function corresponding to these
residuals. If the model fits the data well, the plotted estimated
cumulative hazards should be close to the reference line, which is
formed by the Cox–Snell residuals.
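The reasoning behind this plot is easiest to see for exactly observed times. As a sketch, with $\hat{S}$ and $\hat{H}$ denoting the fitted survivor and cumulative hazard functions (notation introduced here), the Cox–Snell residual for subject $i$ is

$$
r_i \;=\; \hat{H}(t_i \mid \mathbf{x}_i) \;=\; -\ln \hat{S}(t_i \mid \mathbf{x}_i)
$$

If the model is correct, these residuals behave like a censored sample from a unit exponential distribution, whose cumulative hazard is $H(r) = r$. A nonparametric estimate of the cumulative hazard of the residuals, plotted against the residuals themselves, should therefore follow the 45-degree line. With interval-censored data the event time $t_i$ is not observed, so the software uses an adapted, Cox–Snell-like residual; see [ST] stintreg postestimation for the definition it actually applies.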

    . estat gofplot, title("Interval-censored Weibull regression")

We can also visually compare our original Weibull model with an exponential model. We fit the model using the exponential distribution and obtain the goodness-of-fit plot.

    . quietly stintreg i.treat, interval(ltime rtime) distribution(exponential)
    . estat gofplot, title("Interval-censored exponential regression")

Comparing the two plots produced by **estat gofplot**, we
conclude that the Weibull model fits the data better
than the exponential model.
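The visual comparison can be supplemented with a formal one, because the exponential model is the Weibull model with the shape parameter fixed at p = 1. The sketch below stores both fits and compares them with a likelihood-ratio test and information criteria; the names `weib` and `expo` are hypothetical, and we assume **lrtest** accepts these two fits as nested.

```stata
* Sketch: formal comparison of the exponential and Weibull fits.
* weib and expo are hypothetical names for the stored estimates.
quietly stintreg i.treat, interval(ltime rtime) distribution(weibull)
estimates store weib
quietly stintreg i.treat, interval(ltime rtime) distribution(exponential)
estimates store expo

lrtest expo weib            // H0: p = 1 (exponential is adequate)
estimates stats expo weib   // AIC and BIC for both models
```

The Weibull output shown earlier already contains a related Wald test: the row for /ln_p reports z = 3.99, which is evidence against ln(p) = 0, that is, against the exponential model.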

Not having found any evidence against the Weibull model, we refit the Weibull model and see what else it tells us.

### Interpret and visualize results

We have many tools available for interpreting and visualizing results.

We use **predict** to obtain the expected median survival time
for both treatments.
Then, we **tabulate** the results to compare the two types of treatment:

    . quietly stintreg i.treat, interval(ltime rtime) distribution(weibull)
    . predict m, median time
    . tabulate treat, summarize(m)

                |  Summary of Predicted median for
                |           (ltime,rtime]
      Treatment |        Mean   Std. Dev.       Freq.
    ------------+------------------------------------
          Radio |   39.332397           0          46
      Radio+Che |   22.300791           0          48
    ------------+------------------------------------
          Total |   30.635407   8.5595267          94

Expected median time to breast retraction is longer for the radiotherapy-only group than for the group that also received chemotherapy.
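These medians can be checked by hand from the estimates reported earlier. Assuming the Weibull PH survivor function S(t) = exp(-λ t^p) with λ = exp(xβ), setting S(t) = 0.5 gives a median of (ln 2 / λ)^(1/p). The sketch below plugs in the reported values of _cons, the hazard ratio, and 1/p.

```stata
* Sketch: reproduce the predicted medians from the reported estimates.
* Assumes S(t) = exp(-lambda * t^p), so that median = (ln(2)/lambda)^(1/p).
* Radiotherapy only: lambda = _cons = 0.0018503, 1/p = 0.6196635
display (ln(2) / 0.0018503)^(0.6196635)               // about 39.3
* Radiotherapy plus chemotherapy: lambda = _cons * hazard ratio
display (ln(2) / (0.0018503 * 2.498526))^(0.6196635)  // about 22.3
```

Both values agree with the output of **predict** up to the rounding of the displayed estimates, which also supports the assumed parameterization.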

We can use **margins** to obtain confidence intervals for those
values:

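    . margins treat, predict(median time)

    Adjusted predictions                            Number of obs     =         94
    Model VCE    : OIM

    Expression   : Predicted median for (ltime,rtime], predict(median time)

    ------------------------------------------------------------------------------
                 |            Delta-method
                 |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
           treat |
           Radio |    39.3324   5.342493     7.36   0.000      28.8613    49.80349
     Radio+Chemo |   22.30079   2.436642     9.15   0.000     17.52506    27.07652
    ------------------------------------------------------------------------------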

Next, we compare the average patient’s survival curve
under radiotherapy only (**treat** = 0) and under radiotherapy plus
chemotherapy (**treat** = 1). We can plot the survival functions
for both treatments using the **stcurve** command:

    . stcurve, survival at1(treat = 0) at2(treat = 1)

From the above survival function plot, we see that the risk of developing breast retraction for an average patient in the radiotherapy-plus-chemotherapy treatment group is higher than that for the same patient in the radiotherapy-only treatment group. In other words, the adjuvant chemotherapy increases the risk of breast retraction.
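The same comparison can be viewed on the hazard scale. As a sketch, and assuming **stcurve** is run after the Weibull fit, the **hazard** option plots the fitted hazard functions for the two treatments.

```stata
* Sketch: plot the fitted hazard functions for both treatments.
* Assumes the Weibull stintreg model is the active estimation result.
stcurve, hazard at1(treat = 0) at2(treat = 1)
```

Because this is a proportional-hazards model, the two curves should differ by the constant factor given by the estimated hazard ratio, and with an estimated p greater than 1 both should rise over time.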

Prefer to point and click instead of typing commands? No worries.
All of **stintreg**'s features can also be accessed through Stata's menus and
dialog boxes.

This example only touches on the types of models and analyses available for interval-censored survival-time data. See [ST] stintreg to learn more.

— Xiao Yang

Senior Statistician and Software Developer