The Stata News, Vol 33 No 1

In the spotlight: Interval-censored survival data—model fitting and beyond

What are interval-censored data?

Survival data often contain censored observations for which time to an event of interest is not observed exactly. Censored observations can be right-censored, left-censored, or interval-censored. An observation is right-censored if we know that the event of interest happened after the observed time. It is left-censored if we know that the event happened before the observed time. It is interval-censored if we know only that the event happened within some observed time interval. The term interval-censored data is used in general to refer to data that might be right-censored, left-censored, or interval-censored.

Interval-censored survival data arise in many areas, including medical, epidemiological, financial, and sociological studies. A common example is a clinical trial where patients are tested or measured periodically to evaluate if the event of interest has happened. We may not observe the exact time of the event, but we know that it happened before an evaluation, after an evaluation, or between two evaluations. The same applies to many other examples, such as unemployment duration in economic data, time of weaning in demographic data, or time to obesity in epidemiological data. Ignoring interval-censoring may lead to biased estimates.

In Stata, we can fit parametric models to interval-censored survival-time data using the stintreg command. stintreg supports different distributions and parameterizations, as well as the modeling of ancillary parameters and stratification. The command can analyze data that include all types of censoring, and it can also analyze current status data in which the event of interest is known to occur only before or after an observed time.

Fit a model

We want to study the effect of two breast cancer treatments (treat) on breast retraction, which is a cosmetic deterioration for some breast cancer patients. Those patients were treated with either radiotherapy alone or radiotherapy plus adjuvant chemotherapy. Breast retraction was assessed at each follow-up visit to the doctor, and these visits occurred at different times for different patients. The exact times of breast retraction are not observed, but they are known to fall in time intervals with the left and right bounds recorded in variables ltime and rtime.
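To make the role of the two bounds concrete, here is a minimal sketch of how a pair of bound variables can encode each censoring type. The four observations are hypothetical and are not taken from the breast cancer data. The convention that a missing (or zero) lower bound marks left-censoring and a missing upper bound marks right-censoring follows our reading of the stintreg documentation, so treat it as an assumption to check against [ST] stintreg.

```stata
* Hypothetical toy data: one subject per censoring type (not the article's data)
clear
input id ltime rtime
1  .  7
2  8 12
3 45  .
4 10 10
end

* Assumed convention: missing or zero lower bound = left-censored,
* missing upper bound = right-censored, equal bounds = exactly observed
generate str8 ctype = "interval"
replace ctype = "left"  if missing(ltime) | ltime == 0
replace ctype = "right" if missing(rtime)
replace ctype = "exact" if ltime == rtime & !missing(ltime)

list id ltime rtime ctype, noobs
```

Tabulating a variable such as ctype before fitting is a quick way to confirm that the counts match the censoring summary that stintreg prints in its header.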

We fit a Weibull model of time to breast retraction as a function of treatment using stintreg.

. stintreg i.treat, interval(ltime rtime) distribution(weibull)


Weibull PH regression                           Number of obs     =         94
                                                   Uncensored     =          0
                                                   Left-censored  =          5
                                                   Right-censored =         38
                                                   Interval-cens. =         51

                                                LR chi2(1)        =      10.93
Log likelihood = -143.19228                     Prob > chi2       =     0.0009

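------------------------------------------------------------------------------
             | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |
Radio+Chemo  |   2.498526   .7069467     3.24   0.001     1.434961    4.350383
       _cons |   .0018503   .0013452    -8.66   0.000      .000445     .007693
-------------+----------------------------------------------------------------
       /ln_p |   .4785787   .1198973     3.99   0.000     .2435843     .713573
-------------+----------------------------------------------------------------
           p |   1.613779   .1934877                      1.275814    2.041272
         1/p |   .6196635    .074296                      .4898907    .7838134
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation.
Note: _cons estimates baseline hazard.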

We find that the hazard of breast retraction for patients with radiotherapy plus chemotherapy is 2.5 times larger than the hazard for patients with radiotherapy alone. We can now evaluate whether this model fits well and further explore the results.
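The reported quantities are simple transformations of one another, which the following sketch checks by hand. It only uses numbers copied from the output above. The scalar names are ours, and the snippet assumes the usual Weibull proportional-hazards form h(t) = p·λ·t^(p-1), under which p > 1 implies a hazard that increases with time.

```stata
* Values copied from the stintreg output above; scalar names are illustrative
scalar hr   = 2.498526
scalar ln_p = .4785787

* The hazard ratio is the exponentiated coefficient, so the coefficient is ln(hr)
scalar b = ln(hr)

* The shape parameter p and its reciprocal follow from /ln_p
scalar p    = exp(ln_p)
scalar invp = 1/p

display "coefficient on Radio+Chemo = " %9.6f b
display "p   = " %9.6f p
display "1/p = " %9.6f invp
```

The computed p and 1/p should agree with the last two rows of the table up to rounding. Because the estimated p exceeds 1, the fitted hazard of breast retraction rises over time in both treatment groups.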

Use diagnostic tools

stintreg provides two types of residuals to visually assess the appropriateness of the fitted models.

We can use predict with option mgale to obtain the martingale-like residuals. To check visually whether the patient’s age (age) should be included in our model, we produce a scatterplot of these residuals versus age.

. predict mg, mgale

. scatter mg age

Graph of Martingale-like residual versus age

The scatterplot does not show any systematic trend, suggesting that age does not need to be added to the model. We can produce scatterplots of mg against other variables of interest to identify potential omitted predictors.

To assess the goodness of fit of the model visually, we use the estat gofplot command, which plots the Cox–Snell residuals versus the estimated cumulative hazard function corresponding to these residuals. If the model fits the data well, the plotted estimated cumulative hazards should be close to the reference line, which is formed by the Cox–Snell residuals.

. estat gofplot, title("Interval-censored Weibull regression")

Graph of cumulative hazard versus Cox-Snell residuals
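The logic behind this plot can be illustrated with a small simulation. For exactly observed Weibull times, the Cox–Snell residual is the model's cumulative hazard λ·t^p evaluated at the event time, and under a correctly specified model these residuals behave like a sample from a unit exponential distribution, which is why the points should track the 45-degree reference line. The sketch below simulates uncensored data only. It is not the computation stintreg performs for interval-censored observations, and the values of lambda and p are borrowed from the fit above purely for illustration.

```stata
* Simulation sketch with exactly observed times (assumption: no censoring)
clear
set seed 12345
set obs 10000

* Illustrative parameter values borrowed from the Weibull fit above
scalar lambda = .0018503
scalar p      = 1.613779

* Draw Weibull times by inverting S(t) = exp(-lambda*t^p)
generate double t = (-ln(runiform())/scalar(lambda))^(1/scalar(p))

* Cox-Snell residual = cumulative hazard at the event time
generate double cs = scalar(lambda)*t^scalar(p)

* A unit exponential has mean 1 and standard deviation 1
summarize cs
scalar cs_mean = r(mean)
scalar cs_sd   = r(sd)
display "mean = " %6.4f cs_mean "   sd = " %6.4f cs_sd
```

If the residuals were instead computed under the wrong distribution, their mean and spread would drift away from 1, and the corresponding plot would bend away from the reference line.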

We can also visually compare our original Weibull model with an exponential model. We fit the model using the exponential distribution and obtain the goodness-of-fit plot.

. quietly stintreg i.treat, interval(ltime rtime) distribution(exponential)

. estat gofplot, title("Interval-censored exponential regression")

Graph of Interval-censored exponential regression: Cumulative hazard versus Cox-Snell residuals

Comparing the two plots produced by estat gofplot, we conclude that the Weibull model fits the data better than the exponential model.
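The visual comparison can be complemented with a numerical check. The exponential model is the Weibull model with p = 1, that is, ln p = 0, so the /ln_p row of the Weibull output already carries a Wald test of the exponential restriction. The sketch below recomputes that test from the reported estimate and standard error. The scalar names are ours.

```stata
* Estimate and standard error of /ln_p copied from the Weibull output
scalar ln_p    = .4785787
scalar se_ln_p = .1198973

* Wald z statistic for H0: ln p = 0 (the exponential model)
scalar z    = ln_p/se_ln_p
scalar pval = 2*normal(-abs(z))

display "z = " %5.2f z "   two-sided p-value = " %8.6f pval
```

The z statistic reproduces the 3.99 shown in the table, and the small p-value points in the same direction as the plots: the data favor the Weibull model over the exponential model.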

Not having found any evidence against the Weibull model, we refit the Weibull model and see what else it tells us.

Interpret and visualize results

We have many tools available for interpreting and visualizing results.

We use predict to obtain the expected median survival time for both treatments. Then, we tabulate the results to compare the two types of treatment:

. quietly stintreg i.treat, interval(ltime rtime) distribution(weibull)

. predict m, median time

. tabulate treat, summarize(m)

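            |  Summary of Predicted median for
            |          (ltime,rtime]
  Treatment |        Mean   Std. Dev.       Freq.
------------+------------------------------------
      Radio |   39.332397           0          46
  Radio+Che |   22.300791           0          48
------------+------------------------------------
      Total |   30.635407   8.5595267          94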

Expected median time to breast retraction is longer for the radiotherapy-only group than for the group that also received chemotherapy.
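These medians can be reproduced by hand from the estimates reported earlier. Assuming the Weibull proportional-hazards survivor function S(t) = exp(-λ·t^p), with λ equal to the baseline hazard (_cons) for radiotherapy alone and to the baseline hazard times the hazard ratio for radiotherapy plus chemotherapy, setting S(t) = 0.5 gives the median (ln 2/λ)^(1/p). The sketch below applies this formula. The scalar names are ours.

```stata
* Estimates copied from the Weibull output (hazard-ratio metric)
scalar lambda0 = .0018503
scalar hr      = 2.498526
scalar p       = 1.613779

* Median solves exp(-lambda*t^p) = 0.5
scalar med_radio = (ln(2)/lambda0)^(1/p)
scalar med_chemo = (ln(2)/(lambda0*hr))^(1/p)

display "median, Radio       = " %8.4f med_radio
display "median, Radio+Chemo = " %8.4f med_chemo
display "ratio of medians    = " %8.4f med_chemo/med_radio
```

Up to rounding of the inputs, the two values match the predicted medians in the table. The ratio of the medians equals hr^(-1/p), which shows how the hazard ratio and the shape parameter jointly determine the shift in median time.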

We can use margins to obtain confidence intervals for those values:

. margins treat, predict(median time)

Adjusted predictions                            Number of obs     =         94
Model VCE    : OIM

Expression   : Predicted median for (ltime,rtime], predict(median time)

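------------------------------------------------------------------------------
             |            Delta-method
             |     Margin   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       treat |
      Radio  |    39.3324   5.342493     7.36   0.000      28.8613    49.80349
Radio+Chemo  |   22.30079   2.436642     9.15   0.000     17.52506    27.07652
------------------------------------------------------------------------------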

Next, we compare the average patient’s survival curve under radiotherapy only (treat = 0) and under radiotherapy plus chemotherapy (treat = 1). We can plot the survival functions for both treatments using the stcurve command:

. stcurve, survival at1(treat = 0) at2(treat = 1)

Graph of interval-censored Weibull PH regression: survival time versus analysis time

From the above survival function plot, we see that the risk of developing breast retraction for an average patient in the radiotherapy-plus-chemotherapy treatment group is higher than that for the same patient in the radiotherapy-only treatment group. In other words, the adjuvant chemotherapy increases the risk of breast retraction.
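The ordering of the two curves follows directly from the proportional-hazards structure: if S0(t) is the survivor function for radiotherapy alone, the survivor function for radiotherapy plus chemotherapy is S0(t) raised to the power of the hazard ratio. The sketch below evaluates both curves at a few analysis times using the reported estimates. The chosen times and the scalar names are ours and are only for illustration.

```stata
* Estimates copied from the Weibull output
scalar lambda0 = .0018503
scalar hr      = 2.498526
scalar p       = 1.613779

* Evaluate both survivor functions at a few illustrative analysis times;
* the last time is the predicted median for the radiotherapy-only group
foreach t of numlist 10 20 39.3324 {
    scalar S0 = exp(-(lambda0 * `t'^p))
    scalar S1 = S0^hr
    display "t = " %7.4f `t' "   S_radio = " %6.4f S0 "   S_chemo = " %6.4f S1
}
```

At every time, the radiotherapy-plus-chemotherapy curve lies below the radiotherapy-only curve because the hazard ratio exceeds 1. At the last time, the radiotherapy-only survivor function is approximately 0.5, as expected at the median.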

Prefer to point and click instead of typing commands? No worries. All of stintreg's features can also be accessed using Stata's menus and dialog boxes.

This example only touches on the types of models and analyses available for interval-censored survival-time data. See [ST] stintreg to learn more.

— Xiao Yang
Senior Statistician and Software Developer