Interval-censored Cox model

Order Stata

Watch video demo

<- See Stata's other features

Highlights

Genuine semiparametric Cox proportional hazards modeling
Left-censoring, right-censoring, interval-censoring
Single-record and multiple-record data format
Acommodation with time-varying covariates
Stratified estimation
Four methods to estimate standard errors, including robust and cluster–robust standard errors
Testing proportional-hazards assumption
Predictions of baseline functions, Martingale-like residuals, Cox–Snell-like residuals, and time-varying prediction
Graphs of survivor, cumulative hazard, and hazard functions
Graphical checks of proportional-hazards assumption

What is the exact time of cancer recurrence? Or time of COVID-19 infection? We don't really know exactly. All we know is that the cancer recurred sometime between two follow-up examinations and that the infection occurred sometime before the onset of symptoms or a positive COVID test. Data like these are called interval-censored time-to-event data. By time-to-event or event-time data, we also mean failure-time data, duration data, and so on. They can be stored in two different formats: single-record-per-subject format and multiple-record-per-subject format.

Interval-censored event-time data arise in many areas, including medical, epidemiological, economic, financial, and sociological studies. Ignoring interval-censoring will often lead to biased estimates.

A semiparametric Cox proportional hazards regression model is used routinely to analyze uncensored and right-censored event-time data. In Stata, you can use the estimation command stintcox to fit the Cox model to interval-censored event-time data, for both single-record and multiple-record formats.

Just as with right-censored data, a Cox model is appealing for interval-censored data because it uses semiparametric estimation and thus does not require any parametric assumptions about the baseline hazard function. Also for low-event rates, the exponentiated regression parameters approximate the log relative-risks.

Semiparametric estimation of interval-censored event-time data is challenging because none of the event times are observed exactly. Thus, "semiparametric" modeling of these data often resorted to using spline methods or piecewise-exponential models for the baseline hazard function. Genuine semiparametric modeling of interval-censored event-time data was not available until recent methodological advances, which are implemented in the stintcox command.

Let's see it work

Fitting a model for single-record-per-subject interval-censored data
Stratification
Checking proportional-hazards assumption
Fitting a model for multiple-record-per-subject interval-censored data
Graphing survivor functions
Additional resources

Fitting a model for single-record-per-subject interval-censored data

We use data from Zeng et al. (2016) that studies time to HIV infection in a cohort study of injecting drug users in Thailand. The dataset contains 1,124 subjects. Those subjects initially tested negative for the HIV virus. They were then followed and assessed for HIV-1 seropositivity through blood tests approximately every four months. Because the subjects were tested periodically, the exact time of HIV infection was not observed, but the times are known to fall in intervals between blood tests with the lower and upper endpoints recorded in variables ltime and rtime.

The dataset also records several baseline factors, such as centered age at recruitment (age_mean), sex (male), history of needle sharing (needle), history of drug injection (inject), and whether a subject has been in jail at the time of recruitment (jail). The dataset contains one record per subject with the event-time information recorded as interval data and is called single-record-per-subject data. Here is a subset of the dataset for subjects 271–274:

. use https://www.stata-press.com/data/r18/idu
(Modified Bangkok IDU Preparatory Study)

. format ltime rtime age_mean %6.2f

. list ltime rtime age_mean male needle inject jail if _n >=271
     & _n<=274



       ltime   rtime   age_mean   male   needle   inject   jail 

 271.  22.00       .      -6.46    Yes      Yes       No     No 
 272.   3.80    9.41       8.54     No       No       No    Yes 
 273.  20.66       .     -11.46    Yes      Yes       No     No 
 274.   0.00    3.87      -4.46    Yes      Yes      Yes    Yes

We can fit a Cox proportional hazards model in which time to HIV infection depends on above factors of interest. In a single-record-per-subject format, we must specify option interval() with stintcox.

. stintcox age_mean i.male i.needle i.inject i.jail, interval(ltime rtime)
note: using adaptive step size to compute derivatives.

Performing EM optimization (showing every 100 iterations):
Iteration 0:    Log likelihood = -1086.2564
Iteration 100:  Log likelihood = -597.65634
Iteration 200:  Log likelihood = -597.57555
Iteration 295:  Log likelihood = -597.56443

Computing standard errors: ............................ done

Interval-censored Cox regression                    Number of obs     =  1,124
Baseline hazard: Reduced intervals                         Uncensored =      0
                                                        Left-censored =     41
                                                       Right-censored =    991
                                                       Interval-cens. =     92

                                                    Wald chi2(5)      =  17.10
Log likelihood = -597.56443                         Prob > chi2       = 0.0043



                              OPG                                             
               Haz. ratio   std. err.      z    P>|z|     [95% conf. interval]

    age_mean     .9684341   .0126552    -2.45   0.014     .9439452    .9935582
              
        male  
        Yes      .6846949   .1855907    -1.40   0.162     .4025073    1.164717
              
      needle  
        Yes      1.275912   .2279038     1.36   0.173     .8990401    1.810768
              
      inject  
        Yes      1.250154   .2414221     1.16   0.248     .8562184    1.825334
              
        jail  
        Yes      1.567244   .3473972     2.03   0.043     1.014982    2.419998

Note: Standard-error estimates may be more variable for small datasets and
      datasets with low proportions of interval-censored observations.

We find that age at the recruitment is significantly associated with lower risk of HIV infection and being in jail at enrollment is associated with higher risk of HIV infection. Other factors are not statistically significant.

Stratification

If we assume that the baseline hazard function for female is different from that for male, we can fit a stratified Cox proportional hazards model by using the strata() option:

. stintcox age_mean i.needle i.inject i.jail, interval(ltime rtime) strata(male)
note: using adaptive step size to compute derivatives.

Performing EM optimization (showing every 100 iterations):
Iteration 0:   Log likelihood = -1087.0536
Iteration 100: Log likelihood = -585.59848
Iteration 200: Log likelihood = -585.53143
Iteration 282: Log likelihood =  -585.5222

Computing standard errors: ........................... done

Stratified interval-censored Cox regression         Number of obs     =  1,124
Baseline hazard: Reduced intervals                         Uncensored =      0
Strata variable: male                                   Left-censored =     41
                                                       Right-censored =    991
Event-time interval:                                   Interval-cens. =     92
  Lower endpoint: ltime
  Upper endpoint: rtime
                                                    Wald chi2(4)      =  14.84
Log likelihood = -585.5222                          Prob > chi2       = 0.0051



                              OPG                                             
               Haz. ratio   std. err.      z    P>|z|     [95% conf. interval]

    age_mean     .9682508   .0126326    -2.47   0.013     .9438052    .9933295
              
      needle  
        Yes      1.276222   .2270302     1.37   0.170     .9005422    1.808625
              
      inject  
        Yes      1.245357   .2393768     1.14   0.254     .8544367    1.815131
              
        jail  
        Yes       1.57314   .3490687     2.04   0.041     1.018337    2.430205

Note: Standard error estimates may be more variable for small datasets and
      datasets with low proportions of interval-censored observations.

Our conclusion here is similar to the case without stratification.

Checking proportional hazards assumption

It is important to evaluate the validity of the underlying assumption for the Cox proportional hazards model, which is that the hazard ratio is constant over time. One way of testing the proportional-hazards assumption for a covariate is to test whether the coefficient associated with that covariate is time invariant. This can be accomplished by including an interaction between this covariate and a function of time in the model and testing whether the corresponding coefficient equals zero.

For example, to check the proportional-hazards assumption for inject, we can include an interaction term between inject and time using the tvc() option:

. stintcox age_mean i.male i.needle i.inject i.jail, interval(ltime rtime)
     tvc(i.inject) nohr
note: using adaptive step size to compute derivatives.

Performing EM optimization (showing every 100 iterations):
Iteration 0:   Log likelihood = -1086.2564
Iteration 100: Log likelihood = -597.58166
Iteration 200: Log likelihood = -597.50007
Iteration 295: Log likelihood = -597.48895

Computing standard errors: ......................................... done

Interval-censored Cox regression                    Number of obs     =  1,124
Baseline hazard: Reduced intervals                         Uncensored =      0
                                                        Left-censored =     41
Event-time interval:                                   Right-censored =    991
  Lower endpoint: ltime                                Interval-cens. =     92
  Upper endpoint: rtime
                                                    Wald chi2(6)      =  17.36
Log likelihood = -597.48895                         Prob > chi2       = 0.0080



                              OPG                                             
               Coefficient  std. err.      z    P>|z|     [95% conf. interval]

main                                                                          
    age_mean    -.0321889   .0131026    -2.46   0.014    -.0578696   -.0065082
              
        male  
        Yes     -.3817124   .2714232    -1.41   0.160    -.9136921    .1502674
              
      needle  
        Yes      .2434583   .1791214     1.36   0.174    -.1076131    .5945298
              
      inject  
        Yes      .3194498   .3209978     1.00   0.320    -.3096943    .9485939
              
        jail  
        Yes      .4498814   .2230002     2.02   0.044      .012809    .8869538

tvc                                                                           
      inject  
        Yes     -.0080255   .0208446    -0.39   0.700    -.0488801    .0328292

Notes: Standard error estimates may be more variable for small datasets and
       datasets with low proportions of interval-censored observations.
       Variables in tvc equation interacted with _t.

Wald test that [tvc] = 0: chi2(1) = 0.1482                Prob > chi2 = 0.7002

With the nohr option, the first equation, main, reports the coefficients for the covariates that do not vary over time; the second equation, tvc, reports the estimated coefficients for covariates interacted with time. There is no evidence that the coefficient of inject interacted with time is different from 0. Hence the proprtional-hazards assumption appears to hold for covariate inject.

For single-record interval-censored data, Stata also provides two graphical commands to visually assess the proportional-hazards assumption. These two commands can be used without running stintcox first.

For a single categorical covariate in a Cox model, you can use stintphplot, which plots -ln{-ln(survival)} curves for each category versus ln(analysis time). If the plots are parallel, the proportional-hazards assumption has not been violated. Alternatively, you can use stintcoxnp to plot the nonparametric maximum-likelihood estimation survival curve versus the Cox predicted survival curve for each category. When the two curves are close together, the proportional-hazards assumption is valid.

When a Cox model contains multiple covariates, as mentioned in the above example, only stintphplot is appropriate for graphically checking the proportional-hazards assumption. In that case, we need to use the adjustfor() option.

For example, to graphically check the proportional-hazards assumption for inject, we include all covariates except inject in the adjustfor() option:

. stintphplot, interval(ltime rtime) by(inject) adjustfor(age_mean i.male i.needle i.jail)

Fitting Cox model with covariates from option adjustfor()
for inject = No ...

Fitting Cox model with covariates from option adjustfor()
for inject = Yes ...

A separate Cox model, which contains all covariates from the adjustfor() option, is fit for each level of inject. And the two plots are almost parallel, which indicates that the proportional-hazards assumption has not been violated for the categorical variable inject.

Fitting a model for multiple-record-per-subject interval-censored data

Interval-censored data can also be recorded as multiple-record-per-subject format. In this format, each subject may contain multiple records with multiple examination times, the event status at each examination time, and potential time-varying covariates at each examination time. Here we use an extended version of previous single-record-per-subject dataset, which contains all the baseline covariates as well as a time-varying imprisonment indicator variable (jail_vary) that indicates whether the subject has been imprisioned since the last clinic visit. It also records a subject identifier (id), the examination time of the blood test (time), and whether the blood test is positive at the examination time (is_seropos). Here is a subset of this dataset for subject 271-274:

. use https://www.stata-press.com/data/r18/idu2
(Modified Bangkok IDU Preparatory Study with time-varying variable jail_vary)

. format time age_mean %6.2f

. list id time is_seropos age_mean male needle inject jail_vary
     if id >=271 & id <=274, sepby(id) noobs abbreviate(10)


    
      id    time   is_seropos   age_mean   male   needle   inject   jail_vary 
    
     271    4.89           No      -6.46    Yes      Yes       No          No 
     271    9.31           No      -6.46    Yes      Yes       No          No 
     271   13.38           No      -6.46    Yes      Yes       No         Yes 
     271   17.97           No      -6.46    Yes      Yes       No         Yes 
     271   22.00           No      -6.46    Yes      Yes       No          No 
    
     272    3.80           No       8.54     No       No       No         Yes 
     272    9.41          Yes       8.54     No       No       No          No 
    
     273    3.93           No     -11.46    Yes      Yes       No          No 
     273    8.00           No     -11.46    Yes      Yes       No          No 
     273   12.07           No     -11.46    Yes      Yes       No         Yes 
     273   15.97           No     -11.46    Yes      Yes       No         Yes 
     273   20.66           No     -11.46    Yes      Yes       No         Yes 
    
     274    3.87          Yes      -4.46    Yes      Yes      Yes         Yes

We use stintcox to fit a Cox proportional-hazards model in which the time to HIV infection depends on baseline covariates age_mean, male, needle, inject, and time-varying covariate jail_vary. In a multiple-record-per-subject format, we must specify options id(), time(), and status() with stintcox.

. stintcox age_mean i.male i.needle i.inject i.jail_vary, id(id) time(time)
     status(is_seropos)
note: time-varying covariates detected in the data; using method nearleft to
      impute their values between examination times.
note: using adaptive step size to compute derivatives.

Performing EM optimization (showing every 100 iterations):
Iteration 0:   Log likelihood = -1086.2564
Iteration 100: Log likelihood = -598.45375
Iteration 200: Log likelihood = -598.35872
Iteration 285: Log likelihood = -598.34887

Computing standard errors: ........................................ done

Interval-censored Cox regression                   Number of obs      =  6,453
Baseline hazard: Reduced intervals                 Number of subjects =  1,124
                                                           Uncensored =      0
ID variable: id                                         Left-censored =     41
Examination time: time                                 Right-censored =    991
Status indicator: is_seropos                           Interval-cens. =     92

                                                   Wald chi2(5)       =  17.03
Log likelihood = -598.34887                        Prob > chi2        = 0.0044



                              OPG                                             
        time   Haz. ratio   std. err.      z    P>|z|     [95% conf. interval]

    age_mean     .9714605    .012757    -2.20   0.027     .9467762    .9967884
              
        male  
        Yes      .6678044   .1816576    -1.48   0.138     .3918353    1.138138
              
      needle  
        Yes      1.271409   .2275426     1.34   0.180     .8952546    1.805609
              
      inject  
        Yes      1.370672   .2575405     1.68   0.093     .9484142    1.980928
              
   jail_vary  
        Yes      1.440966   .2916178     1.81   0.071     .9691488    2.142481

Time varying: jail_vary
Note: Standard error estimates may be more variable for small datasets and
      datasets with low proportions of interval-censored observations.

Compared with the previous example, after we account for time-varying imprisonment, the hazard ratio for inject increases from 1.25 to 1.37, but the effect of imprisonment decreases from 1.57 for baseline jail to 1.44 for time-varying jail_vary.

Graphing survivor functions

After fitting the model, we can use stcurve to plot the estimated survivor, failure, hazard, or cumulative hazard function. By default, stcurve evaluates the functions at the overall means of covariates.

. stcurve, survival
note: function evaluated at overall means of covariates.

With multiple-record data with time-varying covariates, we may want to incorporate the time-varying nature of those covariates when plotting the functions. In this case, we can use the attmeans option to evaluate the function at time-specific means.

. stcurve, survival attmeans
note: function evaluated at time-specific means of covariates.

We can also use the atframe() option to specify our own time-varying covariate values to be used to evaluate the survivor function. Suppose we want to plot the survivor curve for an individual with the same covariate pattern as subject 1 in our dataset. We can create a new frame called id1 and use frame put to copy the relevant information for subject 1 in this new frame. Then we list the data we just saved in frame id1.

. frame put time age_mean male needle inject jail_vary
     if id==1, into(id1)

. frame id1: list



        time   age_mean   male   needle   inject   jail_v~y 

  1.   4.13       5.54    Yes       No      Yes        Yes 
  2.   8.26       5.54    Yes       No      Yes         No 
  3.  12.33       5.54    Yes       No      Yes         No 
  4.  16.10       5.54    Yes       No      Yes         No 
  5.  20.10       5.54    Yes       No      Yes         No 

  6.  24.23       5.54    Yes       No      Yes         No 
  7.  28.62       5.54    Yes       No      Yes         No 
  8.  32.59       5.54    Yes       No      Yes         No 
  9.  36.20       5.54    Yes       No      Yes         No 
 10.  40.56       5.54    Yes       No      Yes        Yes

We can graph the survivor curve for this particular profile by typing

. stcurve, survival atframe(id1)
note: function evaluated at specified values of selected covariates and overall
      means of other covariates (if any).
note: covariate values from frame id1 used to evaluate function.

References

Zeng, D., L. Mao, and D. Lin. 2016. Maximum likelihood estimation for semiparametric transformation models with interval-censored data. Biometrika 103: 253–271.S

Zeng, D., F. Gao, and D. Lin. 2017. Maximum likelihood estimation for semiparametric regression models with multivariate interval-censored data. Biometrika 104: 505–525.

Turnbull, B. W. 1976. The empirical distribution function with arbitrarily grouped, censored and truncated data. Journal of the Royal Statistical Society, Series B 38: 290–295.

Farrington, C. P. 2000. Residuals for proportional hazards models with interval-censored survival data. Biometrics 56: 473–482.

Tell me more

To learn more about how to fit parametric models to interval-censored data using the stintreg command, see [ST] stintreg.

Products

New in Stata 18

Why Stata

All features

Disciplines

Stata/MP

Order Stata

Purchase

Order Stata

Bookstore

Stata Press

Stata Journal

Gift Shop

Learn

NetCourses

Classroom and web training

Organizational training

Webinars

Video tutorials

Third-party courses

Web resources

Teaching with Stata

Support

Training

Video tutorials

FAQs

Statalist: The Stata Forum

Resources

Technical support

Customer service

Company

News and events

Customer service

Careers

We use cookies

We use cookies to ensure that we give you the best experience on our website—to enhance site navigation, to analyze usage, and to assist in our marketing efforts. By continuing to use our site, you consent to the storing of cookies on your device and agree to delivery of content, including web fonts and JavaScript, from third party web services.

Cookie Settings

Privacy policy

Last updated: 16 November 2022

StataCorp LLC (StataCorp) strives to provide our users with exceptional products and services. To do so, we must collect personal information from you. This information is necessary to conduct business with our existing and potential customers. We collect and use this information only where we may legally do so. This policy explains what personal information we collect, how we use it, and what rights you have to that information.

Required cookies

Advertising cookies

Required cookies

These cookies are essential for our website to function and do not store any personally identifiable information. These cookies cannot be disabled.
Advertising and performance cookies

This website uses cookies to provide you with a better user experience. A cookie is a small piece of data our website stores on a site visitor's hard drive and accesses each time you visit so we can improve your access to our site, better understand how you use our site, and serve you content that may be of interest to you. For instance, we store a cookie when you log in to our shopping cart so that we can maintain your shopping cart should you not complete checkout. These cookies do not directly store your personal information, but they do support the ability to uniquely identify your internet browser and device.

Please note: Clearing your browser cookies at any time will undo preferences saved here. The option selected here will apply only to the device you are currently using.

Accept Cookies


		ltime rtime age_mean male needle inject jail

271.		22.00 . -6.46 Yes Yes No No
272.		3.80 9.41 8.54 No No No Yes
273.		20.66 . -11.46 Yes Yes No No
274.		0.00 3.87 -4.46 Yes Yes Yes Yes


		OPG
		Haz. ratio std. err. z P>\|z\| [95% conf. interval]

age_mean		.9684341 .0126552 -2.45 0.014 .9439452 .9935582

male
Yes		.6846949 .1855907 -1.40 0.162 .4025073 1.164717

needle
Yes		1.275912 .2279038 1.36 0.173 .8990401 1.810768

inject
Yes		1.250154 .2414221 1.16 0.248 .8562184 1.825334

jail
Yes		1.567244 .3473972 2.03 0.043 1.014982 2.419998


		OPG
		Haz. ratio std. err. z P>\|z\| [95% conf. interval]

age_mean		.9682508 .0126326 -2.47 0.013 .9438052 .9933295

needle
Yes		1.276222 .2270302 1.37 0.170 .9005422 1.808625

inject
Yes		1.245357 .2393768 1.14 0.254 .8544367 1.815131

jail
Yes		1.57314 .3490687 2.04 0.041 1.018337 2.430205


		OPG
		Coefficient std. err. z P>\|z\| [95% conf. interval]

main
age_mean		-.0321889 .0131026 -2.46 0.014 -.0578696 -.0065082

male
Yes		-.3817124 .2714232 -1.41 0.160 -.9136921 .1502674

needle
Yes		.2434583 .1791214 1.36 0.174 -.1076131 .5945298

inject
Yes		.3194498 .3209978 1.00 0.320 -.3096943 .9485939

jail
Yes		.4498814 .2230002 2.02 0.044 .012809 .8869538

tvc
inject
Yes		-.0080255 .0208446 -0.39 0.700 -.0488801 .0328292


	id time is_seropos age_mean male needle inject jail_vary

	271 4.89 No -6.46 Yes Yes No No
	271 9.31 No -6.46 Yes Yes No No
	271 13.38 No -6.46 Yes Yes No Yes
	271 17.97 No -6.46 Yes Yes No Yes
	271 22.00 No -6.46 Yes Yes No No

	272 3.80 No 8.54 No No No Yes
	272 9.41 Yes 8.54 No No No No

	273 3.93 No -11.46 Yes Yes No No
	273 8.00 No -11.46 Yes Yes No No
	273 12.07 No -11.46 Yes Yes No Yes
	273 15.97 No -11.46 Yes Yes No Yes
	273 20.66 No -11.46 Yes Yes No Yes

	274 3.87 Yes -4.46 Yes Yes Yes Yes


		OPG
time		Haz. ratio std. err. z P>\|z\| [95% conf. interval]

age_mean		.9714605 .012757 -2.20 0.027 .9467762 .9967884

male
Yes		.6678044 .1816576 -1.48 0.138 .3918353 1.138138

needle
Yes		1.271409 .2275426 1.34 0.180 .8952546 1.805609

inject
Yes		1.370672 .2575405 1.68 0.093 .9484142 1.980928

jail_vary
Yes		1.440966 .2916178 1.81 0.071 .9691488 2.142481


		time age_mean male needle inject jail_v~y

1.		4.13 5.54 Yes No Yes Yes
2.		8.26 5.54 Yes No Yes No
3.		12.33 5.54 Yes No Yes No
4.		16.10 5.54 Yes No Yes No
5.		20.10 5.54 Yes No Yes No

6.		24.23 5.54 Yes No Yes No
7.		28.62 5.54 Yes No Yes No
8.		32.59 5.54 Yes No Yes No
9.		36.20 5.54 Yes No Yes No
10.		40.56 5.54 Yes No Yes Yes

Interval-censored Cox model

<- See Stata's other features

Highlights

Let's see it work

Fitting a model for single-record-per-subject interval-censored data

Stratification

Checking proportional hazards assumption

Fitting a model for multiple-record-per-subject interval-censored data

Graphing survivor functions

References

Tell me more

We use cookies

Privacy policy

Required cookies

Advertising and performance cookies