Re: st: Interval censoring using intcens

From   Steve Samuels <>
Subject   Re: st: Interval censoring using intcens
Date   Thu, 2 Aug 2012 15:30:34 -0400

Patrick Munywoki:

I don't see a role for multiple imputation (MI), as it requires that one knows when values are missing. But if no infections are recorded between times t1 and t2, you can't tell whether there was no infection or whether one started and stopped before it could be detected.

As Stephen suggests, prior knowledge about bounds will be helpful. If, for example, the anticipated minimum duration is 3 days, then one would miss all infections of that minimum length that start less than 1 day into a four-day interval, which would be about 1/8 of infections. If the anticipated minimum was 4 days, then you'd miss no infections in people who missed no visits.
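
A minimal Python sketch of the arithmetic above, under stated assumptions: every infection lasts exactly the minimum duration, visits alternate between 3-day and 4-day gaps, and an infection counts as detected if it is still present at the next visit. The function missed_fraction is hypothetical. Weighting each gap equally reproduces the 1/8 figure; if start times are instead spread uniformly over calendar time, the 4-day gap carries more weight and the share becomes 1/7.

```python
from fractions import Fraction

def missed_fraction(gaps, duration, weight_by_time=True):
    """Share of infections of a fixed duration that start and end between visits.

    Assumptions (illustrative only): each infection lasts exactly `duration`
    days, starts uniformly within a gap, and is detected if it is still
    present at the next visit.
    """
    # Within a gap of length L, starts in the first max(0, L - duration) days are missed.
    missed = [Fraction(max(0, gap - duration), gap) for gap in gaps]
    if weight_by_time:
        # Start times uniform over calendar time: longer gaps carry more weight.
        return sum(m * gap for m, gap in zip(missed, gaps)) / sum(gaps)
    # Each gap weighted equally.
    return sum(missed) / len(gaps)

week = [3, 4]  # twice-weekly sampling
print(missed_fraction(week, 3))                        # 1/7
print(missed_fraction(week, 3, weight_by_time=False))  # 1/8
print(missed_fraction(week, 4))                        # 0
```

Since real infections can last longer than the minimum, these shares are upper bounds under the stated assumptions.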

Beyond this I think you need a Bayesian approach. I don't know the literature, but you might find some ideas at:

Hatfield, L.A., Boye, M.E., Hackshaw, M.D., and Carlin, B.P. (2012), "Multilevel Bayesian models for survival times and longitudinal patient-reported outcomes with many zeros," to appear in J. Amer. Statist. Assoc.

which can be found, with BUGS code, at Brad Carlin's page:


On Aug 2, 2012, at 7:13 AM, <> <> wrote:

The key additional piece of information that you now provide is that, in
your panel, there are missing data at some time points for some individuals.

One way of viewing this is to say that the interval widths in the
interval censoring may vary from individual to individual (because of
missing data). In terms of the strategy that I outlined, I think that
makes the programming of the likelihood more complicated, because the
strategy I proposed assumed that the n'th interval along the survival
time axis is the same for all subjects. Judicious collapsing of
intervals might help get around this problem -- at least in the single-spell
case -- but with multiple spells, your missing data points are
related to deciding when one spell finishes and another starts.
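
To make the varying widths concrete, here is a minimal Python sketch (function name and data layout hypothetical) that turns one person's sample results into an interval for infection onset. A missed sample (None) carries no information and is skipped, so the interval runs from the last negative sample to the first positive one and is wider for people with gaps.

```python
def onset_interval(visit_days, results):
    """Bounds (left, right] on infection onset for one person.

    results[i] is 1 (virus detected), 0 (not detected) or None (no sample).
    left is None if the first available sample is already positive
    (left-censored); the function returns None if no sample is positive.
    """
    last_negative = None
    for day, result in zip(visit_days, results):
        if result is None:
            continue  # missed visit: skip it, which widens the interval
        if result == 1:
            return (last_negative, day)
        last_negative = day
    return None

days = [0, 3, 7, 10, 14]  # twice-weekly visits
print(onset_interval(days, [0, 0, 1, 1, 0]))     # (3, 7): a 4-day interval
print(onset_interval(days, [0, 0, None, 1, 1]))  # (3, 10): a 7-day interval
```

The second person's interval spans two scheduled gaps, which is why the n'th interval is no longer common to all subjects.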

Whatever, the point is that you have incomplete information. So to
proceed you'll have to bring in more information of some kind in some way.

I see your proposal to use MI as related to that. But, as ever with MI,
and especially in your context, what is your imputation model going to be?

Related: one approach might be to see if some sort of bounding approach
is possible. E.g., what would happen if you filled in all gaps assuming
that they were non-infectious time points or, alternatively, that they were all
infectious time points? (These are two imputation models.) I think
some researchers have used these approaches when modelling poverty spell
lengths using household panel data with annual interviews, where
income is sometimes missing at the interview, so annual poverty status cannot
always be ascertained. Sorry, but I can't recall references.
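
The two fill-in rules can be sketched in Python as follows; the helper names are hypothetical, and spell length is counted in visits rather than days. Each missed sample (None) is set either to 0 (not infectious) or to 1 (infectious), and spells are read off as runs of consecutive positive samples.

```python
def fill(results, value):
    """Replace missed samples (None) with a fixed value: 0 or 1."""
    return [value if r is None else r for r in results]

def spell_lengths(results):
    """Lengths, in visits, of runs of consecutive positive samples."""
    lengths, run = [], 0
    for r in results:
        if r == 1:
            run += 1
        elif run:
            lengths.append(run)  # a run of positives just ended
            run = 0
    if run:
        lengths.append(run)  # spell still ongoing at the last visit
    return lengths

observed = [0, 1, None, 1, 0, 0, 1, 1, 0]
print(spell_lengths(fill(observed, 0)))  # [1, 1, 2]
print(spell_lengths(fill(observed, 1)))  # [3, 2]
```

Here the single gap either splits the first episode into two one-visit spells or merges it into one three-visit spell, so both the number of spells and their lengths change; comparing results from the two filled-in data sets indicates how sensitive the analysis is to the gaps.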


Date: Wed, 1 Aug 2012 11:58:21 +0100
From: Patrick Munywoki <>
Subject: Re: st: Interval censoring using intcens

Many thanks for the suggestions.

The main problem in my dataset is that I do not have the exact date/time at
which the study participants started or stopped shedding the
virus of interest. Note that I sample participants twice a week, hence there are
intervals of 3 to 4 days (longer in cases where a sample was not collected)
between sample collections for all the participants. Any further ideas on
how to analyse these data are welcome.

I am currently thinking of using imputation techniques to determine when
the infection episodes started and ended before I proceed with the
analysis. Your thoughts on this approach are also welcome.


On 29 July 2012 13:02, <> wrote:
> Steve Samuels provided very good advice. Some other reflections from me:
> -intcens- (on SSC) is a program that fits parametric _continuous_
> survival time distributions to interval-censored survival time data
> (a.k.a. grouped or discrete-time data). The program doesn't allow
> time-varying covariates. It has one row per spell/obs, which is convenient for
> the maximisation by -ml-.
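
For reference, the standard interval-censored likelihood that a program like -intcens- maximises can be sketched in Python as below. This is an illustration, not the program's own code: the Weibull choice, the data and the function names are hypothetical, and a crude grid search stands in for -ml-. A duration known only to lie in (lo, hi] contributes log[S(lo) - S(hi)], with hi = infinity for a right-censored spell.

```python
import math

def weibull_surv(t, lam, k):
    """Weibull survivor function S(t) = exp(-(t/lam)**k), with S(inf) = 0."""
    if math.isinf(t):
        return 0.0
    return math.exp(-((t / lam) ** k))

def interval_loglik(data, lam, k):
    """Sum of log[S(lo) - S(hi)] over spells known to end in (lo, hi]."""
    total = 0.0
    for lo, hi in data:
        p = weibull_surv(lo, lam, k) - weibull_surv(hi, lam, k)
        if p <= 0.0:
            return float("-inf")  # (numerically) zero mass on an observed interval
        total += math.log(p)
    return total

def grid_mle(data, lams, ks):
    """Crude grid search standing in for a proper -ml- style maximisation."""
    return max((interval_loglik(data, lam, k), lam, k) for lam in lams for k in ks)

# Hypothetical shedding durations in days, each known only to lie between two visits.
data = [(3, 7), (0, 3), (7, 10), (3, 7), (10, math.inf), (7, 10), (0, 3), (3, 7)]
lams = [0.5 + 0.25 * i for i in range(120)]  # scale values 0.5 .. 30.25
ks = [0.5 + 0.1 * i for i in range(31)]      # shape values 0.5 .. 3.5
ll, lam_hat, k_hat = grid_mle(data, lams, ks)
print(f"lambda={lam_hat:.2f}, k={k_hat:.1f}, logL={ll:.3f}")
```

With real data one would maximise over continuous parameters, as -ml- does, rather than over a grid.
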
> I'm not sure that -stpm- (which you ask about) is appropriate for
> interval-censored data. I would check further if I were you. (If it is,
> then also check out -stpm2-, which is more flexible and faster. Use
> -findit- to get the latest version -- it's from SJ or SSC.)
> You could think more generally about models for interval-censored data
> -- see the MS and lessons on my survival analysis webpages (URL
> for discussion and references. This shows how you can fit models that
> make no assumption about the shape of the underlying survival time
> distribution. (You can assume shapes for the interval hazard if you
> wish, but you can also assume interval-specific values if you wish and the
> data allow it.) And time-varying covariates can be easily incorporated.
> More complicated is what to do with multiple spells. (You don't mention
> them explicitly, but it sounds as if you have them according to your
> description.) The key issue is non-independence across spells from the
> same person. Steve Samuels remarked on this and suggested clustering
> standard errors (persons as clusters). An alternative is to assume a
> parametric form for the individual-specific effect that generates the
> non-independence across spells from the same person -- this is frailty,
> a.k.a. 'unobserved heterogeneity'. The most straightforward implementation of
> this would be:
> * Reorganise (expand) your data so that you have one row in the data set for
> each interval that each person is at risk of infection, and create an
> event occurrence indicator y_it for person i and interval t (see my
> Lessons).
> * Create any time-varying covariates required. At minimum, this will include
> some specification for the duration dependence of the interval hazard.
> * Fit an -xtcloglog- model with the binary outcome variable being y_it.
> This assumes that the person-specific frailty is normal (Gaussian). Or
> just fit a -cloglog- model if you want to ignore frailty. Either way,
> you would be fitting the interval-censored model corresponding to an
> underlying continuous-time model that satisfies the proportional hazards
> assumption. (That assumption can be tested using interactions between
> explanatory variables and the variables summarising duration
> dependence.) An alternative would be to fit -xtlogit- or -logit- to data
> organised in the same way.
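
The first and third steps can be sketched in Python as follows (record layout and function names hypothetical; the Weibull baseline is only an illustrative assumption). expand_person_intervals builds the person-interval rows with y_it, keeping the person identifier so that a frailty model can treat persons as panels while each spell has its own interval clock t. ph_cloglog_gap then checks numerically why a cloglog model on such rows corresponds to an underlying continuous-time proportional hazards model: the interval hazard 1 - S(a_j)/S(a_{j-1}) equals 1 - exp(-exp(gamma_j + beta*x)) with gamma_j = log[H0(a_j) - H0(a_{j-1})].

```python
import math

def expand_person_intervals(spells):
    """Expand spell records into one row per interval at risk.

    Each spell is a hypothetical (person_id, spell_id, n_intervals, event)
    tuple: the person is at risk in intervals 1..n_intervals of that spell,
    and event is 1 if infection occurred in the last interval, 0 if the
    spell was right-censored.
    """
    rows = []
    for pid, sid, n, event in spells:
        for t in range(1, n + 1):
            y = 1 if (event == 1 and t == n) else 0  # event indicator y_it
            rows.append({"id": pid, "spell": sid, "t": t, "y": y})
    return rows

def cloglog_hazard(eta):
    """Interval hazard under the complementary log-log link."""
    return 1.0 - math.exp(-math.exp(eta))

def H0(t, lam=10.0, k=1.5):
    """Illustrative Weibull baseline cumulative hazard (an assumption)."""
    return (t / lam) ** k

def ph_cloglog_gap(bounds, beta, x):
    """Largest difference between the continuous-time PH interval hazard,
    1 - S(a_j)/S(a_{j-1}), and the cloglog form with intercepts
    gamma_j = log(H0(a_j) - H0(a_{j-1}))."""
    scale = math.exp(beta * x)

    def surv(t):
        return math.exp(-H0(t) * scale)

    worst = 0.0
    for a0, a1 in zip(bounds, bounds[1:]):
        direct = 1.0 - surv(a1) / surv(a0)
        gamma_j = math.log(H0(a1) - H0(a0))
        worst = max(worst, abs(direct - cloglog_hazard(gamma_j + beta * x)))
    return worst

rows = expand_person_intervals([("A", 1, 3, 1), ("A", 2, 2, 0), ("B", 1, 4, 1)])
print(len(rows), [r["y"] for r in rows])  # 9 [0, 0, 1, 0, 0, 0, 0, 0, 1]
print(ph_cloglog_gap([0, 3, 7, 10, 14], beta=0.7, x=1.0))  # floating-point rounding only
```

Fixed interval intercepts gamma_j correspond to the "interval-specific values" option for duration dependence; a smooth function of t would impose a shape instead.
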
> [Cf. -pgmhaz8- and -hshaz- (on SSC), which also fit discrete-time
> proportional hazards models with frailty (Gamma and discrete mass
> point, respectively), but only to single-spell data. -xtcloglog- and
> -xtlogit- work with multiple-spell data because the frailty is
> integrated out numerically.]
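
For single-spell data, one reason a Gamma frailty is convenient is that it integrates out in closed form: with frailty mean 1 and variance theta, a person's survivor function exp(-v*H(t)) averages to (1 + theta*H(t))^(-1/theta), where H is the cumulative hazard for frailty 1. The Python sketch below (function names hypothetical) checks that closed form against brute-force numerical integration over the frailty distribution, the kind of calculation that has to be done numerically for a normal frailty.

```python
import math

def gamma_frailty_surv(H, theta):
    """Survivor function after integrating out a Gamma frailty with mean 1
    and variance theta: (1 + theta * H) ** (-1 / theta)."""
    return (1.0 + theta * H) ** (-1.0 / theta)

def numeric_frailty_surv(H, theta, upper=40.0, n=40000):
    """E[exp(-v * H)] for v ~ Gamma(shape 1/theta, scale theta), by the
    trapezoid rule on [0, upper]. Assumes theta <= 1, so the density is
    finite at v = 0."""
    a = 1.0 / theta
    log_norm = math.lgamma(a) + a * math.log(theta)

    def integrand(v):
        if v == 0.0:
            # Gamma density at 0: zero if shape > 1, 1/theta if shape == 1.
            return 0.0 if a > 1.0 else math.exp(-log_norm)
        return math.exp((a - 1.0) * math.log(v) - v / theta - log_norm - v * H)

    h = upper / n
    total = 0.5 * (integrand(0.0) + integrand(upper))
    total += sum(integrand(i * h) for i in range(1, n))
    return total * h

for theta in (0.25, 0.5, 1.0):
    for H in (0.1, 0.8, 2.0):
        closed = gamma_frailty_surv(H, theta)
        print(theta, H, round(closed, 6), round(numeric_frailty_surv(H, theta), 6))
```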


*   For searches and help try:


© Copyright 1996–2018 StataCorp LLC