Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Interval censoring using intcens
Date: Thu, 2 Aug 2012 15:30:34 -0400

Patrick Munywoki:

I don't see a role for MI, as it requires that one knows when variables are missing. But if no infections are recorded between times t1 and t2, you can't tell: either there was no infection or one started and stopped before it could be detected.

As Stephen suggests, prior knowledge about bounds will be helpful. If, for example, the anticipated minimum duration is 3 days, then one would miss all infections that start less than 1 day into a four-day interval, which would be about 1/8 of infections. If the anticipated minimum was 4 days, then you'd miss no infections in people who missed no visits.

Beyond this I think you need a Bayesian approach. I don't know the literature, but you might find some ideas at:

Hatfield, L.A., Boye, M.E., Hackshaw, M.D., and Carlin, B.P. (2012), "Multilevel Bayesian models for survival times and longitudinal patient-reported outcomes with many zeros," to appear J. Amer. Statist. Assoc.

which can be found, with BUGS code, at Brad Carlin's page: http://www.biostat.umn.edu/~brad/software.html

Steve
sjsamuels@gmail.com

On Aug 2, 2012, at 7:13 AM, <S.Jenkins@lse.ac.uk> wrote:

The key additional piece of information that you now provide is that, in your panel, there are missing data at some time points for some subjects. One way of viewing this is to say that the interval widths in the interval censoring may vary from individual to individual (because of missing data). In terms of the strategy that I outlined, I think that makes the programming of the likelihood more complicated, because the strategy I proposed assumed that the n'th interval along the survival time axis is the same for all subjects. Judicious collapsing of intervals might help get around this problem -- at least in the single spell case -- but with multiple spells, your missing data points are related to deciding when one spell finishes and another starts. Whatever, the point is that you have incomplete information.
So to proceed you'll have to bring in more information of some kind in some way. I see your proposal to use MI as related to that. But, as ever with MI, and especially in your context, what is your imputation model going to be?

Related: one approach might be to see if some sort of bounding approach is possible. E.g. what would happen if you filled in all gaps assuming that they were non-infectious time points or, alternatively, were all infectious time points? (These are 2 imputation models ... ) I think some researchers have used these approaches when modelling poverty spell lengths using household panel data with annual interviews, where income is sometimes missing at the interview so annual poverty status cannot always be ascertained. Sorry, but I can't recall references.

Stephen

------------------------------

Date: Wed, 1 Aug 2012 11:58:21 +0100
From: Patrick Munywoki <pmunywoki@gmail.com>
Subject: Re: st: Interval censoring using intcens

Many thanks for the suggestions. The main problem in my dataset is that I do not have an exact date/time of when the study participants either started or stopped shedding the respiratory virus of interest. Note that I sample participants twice a week, hence there are intervals of 3 to 4 days (longer in cases where a sample was not collected) between sample collections for all the participants.

Any further ideas on how to analyse these data are welcome. I am currently thinking of using imputation techniques to determine when the infection episodes started and ended before I proceed with the survival analysis. Your thoughts on this approach are also welcome.

Thanks
Patrick

On 29 July 2012 13:02, <S.Jenkins@lse.ac.uk> wrote:
>
> Steve Samuels provided very good advice. Some other reflections from me:
>
> -intcens- (on SSC) is a program that fits parametric _continuous_
> survival time distributions to interval-censored survival time data
> (a.k.a. grouped or discrete time data). The program doesn't allow
> time-varying covariates.
> It has one row per spell/obs -- convenient for
> the maximisation by -ml-.
>
> I'm not sure that -stpm- (which you ask about) is appropriate for
> interval-censored data. I would check further if I were you. (If it is,
> then also check out -stpm2- which is more flexible and faster. Use
> -findit- to get latest version -- it's from SJ or SSC.)
>
> You could think more generally about models for interval-censored data
> -- see the MS and lessons off my survival analysis webpages (URL below)
> for discussion and references. This shows how you can fit models which
> make no assumption about the shape of the underlying survival time
> distribution. (You can assume shapes for the interval-hazard if you
> wish; but can also assume interval-specific values if you wish and your
> data allow it.) And time-varying covariates can be easily incorporated.
>
> More complicated is what to do with multiple spells. (You don't mention
> them explicitly, but it sounds as if you have them according to your
> description.) The key issue is non-independence across spells from the
> same person. Steve Samuels remarked on this and suggested clustering the
> standard errors (persons as clusters). An alternative is to assume some
> parametric form for the individual-specific effect that generates the
> non-independence across spells from the same person -- this is 'frailty'
> a.k.a. 'unobserved heterogeneity'. The most straightforward way of handling
> this would be:
> * Reorganise (expand) your data so that you have one row in data set for
> each interval that each person is at risk of infection, and create an
> event occurrence indicator y_it for person i and interval t (see my
> Lessons)
> * Create any time-varying covariates required. At minimum, this will be
> some specification for the duration dependence of the interval hazard
> * Fit a -xtcloglog- model with the binary outcome variable being y_it.
> This assumes that the person-specific frailty is normal (Gaussian).
> Or just fit a -cloglog- model if you want to ignore frailty. Either way,
> you would be fitting the interval-censored model corresponding to an
> underlying continuous time model that satisfies the proportional hazards
> assumption. (That assumption can be tested using interactions between
> explanatory variables and the variables summarising duration
> dependence.) An alternative would be -xtlogit- and -logit- to data
> organised in the same way.
>
> [Cf. -pgmhaz8- and -hshaz- (on SSC) which also fit discrete time
> proportional hazards models with frailty (Gamma, and discrete mass
> point, respectively), but only to single spell data. -xtcloglog- and
> -xtlogit- work with multiple spell data because the frailty is
> integrated out numerically.]
>

Please access the attached hyperlink for an important electronic communications disclaimer: http://lse.ac.uk/emailDisclaimer

*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
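
The data reorganisation in the quoted steps (one row per person for each interval at risk, with an event indicator y_it) can be sketched as follows. This is only an illustration in Python, not code from the thread: the field names (`person`, `spell`, `n_intervals`, `event`) and the toy records are hypothetical, and it assumes equal-width intervals, which the thread notes is complicated by missed visits.

```python
# A minimal sketch of the person-period expansion, under the assumptions above.
# Each input record is one spell: how many intervals the person was at risk,
# and whether the spell ended in an event (True) or was censored (False).

def expand_to_person_period(spells):
    """Return one row per person, spell and interval at risk.

    y is 1 only in the final interval of a spell that ended in an event;
    it is 0 in all earlier intervals and in every interval of a censored spell.
    """
    rows = []
    for s in spells:
        for t in range(1, s["n_intervals"] + 1):
            is_last = (t == s["n_intervals"])
            rows.append({
                "person": s["person"],
                "spell": s["spell"],
                "t": t,  # interval index within the spell (duration dependence)
                "y": 1 if (is_last and s["event"]) else 0,
            })
    return rows


# Hypothetical toy data: person 1 has two spells, person 2 has one.
spells = [
    {"person": 1, "spell": 1, "n_intervals": 3, "event": True},
    {"person": 1, "spell": 2, "n_intervals": 2, "event": False},
    {"person": 2, "spell": 1, "n_intervals": 1, "event": True},
]

rows = expand_to_person_period(spells)
for r in rows:
    print(r)
```

The resulting rows have the layout the quoted message describes as input for a binary-outcome model of y_it, with `t` available to build the duration-dependence terms and `person` identifying the cluster or frailty unit.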
