Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


From: <S.Jenkins@lse.ac.uk>
To: <statalist@hsphsun2.harvard.edu>
Subject: Re: st: Re: Interval censored survival model
Date: Sat, 26 Jan 2013 15:25:19 -0000

the Statalist FAQ): (a) please discuss topics only via Statalist; do not also send messages to off-list private email addresses. Our discussion forum is Statalist. (b) Please use your real name when posting to the list.

On your follow-up questions:

(a) -intcens-, as I said, is not something I have used, so I pass on this question. Did you follow up the suggestion about -stpm-?

(b) The 'easy estimation' approach to fitting interval-censored and discrete time hazard regression models involves (a) reorganising the data into person-period form (the same data structure as -xt- in Stata), in which each data row corresponds to one interval during which a person is at risk of experiencing the event ("episode splitting"); and (b) fitting the multivariate regression model to these data. [See my website, below signature, for details.]

Note that the baseline hazard fitted in this approach, e.g. using -pgmhaz8-, -hshaz-, or -xtcloglog-, refers to the _discrete time hazard_, not the underlying continuous time hazard. The latter can only be fitted using interval-censored data if additional assumptions are made. That is precisely what -intcens- does when it assumes that the unobserved underlying continuous time baseline hazard takes some parametric functional form.

The most commonly considered case with interval-censored and discrete data is when the intervals are of equal length, e.g. a "month". If one starts with a data set in which there is one row per subject, then step (a) corresponds to an -expand- of the data, where the argument of that command is the number of periods (e.g. "months") each subject is at risk of experiencing the event.

Now suppose that the observed intervals are of unequal length, but that interval length is common across subjects. For example, the first period at risk observed for each subject is 6 weeks long, the second is 4 weeks long, the third is 8 weeks long, and so on.
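Step (a) for the equal-length case can be sketched as follows. It is written in Python purely for illustration and mimics what -expand- does to a one-row-per-subject file. The field names (subject_id, periods_at_risk, event) are hypothetical. The event coding is an assumption valid only for equal-length intervals: y = 1 only in the final period at risk, and only for spells that end in an event. The sketch does not address the unequal-length case.

```python
from typing import List, NamedTuple


class Subject(NamedTuple):
    """One row per subject (hypothetical layout)."""
    subject_id: int
    periods_at_risk: int  # number of equal-length intervals observed at risk
    event: int            # 1 = event in the last period at risk, 0 = right-censored


class PersonPeriod(NamedTuple):
    """One row per subject per period at risk (person-period form)."""
    subject_id: int
    period: int  # 1, 2, ..., periods_at_risk; used to model the baseline hazard
    y: int       # dependent variable for the discrete-time hazard regression


def expand(subjects: List[Subject]) -> List[PersonPeriod]:
    """Episode splitting: repeat each subject's row once per period at risk."""
    rows: List[PersonPeriod] = []
    for s in subjects:
        for j in range(1, s.periods_at_risk + 1):
            # y = 1 only in the final period, and only if the spell ended in an
            # event; censored subjects contribute y = 0 in every row.
            is_last = j == s.periods_at_risk
            rows.append(PersonPeriod(s.subject_id, j, int(is_last and s.event == 1)))
    return rows
```

For example, expand([Subject(1, 3, 1), Subject(2, 2, 0)]) yields five rows, with y = 1 only in subject 1's third row. Step (b) would then regress y on period indicators and covariates using a binary-response model such as the complementary log-log (as with -xtcloglog-).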
Now think of slicing the survival time axis into "weeks". For a subject who is observed in the first period only, you would expand the data so that s/he contributes 6 rows to the reorganised data set. For someone observed at risk for two periods, there would be 10 rows of data (6+4), and so on. I think that applying step (b) to these reorganised data would lead to estimates maximising the correct likelihood for the interval-censored model. (You might have to be careful about how you constrain the baseline _discrete_ hazard within intervals; I haven't thought this through.) In fact, I think that the approach would still work if interval lengths varied across subjects.

The general point is that I think the episode splitting approach still works for more complicated cases than the equal-length-interval one, but the data reorganisation step may be more complicated and require more care. [Listers: please correct me if you think I'm wrong.]

Note also that this approach provides estimates of the slope coefficients on the predictors for the corresponding effects in the underlying continuous time model. However, it does not provide estimates of the _continuous time_ baseline hazard. To get estimates of that you need additional assumptions, and that takes you back to -intcens- type programs.

Your message's reference to piece-wise constant exponential models suggests that you are still thinking in terms of fitting a continuous time model. (The piece-wise constant aspect refers to the continuous time hazard, not the discrete one that is fitted.)

Stephen

------------------------------
Date: Fri, 25 Jan 2013 11:22:10 -0600
From: plumsh <plumsh119@gmail.com>
Subject: Re: st: Re: Interval censored survival model

Thank you very much for responding. I'm involved in the research that produced the original question, so my response is to the point.
Two questions:

1) I guess my issue with INTCENS boils down to a technicality, namely data formatting for intcens (searching Statalist gives some hints, but I'd very much like to verify). Again, suppose that observations on the same land parcel are recorded on, say, Jan 1 of 1980, 1997, 2005, and 2010 (same dates for all parcels in the sample). Say the intervals (t_0, t_1) are (1,8), (8,16), and (16,21). [Not sure if counting from 1 is necessary, but intcens ignores st settings.] Should the data then be in the following form?

id (land parcel)   t_0   t_1   event (0 = stays as farmland, 1 = converted to housing)
1                    1     8     0
1                    8    16     0
1                   16    21     1
2                    1     8     0
2                    8    16     1
2                   16    21     0
3                    1     8     0
3                    8    16     0
3                   16    21     0

As you see, parcel 1 gets converted in the third interval and parcel 2 in the second. Parcel 3 does not get converted and is censored at t=21 (end of third period). With the data in this form, is it OK to run the following?

. intcens t_0 t_1 flood, dist(*)

Here FLOOD is the floodplain level classification (i.e., time invariant). I will add more covariates, of course. Knowing whether this specification is correct would make my day.

2) Regarding the reference to pgmhaz(8), I'm afraid I don't understand how the unequal interval length can be ignored. Even with a piecewise-constant proportional hazard, the likelihood depends on the interval length (t_1 - t_0). If there is no way to specify that in the syntax (or the dataset?), we can't use it even if the intervals are the same for all subjects.

Regards,

On Fri, Jan 25, 2013 at 3:37 AM, <S.Jenkins@lse.ac.uk> wrote:
> ------------------------------
>
> Date: Thu, 24 Jan 2013 15:58:41 -0600
> From: plumsh <plumsh119@gmail.com>
> Subject: st: Re: Interval censored survival model
>
>> The manual (Page 20 of the Survival Analysis section) explicitly states
>> that there are no discrete-time models in Stata. The only user-made codes
>> for grouped (interval censored) data that I found are pgmhaz(8), hshaz, and
>> intcens.
>> The first two don't accommodate intervals of unequal length and,
>> unfortunately, the model and the syntax for INTCENS seem a little obscure
>> (at least to me at this point).
>>
>> My setup: land plots in agricultural use (farmland) have been converted to
>> residential and other commercial uses. Observations on the same land parcel
>> are recorded on, say, Jan 1 of 1980, 1997, 2005, and 2010 (same dates for
>> all parcels in the sample). Thus, the intervals are of unequal length. Apart
>> from that, we have stock sampling (the land has been farmed since a long
>> time ago; there is no record of when, and it does not really matter).
>>
>> I want to do survival analysis using location (distance to beach, roads,
>> schools), demographic (population density, mix, etc.), and economic (plenty)
>> parcel attributes.
>>
>> The theory on Grouped Duration Data analysis (particularly the piecewise
>> constant proportional hazard) is pretty straightforward (section 20.4 in
>> Wooldridge, Econometric Analysis of Cross Section and Panel Data).
>>
>> Since I don't have the time to write a readily working function for the ml
>> command, I would greatly appreciate any advice on how to estimate my
>> interval censored (grouped) data on land parcels. Pity they didn't record
>> exact conversion times. My only alternative now is probit/logit codes (I
>> read most of the relevant posts on the Statalist archives).
>>
>> Regards
>>
>> Sheng
> =============
>
> To be frank, I don't see what the problem with using -intcens- (on SSC)
> is. To me, the help file gives examples of how to use it. The command
> line seeks, inter alia, the time points that define the intervals. To
> me, -intcens- is very nice because of (a) the flexibility regarding
> interval length (as you say), and (b) it's a convenient way of fitting a
> number of continuous time _parametric_ models in the situation where the
> available data are interval-censored.
> The restrictions of -intcens- to
> me are: (c) time-varying predictors are not allowed; (d) there is a
> particular set of parametric models and these may not suit you; (e) no
> unobserved heterogeneity ('frailty').
>
> The other user-written commands that you cite (by me, on SSC) handle (c)
> and (e). I think they would also be OK if the unequal-length intervals
> are the same unequal length for each person. That is, suppose 2 subjects
> have the same spell length (number of intervals) recorded. If the first
> interval is 2 months long for both (all) subjects, and the second
> interval is 1 month long for all subjects, etc., then the likelihood is
> fine. (One has to be careful about post-estimation interpretation,
> however.)
>
> Also check out -stpm- on SSC. I've not used it, but the help file states
> that it can handle interval-censored data. There is also -stpm2- on SSC,
> which is a development of -stpm-, but I am not sure whether it handles
> interval-censored data (it is not mentioned in the help file in the same
> way). If Paul Lambert or Michael Crowther are list members, perhaps they
> can clarify matters.
>
> I don't see how "probit/logit codes" would be a way forward, unless you
> were to ignore the impact of elapsed duration on the hazard rate and
> simply model event occurrence.
>
> Stephen

Stephen
------------------
Stephen P. Jenkins <s.jenkins@lse.ac.uk>
Professor of Economic and Social Policy
Department of Social Policy
London School of Economics and Political Science
Houghton Street, London WC2A 2AE, UK
Tel: +44(0)20 7955 6527
The Great Recession and the Distribution of Household Incomes, OUP 2013, http://ukcatalogue.oup.com/product/9780199671021.do
Changing Fortunes: Income Mobility and Poverty Dynamics in Britain, OUP 2011, http://ukcatalogue.oup.com/product/9780199226436.do
Survival Analysis Using Stata: http://www.iser.essex.ac.uk/survival-analysis
Downloadable papers and software: http://ideas.repec.org/e/pje7.html

Please access the attached hyperlink for an important electronic communications disclaimer: http://lse.ac.uk/emailDisclaimer

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
