Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Re: Interval censored survival model

Subject: Re: st: Re: Interval censored survival model

Date: Sat, 26 Jan 2013 15:25:19 -0000

```
the Statalist FAQ)
(a) please discuss topics only via Statalist; do not send to off-list
private email addresses as well. Our discussion forum is Statalist.

(a) -intcens-, as I said, is not something I have used. So, I pass on
this question. Did you follow up the suggestion about -stpm-?

(b) the 'easy estimation' approach to fitting interval-censored and
discrete time hazard regression models involves (a) reorganisation of
the data into person-period form (same data structure as -xt- in Stata),
in which each data row corresponds to each interval that a person is at
risk of experiencing the event ("episode splitting"); (b) fitting of the
multivariate regression model to these data. [See my website, below
signature, for details.]

Note that the baseline hazard that is fitted in this approach, e.g.
using -pgmhaz8-, -hshaz-, or -xtcloglog-, refers to the _discrete time
hazard_, not the underlying continuous time hazard. The latter can only
be fitted using interval-censored data if additional assumptions are
made -- which is precisely what -intcens- does when it assumes that the
unobserved underlying continuous time baseline hazard takes some
parametric functional form.

The most commonly considered case with interval-censored and discrete
data is when the intervals are of equal length, e.g. a "month". If one
starts with a data set in which there is one row per subject, then step
(a) corresponds to an -expand- of the data where the argument of that
command refers to the number of periods (e.g. "months") each subject is
at risk of experiencing the event.
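
As a minimal sketch of step (a) for the equal-length case (not taken
from the original post): the variable names id, nmonths, event and x,
and the three data rows, are hypothetical and only show the layout.

    clear
    input id nmonths event x
    1 3 1 0.5
    2 2 0 1.2
    3 4 1 0.1
    end
    * one row per subject per month at risk
    expand nmonths
    bysort id: generate month = _n
    * y = 1 only in the last month of a subject who experienced the event
    bysort id: generate byte y = (event == 1 & _n == _N)
    list id month y x, sepby(id)

With a realistic sample, step (b) could then be something like
-cloglog y i.month x-, where i.month leaves the discrete baseline
hazard unrestricted (which needs events observed in every month).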

Suppose that the intervals observed are of unequal length, but interval
length is common across subjects: e.g., the first period at risk
that is observed for each subject is 6 weeks long, the second period
observed for each subject is 4 weeks long, the third period is 8 weeks,
and so on. Now start thinking of slicing the survival time axis up into
"weeks". For a subject who is observed in the first period only, you
would expand the data so that s/he contributes 6 rows to the reorganised
data set. For someone observed at risk for two periods, there would be
10 rows of data (6+4), and so on.  I think that applying step (b) to
these reorganised data would lead to estimates maximising the correct
likelihood for the interval-censored model. (You might have to be
careful about how you constrained the baseline _discrete_ hazard within
intervals -- I haven't thought this through.) In fact, I think that the
approach would still work if interval lengths varied across subjects.
The general point is that I think that the episode splitting approach
still works for more complicated cases than the equal-length-interval
one; but the data reorganisation step may be more complicated and
require more care.
[Listers: please correct me if you think I'm wrong.]
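
For the simpler case where interval lengths are common across subjects
(6, 4 and 8 weeks in the example), a sketch of a one-row-per-interval
layout follows. It is an alternative to slicing into weeks, in line
with the remark in the quoted reply below that the likelihood is fine
in this case. It assumes a cloglog link; id, nint, event, x and the
data rows are hypothetical.

    clear
    input id nint event x
    1 1 1 0.5
    2 2 0 1.2
    3 3 1 0.1
    end
    * one row per subject per observed interval (not per week)
    expand nint
    bysort id: generate interval = _n
    * y = 1 only in the last interval of a subject who experienced the event
    bysort id: generate byte y = (event == 1 & _n == _N)
    * interval lengths in weeks, kept only for interpreting results later
    generate weeks = cond(interval == 1, 6, cond(interval == 2, 4, 8))
    list id interval weeks y x, sepby(id)

A model such as -cloglog y i.interval x-, fitted to a realistic sample
in this layout, has interval-specific intercepts that absorb the
interval lengths, so the lengths are not passed to the estimation
command. They matter again when converting fitted interval hazards
into per-week rates.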

And note that this approach, while providing estimates of the slope
coefficients on the predictors for the corresponding effects in the
underlying continuous time model, does not provide estimates of the
_continuous time_ baseline hazard. To get estimates of that you need
additional assumptions, and that takes you back to -intcens- type programs.
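
For reference, a sketch of the standard argument in LaTeX-style
notation. It assumes a proportional hazards form
\theta(t,x) = \theta_0(t)\exp(x'\beta) and interval boundaries
a_0 < a_1 < ... common to all subjects; the symbols are introduced
here and are not from the original post.

    S(a_j|x) / S(a_{j-1}|x)
        = \exp[ -\exp(x'\beta) \int_{a_{j-1}}^{a_j} \theta_0(u) du ]

    h_j(x) = 1 - S(a_j|x) / S(a_{j-1}|x)
           = 1 - \exp[ -\exp(x'\beta + \gamma_j) ]

    \gamma_j = \log \int_{a_{j-1}}^{a_j} \theta_0(u) du

The slope vector \beta is the same as in the continuous time model,
but only the integrated baseline hazard over each interval (\gamma_j)
is identified, not \theta_0(t) within an interval. Interval length
enters only through \gamma_j.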

Your message's reference to piece-wise constant exponential models
suggests that you are still thinking in terms of fitting a continuous
time model (the piece-wise constant aspect refers to the continuous time
hazard, not the discrete one that is fitted).

Stephen

------------------------------

Date: Fri, 25 Jan 2013 11:22:10 -0600
From: plumsh <plumsh119@gmail.com>
Subject: Re: st: Re: Interval censored survival model

Thank you very much for responding. I'm involved in the research that
produced the original question so my response is to the point.

Two questions:
1) I guess my issue with INTCENS boils down to a technicality, namely
data formatting for intcens (searching statalist gives some hints but
I'd very much like to verify).
Again, suppose that observations on the same land parcel are recorded
on, say, Jan 1 of 1980, 1997, 2005, and 2010 (same dates for all
parcels in the sample). Say the intervals (t_0, t_1) are (1,8),
(8,16), and (16,21). [not sure if counting from 1 is necessary but
intcens ignores st settings] Should the data be in the following form
then:

id    t_0    t_1    event
 1      1      8        0
 1      8     16        0
 1     16     21        1
 2      1      8        0
 2      8     16        1
 2     16     21        0
 3      1      8        0
 3      8     16        0
 3     16     21        0
(id = land parcel; event: 0 = stays as farmland, 1 = converted to housing)
As you see, parcel 1 gets converted in the third interval, parcel 2 in
the second, and parcel 3 does not get converted and is censored at
t=21 (end of third period).

With the data in this form, is it OK to run the following:

. intcens t_0  t_1  flood, dist(*)

where FLOOD is floodplain level classification (i.e., time invariant).
Will add more covariates of course.

Knowing if I'm correct with this specification would make my day.

2) Regarding the reference to pgmhaz(8), I'm afraid I don't understand
how the unequal interval length can be ignored. Even with constant
piecewise proportional hazard, the likelihood depends on the interval
length (t1 - t0). If there is no way to specify that in the syntax
(dataset?), we can't use it even if the intervals are the same for all
the subjects.

Regards,

On Fri, Jan 25, 2013 at 3:37 AM,  <S.Jenkins@lse.ac.uk> wrote:
> ------------------------------
>
> Date: Thu, 24 Jan 2013 15:58:41 -0600
> From: plumsh <plumsh119@gmail.com>
> Subject: st: Re: Interval censored survival model
>
>> The manual (Page 20 of the Survival Analysis section) explicitly states
>> that there are no discrete-time models in Stata. The only user-made codes
>> for grouped (interval censored) data that I found are pgmhaz(8), hshaz, and
>> intcens. The first two don't accommodate intervals of unequal length and,
>> unfortunately, the model and the syntax for INTCENS seems a little obscure
>> (at least to me at this point).
>>
>> My setup: land plots in agricultural use (farmland) have been converted to
>> residential and other commercial uses. Observations on the same land parcel
>> are recorded on, say, Jan 1 of 1980, 1997, 2005, and 2010 (same dates for
>> all parcels in the sample). Thus, the intervals are of unequal length. Apart
>> from that, we have stock sampling (the land has been farmed since a long
>> time ago; no record when and it does not really matter).
>>
>> I want to do survival analysis using location (distance to beach,
>> schools), demographic (population density, mix, etc.), and economic (plenty)
>> parcel attributes.
>>
>> The theory on Grouped Duration Data analysis (particularly the piecewise
>> constant proportional hazard) is pretty straightforward (section 20.4 in
>> Wooldridge, Econometric Analysis of Cross Section and Panel Data).
>>
>> Since I don't have the time to write a readily working function for the ml
>> command, I would greatly appreciate any advice on how to estimate my
>> interval censored (grouped) data on land parcels. Pity they didn't record
>> exact conversion times. My only alternative now is probit/logit codes (I
>> read most of the relevant posts on the Statalist archives).
>>
>> Regards
>>
>> Sheng
> =============
>
> To be frank, I don't see what the problem with using -intcens- (on SSC)
> is. To me, the help file gives examples of how to use it. The command
> line seeks, inter alia, the time points that define the intervals. To
> me, -intcens- is very nice because of (a) the flexibility regarding
> interval length (as you say), and (b) it's a convenient way of fitting a
> number of continuous time _parametric_ models in the situation where the
> available data are interval-censored. The restrictions of -intcens- to
> me are: (c) time-varying predictors are not allowed; (d) there is a
> particular set of parametric models and these may not suit you; (e) no
> unobserved heterogeneity ('frailty').
>
> The other user-written commands that you cite (by me, on SSC) handle (c)
> and (e). I think they would also be ok if the unequal-length intervals
> are the same unequal length for each person. That is, suppose 2 subjects
> have the same spell length (number of intervals) recorded. If the first
> interval is 2 months long for both (all) subjects, and the second
> interval is 1 month long for all subjects, etc., then the likelihood is
> fine. (One has to be careful about post-estimation interpretation,
> however.)
>
> Also check out -stpm- on SSC. I've not used it, but the help file states
> that it can handle interval-censored data. There is also -stpm2- on SSC
> which is a development of -stpm-, but I am not sure whether it handles
> interval-censored data (not mentioned in help file in the same way). If
> Paul Lambert or Michael Crowther are list members, perhaps they can
> clarify matters.
>
> I don't see how "probit/logit codes" would be a way forward, unless you
> were to ignore the impact of elapsed duration on the hazard rate, and
> simply model event occurrence.
>
> Stephen

Stephen
------------------
Stephen P. Jenkins <s.jenkins@lse.ac.uk>
Professor of Economic and Social Policy
Department of Social Policy
London School of Economics and Political Science
Houghton Street, London WC2A 2AE, UK
Tel: +44(0)20 7955 6527
The Great Recession and the Distribution of Household Incomes, OUP
2013,
http://ukcatalogue.oup.com/product/9780199671021.do
Changing Fortunes: Income Mobility and Poverty Dynamics in Britain, OUP
2011, http://ukcatalogue.oup.com/product/9780199226436.do
Survival Analysis Using Stata:
http://www.iser.essex.ac.uk/survival-analysis