
# Re: st: Discrete time hazard model - Interval width

From: "Stephen P. Jenkins"
Subject: Re: st: Discrete time hazard model - Interval width
Date: Thu, 20 May 2010 10:05:45 +0100

```
==============================
Date: Wed, 19 May 2010 21:15:01 +0200
From: gideluca@unical.it
Subject: Re: st: Discrete time hazard model - Interval width

Dear Steve,
The stock sample includes all 2004 graduates at risk of finding a
job (more precisely, those searching for a job and not pursuing
other studies) who were interviewed one year (2005), three years
(2007) and five years (2009) after graduation. So we initially
select only those who are looking for a job at the 2004 baseline
interview. These individuals can remain unemployed, find a
permanent job, or be lost to follow-up.

We do not know the exact date of the first "real" job they find;
we only know whether or not they have a stable job at the time of
each subsequent interview. In particular, the questions used to
identify failures are "Are you working at this moment?" and "Is
your job stable?". Graduates who report a stable job at, say, the
first interview are our failures, and they are no longer at risk.
For those who have never got a "real job", the last observed
interview defines the censoring point.

What is wrong with ignoring, in a discrete time hazard model, the
fact that the interviews are not evenly spaced over time?
========================

The commonly used ways of fitting discrete time hazard regression
models are based on the assumption of equal-width intervals. In
this case, one can show that the model likelihood is the same as
the likelihood for a binary dependent variable model applied to
expanded data in which there is one record (data row) for each
interval that each person is at risk. The same approach applies
when there is left truncation (stock sampling): see Jenkins,
Oxford Bulletin of Economics and Statistics, 1995.

This correspondence, and hence the "easy estimation" method,
breaks down when the intervals are not of equal width. In this
case, one needs to be more careful about the different lengths of
time that each person is at risk of experiencing the event over
intervals of different width. -intcens- on SSC allows you to do
this, at the cost of not allowing time-varying covariates.

More generally, think of your data as "interval censored" rather
than "discrete". Very few social science survival processes,
including yours, have survival times that are intrinsically
discrete. Most refer to some underlying process in continuous
time; the problem is that the times are recorded in grouped
(banded) form -- they are "interval censored". (Models for
"interval censored" and "discrete" survival time data correspond
exactly when the intervals are of equal width, because you can
then count "time" consistently using a sequence of positive
integers.)
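```

The point about interval censoring can be made precise with a standard grouped-data result, added here for illustration. It assumes a continuous-time proportional hazards model with hazard λ_0(t) exp(x'β), covariates fixed within each interval, and interval boundaries a_0 < a_1 < a_2 < .... The discrete hazard for interval j then takes a complementary log-log form in which the interval's width enters only through γ_j:

```latex
h_j(x) = \Pr(a_{j-1} < T \le a_j \mid T > a_{j-1}, x)
       = 1 - \exp\!\left(-e^{x'\beta}\int_{a_{j-1}}^{a_j}\lambda_0(u)\,du\right)
       = 1 - \exp\!\left(-\exp(x'\beta + \gamma_j)\right),
\qquad
\gamma_j = \log\int_{a_{j-1}}^{a_j}\lambda_0(u)\,du .

% Special case: constant baseline hazard, \lambda_0(u) = \lambda
\gamma_j = \log\lambda + \log(a_j - a_{j-1}).
```

With waves 1, 3 and 5 years after baseline, the widths are 1, 2 and 2, so γ_2 - γ_1 = log 2 even though the underlying hazard rate is constant. A specification that treats the interval index j as elapsed time would read this width effect as duration dependence.

```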

I note that your intervals are really rather wide (in addition to
being of unequal width). Off the top of my head, I wonder whether
another way to proceed might be simply to model jointly the
following binary sequence for your samples:

Pr(got job by 2009 | no job by 2007)
Pr(got job by 2007 | no job by 2005)
Pr(got job by 2005)

This could be modelled as a trivariate probit with 2 selections
using the methods set out by Cappellari & Jenkins (Stata Journal,

Stephen
-------------------------------------
Professor Stephen P. Jenkins <stephenj@essex.ac.uk>
Institute for Social and Economic Research (ISER)
University of Essex, Colchester CO4 3SQ, UK
Tel: +44(0)1206 873374. Fax: +44(0)1206 873151
http://www.iser.essex.ac.uk
Survival Analysis using Stata:
http://www.iser.essex.ac.uk/survival-analysis