


Re: st: Discrete time hazard model - Interval width

From   "Stephen P. Jenkins" <>
To   <>
Subject   Re: st: Discrete time hazard model - Interval width
Date   Thu, 20 May 2010 10:05:45 +0100

Date: Wed, 19 May 2010 21:15:01 +0200
Subject: Re: st: Discrete time hazard model - Interval width

Dear Steve,
Thank you for your help.
The stock-sample includes all 2004 graduates at risk of finding a
(more precisely searching for a job and not pursuing other
who have been interviewed one year (2005), three years (2007) and
five years (2009) after their graduation. So, we initially
only those who in the baseline interview 2004 are looking for a
These individuals can remain in their unemployment state, find a
permanent job or be lost to follow-up.

We do not know exactly the date of the first real job they find,
just know if at the time of each subsequent interview they have a
stable job or not. In particular, the relevant questions to
failures are: "Are you working at this moment" and "Is your job
stable?". Graduates finding a stable job, say, in the first
are our failures. So, they are not at risk anymore and they are
discarded from the analysis. In addition, for people who had
gotten a "real job", the last observed interview serves to
our censoring points.

What's wrong with ignoring in a discrete time hazard model the
that interviews are not administered regularly over time?

The commonly-used ways of fitting discrete time hazard regression
models are based on the assumption of equal-width intervals. In
this case, one can show that the model likelihood is the same as
the likelihood for a binary dependent variable model applied to
expanded data in which there is one record (data row) for each
interval that each person is at risk. The same approach applies
when there is left-truncation (stock sampling): see Jenkins,
Oxford Bulletin of Economics and Statistics, 1995.
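
To make the equal-width correspondence concrete, here is a minimal
sketch in Python (not Stata, and not taken from the paper cited
above). The function names, the two example spells, and the hazard
values are all made up for illustration. It expands each spell into
one row per interval at risk and checks that the binary-outcome
log-likelihood on the expanded rows equals the survival-time
log-likelihood written out by hand.

```python
import math

def expand_person_period(spells):
    """Expand (person_id, n_intervals, failed) spells into one row per
    interval at risk. The outcome y is 1 only in the last interval of a
    spell that ended in the event; censored spells have y = 0 throughout."""
    rows = []
    for person_id, n_intervals, failed in spells:
        for j in range(1, n_intervals + 1):
            y = 1 if (failed and j == n_intervals) else 0
            rows.append((person_id, j, y))
    return rows

def binary_loglik(rows, hazard):
    """Bernoulli log-likelihood over the expanded rows, where hazard[j]
    is the event probability in interval j given survival to its start."""
    return sum(
        math.log(hazard[j]) if y == 1 else math.log(1.0 - hazard[j])
        for _, j, y in rows
    )

# Hypothetical data: A has the event in interval 3, B is censored after 2.
spells = [("A", 3, True), ("B", 2, False)]
hazard = {1: 0.2, 2: 0.3, 3: 0.4}  # illustrative values only

rows = expand_person_period(spells)

# Survival-time likelihood written directly:
#   A: (1 - h1)(1 - h2) h3      B: (1 - h1)(1 - h2)
survival_loglik = math.log(0.8 * 0.7 * 0.4) + math.log(0.8 * 0.7)
```

In practice the hazard for each row would come from a logit or
cloglog regression on the expanded data rather than from a fixed
table; the sketch only shows why the two likelihoods coincide.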

This correspondence, and hence the "easy estimation" method,
breaks down when the intervals are not of equal width. In this
case, one needs to be more careful about the different lengths of
time that each person is at risk of experiencing the event over
the intervals of different width. -intcens- on SSC allows you to
do this, at the cost of not allowing time-varying covariates.
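
A small numerical sketch of why width matters, again in Python.
It assumes, purely for illustration, a constant underlying
continuous-time hazard theta; the value 0.3 and the function name
are hypothetical, and this is not how -intcens- is implemented.
The cut points are the interview times described above: 1, 3 and
5 years after graduation.

```python
import math

def interval_hazard(theta, start, end):
    """Probability of the event in (start, end], given survival to start,
    under an assumed constant continuous-time hazard theta."""
    return 1.0 - math.exp(-theta * (end - start))

cuts = [0, 1, 3, 5]   # years since graduation: widths 1, 2 and 2
theta = 0.3           # illustrative constant hazard per year

hazards = [interval_hazard(theta, a, b) for a, b in zip(cuts, cuts[1:])]
```

Even with no duration dependence at all in continuous time, the
two-year intervals have a higher event probability than the
one-year interval. A model that treats the three intervals as
equally spaced periods would therefore mix up interval width with
genuine changes in the hazard over time.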

More generally, think of your data as "interval censored" rather
than "discrete". Very few social science survival analysis
processes, including yours, have survival times that are
intrinsically discrete. Most refer to some underlying process in
continuous time; the problem is that the times are recorded in
grouped (banded) form -- they are "interval censored".  (Models
for "interval censoring" and "discrete" survival time data
correspond exactly when there are intervals of equal width
because you can then count "time" consistently using a sequence
of positive integers.)

I note that your intervals are really rather wide (in addition to
being of unequal width). Off the top of my head, I wonder
whether another way to proceed might be to consider simply
modelling the binary sequence of outcomes for your sample.

For your sample of 2004 graduates looking for a job, model
jointly ...
	Pr(got job by 2009 | no job by 2007)
	Pr(got job by 2007 | no job by 2005)
	Pr(got job by 2005)

This could be modelled as a trivariate probit with 2 selections
using the methods set out by Cappellari & Jenkins (Stata Journal,
6(2) 2006, downloadable from SJ website).

Professor Stephen P. Jenkins <>
Institute for Social and Economic Research (ISER)
University of Essex, Colchester CO4 3SQ, UK
Tel: +44(0)1206 873374. Fax: +44(0)1206 873151 
Survival Analysis using Stata:  
Downloadable papers and software: 

