Re: st: left censoring in discrete-time duration model

Date   Tue, 28 Nov 2006 14:05:43 -0500

Steven Samuels --
I would argue that 1992 is a natural "zero" time for Daniel Simon's
project, since I am reasonably certain no magazines had websites
before 1992--see e.g. the following from
The availability of CERN's files was announced in the UseNET
newsgroup, alt.hypertext, in August 1991. This was the first time that
the availability of the files was announced to the public. All of the
documents coded with HTML elements were stored on one main computer at
CERN. This special type of computer was called a "web server" (by the
physicists at CERN) because it "served-up" batches of cross-linked
HTML documents. There was only one Web server located at CERN, but by
the end of 1992 there were over 50 Web servers in the world.
See also for
an indication that the majority of websites online in 1996 were
created in 1996, one reason Daniel may find little difference in
results depending on how he treats the obs in 1996.

As for truncation vs censoring, I paraphrase the response from Stephen
P. Jenkins:
[If a firm first becomes at risk on] the later of either the year web
technology became available [1992] or the year when the firm itself
was established ..., then the problem appears to be one of left
truncation rather than left censoring. (Left truncation is also known
as delayed entry.)   The correct likelihood involves conditioning on
the probability of surviving from t=0 to t at which first observed.
For discrete time survival models (as you appear to have), this is
easy to implement -- e.g. see my website materials -- as long as there
is not unobserved heterogeneity ('frailty').
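The conditioning described above can be written out for the discrete-time case. This is a sketch only; the notation is assumed for illustration and is not taken from Stephen's materials: h_k is the hazard in period k, e is the last period the firm survived before it is first observed, t is the last period observed, and d = 1 if the firm adopts in period t (0 if censored).

```latex
\begin{align*}
S(j) &= \prod_{k=1}^{j} (1 - h_k) \\
L_i  &= \frac{\left[ h_t / (1 - h_t) \right]^{d} \, S(t)}{S(e)}
      = \left[ \frac{h_t}{1 - h_t} \right]^{d} \prod_{k=e+1}^{t} (1 - h_k)
\end{align*}
```

The pre-entry terms cancel, so in a person-period layout this amounts to keeping only the rows for periods e+1 through t. The cancellation depends on there being no frailty term to integrate over, which is the caveat in the last sentence of the paraphrase.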
So many website materials to choose from, though:
Perhaps Daniel will give us an update when he's reviewed some of them...

On 11/28/06, Steven Samuels <[email protected]> wrote:
On 11/16/06, Daniel Simon <[email protected]> wrote:
Hi - I'm using -hshaz- to estimate a discrete-time hazard model. I have some left
censoring that I'm not sure how to deal with. I am looking at firms
establishing websites. I can only observe the introduction of
websites from 1996 onwards.  However, I know that some firms
established websites prior to 1996, but I'm not sure which ones.
Currently, I have tried three approaches: (1) Treat all firms that
had a website in 1996 as if they adopted in 1996 (the first year of
the sample period), whether they adopted in 1996 or adopted earlier;
(2) Exclude 1996 from the sample (begin the analysis with 1997); (3)
Drop all observations from 1996 for firms that had websites.

Daniel, you do NOT have left censoring OR truncation, because you do
not have a natural ZERO starting time. Your time-dimension is
calendar time. Therefore your only analysis choice is number (3).
However, if all you have is the year of startup, then you cannot
distinguish firms with websites at the start of 1996 from firms whose
websites first appeared in 1996, so you must choose option (2). If
you do have calendar dates of website startup, use a continuous-time
rather than discrete-time model, with day or week as the time unit.
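To make the difference between the three sample definitions concrete, here is a minimal Python sketch that builds the at-risk firm-year rows under each option. The firms, the record layout, and the function name `person_periods` are all hypothetical, and the coding of each option is only one reading of Daniel's description (in particular, under option (2) a firm already online in 1996 is treated as never entering the risk set).

```python
# Hypothetical firm-year records: (firm, year, has_site_by_year_end).
# All values are made up for illustration.
records = [
    ("A", 1996, True),                        # already online in first observed year
    ("B", 1996, False), ("B", 1997, True),    # adopts in 1997
    ("C", 1996, False), ("C", 1997, False), ("C", 1998, False),  # no adoption observed
]
FIRST_YEAR = 1996


def person_periods(records, option):
    """Return (firm, year, event) rows kept under sample option 1, 2 or 3."""
    rows = []
    no_longer_at_risk = set()
    for firm, year, has_site in sorted(records):
        if firm in no_longer_at_risk:
            continue  # firm has already adopted
        if has_site:
            no_longer_at_risk.add(firm)
        # Option 2 drops every 1996 row; option 3 drops 1996 rows
        # only for firms that already had a website in 1996.
        if year == FIRST_YEAR and (option == 2 or (option == 3 and has_site)):
            continue
        rows.append((firm, year, int(has_site)))
    return rows


for option in (1, 2, 3):
    print(option, person_periods(records, option))
```

On this toy data, option (3) keeps the 1996 rows of firms B and C as non-events, while option (2) discards them along with firm A's row.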

You have to decide how to deal with firms that began publishing after
1996. Many of them will START with websites, so calendar year of
start will be a strong predictor of the start of a website.

You can compare characteristics of firms with and without websites in
1996, but you cannot say anything about the duration of time with web
site for firms with websites in 1996.

