Re: st: left censoring in discrete-time duration model
The issue of the "zero" time, i.e. the time that magazines become at risk
of establishing a website is important to me, so let me explain what I'm
doing there. I actually have magazine data going back to 1990. However, I
use an internet archive (the "wayback machine") to identify when magazines
first started offering content on the web. This archive only goes back to
1996. Therefore, I cannot observe any magazine websites that existed before
1996. To deal with this, I have treated 1996 as the first year in which
magazines could have websites, and only used data from 1996 onwards.
However, this then creates a problem, which prompted my original question,
because I know that some magazines (don't know which ones) had websites
prior to 1996. I have tried a variety of methods for dealing with this: (1)
Treat all firms that
had a website in 1996 as if they adopted in 1996 (the first year of the
sample period), whether they adopted in 1996 or adopted earlier, i.e.
ignore the problem; (2) Exclude 1996 from the sample (begin the analysis
with 1997); (3) Drop all observations for firms that had websites in 1996.
My results are very robust to all three approaches, which I believe is
reasonable, because as Austin has documented, there were not many websites
before 1996.
Hopefully, I have now better explained my problem. Based on this, does my
approach seem reasonable? In particular, am I correct that there is no
reason to include pre-1996 data (given that I cannot observe websites
during this period)? thanks again. Daniel
At 02:05 PM 11/28/2006 -0500, you wrote:
Steven Samuels --
I would argue that 1992 is a natural "zero" time for Daniel Simon's
project, since I am reasonably certain no magazines had websites
before 1992--see e.g. the following from
The availability of CERN's files was announced in the UseNET
newsgroup, alt.hypertext, in August 1991. This was the first time that
the availability of the files was announced to the public. All of the
documents coded with HTML elements were stored on one main computer at
CERN. This special type of computer was called a "web server" (by the
physicists at CERN) because it "served-up" batches of cross-linked
HTML documents. There was only one Web server located at CERN, but by
the end of 1992 there were over 50 Web servers in the world.
See also http://news.netcraft.com/archives/web_server_survey.html for
an indication that the majority of websites online in 1996 were
created in 1996, one reason Daniel may find little difference in
results depending on how he treats the obs in 1996.
As for truncation vs censoring, I paraphrase the response from Stephen
[If a firm first becomes at risk on] the later of either the year web
technology became available or the year when the firm itself
was established ..., then the problem appears to be one of left
truncation rather than left censoring. (Left truncation is also known
as delayed entry.) The correct likelihood involves conditioning on
the probability of surviving from t=0 to t at which first observed.
For discrete time survival models (as you appear to have), this is
easy to implement -- e.g. see my website materials -- as long as there
is not unobserved heterogeneity ('frailty').
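
A minimal sketch (in Python rather than Stata, with made-up hazard values) of why delayed entry is easy to handle in discrete time when there is no frailty. Conditioning on survival up to the first observed period cancels the pre-entry terms, so the conditional likelihood equals the product over only the periods actually observed. The function names and the hazards list are illustrative assumptions, not anything from the posts or from a Stata command.

```python
import math

def survival(hazards, upto):
    """P(no event in periods 1..upto); hazards[0] is the hazard in period 1."""
    s = 1.0
    for h in hazards[:upto]:
        s *= 1.0 - h
    return s

def lik_from_origin(hazards, exit_period, event):
    """Likelihood of leaving the risk set at exit_period, counted from t=0."""
    h = hazards[exit_period - 1]
    return survival(hazards, exit_period - 1) * (h if event else 1.0 - h)

def lik_delayed_entry(hazards, entry_period, exit_period, event):
    """Likelihood built only from the observed periods entry_period..exit_period."""
    lik = 1.0
    for j in range(entry_period, exit_period + 1):
        h = hazards[j - 1]
        # The last observed period contributes h if the event occurred there.
        lik *= h if (j == exit_period and event) else 1.0 - h
    return lik

# Hypothetical per-period hazards; the subject is first observed in period 3.
hazards = [0.1, 0.2, 0.3, 0.4, 0.5]
entry, exit_, event = 3, 5, True

# Left-truncation correction: divide by P(survive periods 1..entry-1).
conditional = lik_from_origin(hazards, exit_, event) / survival(hazards, entry - 1)
observed_only = lik_delayed_entry(hazards, entry, exit_, event)
```

In person-period data this amounts to keeping only the rows from the entry period onward. With unobserved heterogeneity the cancellation no longer holds, which is the caveat in the paragraph above.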
So many website materials to choose from, though:
Perhaps Daniel will give us an update when he's reviewed some of them...
On 11/28/06, Steven Samuels <firstname.lastname@example.org> wrote:
On 11/16/06, Daniel Simon <email@example.com> wrote: Hi - I'm using
-hshaz- to estimate a discrete-time hazard model. I have some left
censoring that I'm not sure how to deal with. I am looking at firms
establishing websites. I can only observe the introduction of
websites from 1996 onwards. However, I know that some firms
established websites prior to 1996, but I'm not sure which ones.
Currently, I have tried three approaches: (1) Treat all firms that
had a website in 1996 as if they adopted in 1996 (the first year of
the sample period), whether they adopted in 1996 or adopted earlier;
(2) Exclude 1996 from the sample (begin the analysis with 1997); (3)
Drop all observations from 1996 for firms that had websites.
Daniel, you do NOT have left censoring OR truncation, because you do
not have a natural ZERO starting time. Your time-dimension is
calendar time. Therefore your only analysis choice is number (3).
However if all you have is the year of startup, then you cannot
distinguish firms with websites at the start of 1996 and firms whose
websites appeared first in 1996, so you must choose option (2). If
you do have calendar dates of web site startup, use a continuous
rather than discrete-time model with day or week as a time-unit.
You have to decide how to deal with firms who began publishing after
1996. Many of them will START with web sites, so calendar year of
start will be a strong predictor of start of a website.
You can compare characteristics of firms with and without websites in
1996, but you cannot say anything about the duration of time with web
site for firms with websites in 1996.
* For searches and help try: