


st: Questions: 1. irregular interval censoring | 2. discrete data: ties/exactp/cox/cloglog/"flat region...r(430)"| 3. left censoring/ truncation/ others

From   "Oliver Eger" <>
To   <>
Subject   st: Questions: 1. irregular interval censoring | 2. discrete data: ties/exactp/cox/cloglog/"flat region...r(430)"| 3. left censoring/ truncation/ others
Date   Tue, 25 May 2010 11:14:20 +0200

Dear Stata users,

I am new to Stata and to this list, and I would kindly ask for advice
concerning my work on survival analysis.
I will post my questions first. A more detailed description of my data
and research follows below.


1.) My study period runs from 1900 to 2000. This is also the enrollment
period. By my definitions (see the summary below), the onset of risk for
each company is the year in which it is first mentioned. In my opinion,
this means that my data are neither left censored nor left truncated.
However, I assume that at least some companies existed before 1900,
although I have no exact information about this.

==> Are there any alternative ways to treat the left boundary of my data,
other than through the definitions I use (see the summary below)?

2.) My data are interval censored and interval truncated, but the
intervals are of irregular length in calendar time.

==> Are there any methods to deal with these irregularities, or at least
to estimate their influence on my regressions?
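
As a point of reference for question 2, here is my own summary of the
standard grouped-duration argument. It is only a sketch: it assumes
proportional hazards and that each exit can be assigned to exactly one
observation interval (a_{j-1}, a_j], which my data may not fully satisfy.
The symbols a_j, h_0, beta and gamma_j are my notation, not taken from any
Stata output.

```latex
\begin{align*}
h(t \mid x) &= h_0(t)\,\exp(x'\beta), \\
h_j(x) &= \Pr\bigl(T \le a_j \mid T > a_{j-1},\, x\bigr)
        = 1 - \exp\!\Bigl(-\exp(x'\beta)\int_{a_{j-1}}^{a_j} h_0(u)\,du\Bigr), \\
\log\bigl[-\log\bigl(1 - h_j(x)\bigr)\bigr] &= x'\beta + \gamma_j,
\qquad \gamma_j = \log\int_{a_{j-1}}^{a_j} h_0(u)\,du .
\end{align*}
```

If this reasoning applies, unequal interval lengths would be absorbed by
the interval-specific intercepts gamma_j. Only with a restricted baseline
(for example a constant h_0, where gamma_j = log h_0 + log(a_j - a_{j-1}))
would the log interval length have to enter explicitly as an offset.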

3.) The discrete character of my data causes many ties. As mentioned below,
Cox (efron or exactp) and cloglog models give very similar results. When I
run regressions on my complete company data set rather than on parts of it,
using stcox with the exactp option, I get the error message "flat region
resulting in a missing likelihood r(430)".

==> Are there any alternative algorithms or ado-files that calculate the
coefficients and hazard rates as the exactp option would?


- I collected company data on a particular submarket. The time span is about
100 years. My source only allows companies to be observed every two or three
years (points of "principle observability"). There are also extra gaps due
to war and source-related reasons. So my observation intervals are irregular
in calendar time.

- In addition, companies are sometimes unobservable at points of
observability, that is, they are not mentioned in my source. The reasons
for this are unknown. It could be due to a careless registration policy of
the source, or the company may not have operated in my particular submarket
for a certain time span, but only in the subordinate market. Nevertheless,
such companies may be observed again in the next period or in a later one.

- My source starts around 1900. I assume that at least some companies
existed before then, but I have no exact information about this. The
source ends around 2000.

Against this background, I defined the following for my survival
analysis:


- For the survival analysis I put the data in the long or multi id form. 

- When I observed a company in one year, I defined it to exist the whole
year, starting at 1st of January and ending 31st December.

- Due to the long form, I have got as many observations per company (id) as
years of mentions in my source, every observation starting at the 1st of
January and ending on 31st December.

- In the absence of more exact information (see (i)), I defined a company to
enter the market/ survival analysis in the first year of mentioning in my
source. Failure = exit occurs in the last year of mentioning.

- This is done by:

    stset exit, failure(failure==1) id(id) origin(time entry) time0(entry)

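
To make the definitions above concrete, this is roughly how the variables
could be built before the stset call. It is a sketch only: it assumes one
row per company and year of mention, and the variable name year is
hypothetical (id, entry, exit and failure are the names used in my stset
command).

```stata
* Assumed input: one row per company (id) and year of mention (year)
bysort id (year): gen byte failure = (_n == _N)  // last mention flagged as exit
gen entry = mdy(1, 1, year)                      // record starts 1 January
gen exit  = mdy(12, 31, year)                    // record ends 31 December
format entry exit %td                            // display as daily dates

* The stset call from the text, unchanged
stset exit, failure(failure==1) id(id) origin(time entry) time0(entry)
```

Because entry and exit are daily dates here, analysis time is measured in
days; years without a mention simply have no record.
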
On this basis, I would like to estimate a Cox model with some independent
variables that influence the survival time of my companies. Because of the
discrete character of my data, the method of choice should be a cloglog
model (see the script of Jenkins, or Hosmer/Lemeshow, Applied Survival
Analysis, Chapter 7). However, a Cox model in Stata, using the efron or
exactp option for ties and the stset command above, gives very similar
results with stcox (see also Hosmer/Lemeshow, Chapter 7).
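
For concreteness, the comparison described above looks roughly like the
following. This is a minimal sketch, not my exact do-file: the covariate
names x1 and x2 and the variables year and dur are hypothetical, and the
data are assumed to be in the long form with failure coded 0/1 per
company-year.

```stata
* Duration index: years since the company's first mention (starts at 1)
bysort id (year): gen dur = year - year[1] + 1

* Discrete-time proportional hazards model with a step-function baseline
* (one dummy per duration value; a smoother function of dur could be
* used instead), standard errors clustered by company
cloglog failure x1 x2 i.dur, vce(cluster id)

* Cox counterparts on the stset data, differing only in the ties method
stcox x1 x2, efron
stcox x1 x2, exactp
```

The coefficients on x1 and x2 from cloglog are on the log-hazard scale,
whereas stcox reports hazard ratios by default, so they should be compared
after exponentiating the cloglog estimates or by adding the nohr option to
stcox.
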

Oliver Eger
Lehrstuhl für VWL - Innovationsökonomik
Universität Augsburg
Universität Hohenheim
Prof. Dr. Horst Hanusch - Prof. Dr. Andreas Pyka


