Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
AW: st: discrete hazard modells: irregular time intervals

From	"Oliver Eger" <[email protected]>
To	<[email protected]>
Subject	AW: st: discrete hazard modells: irregular time intervals
Date	Tue, 1 Jun 2010 18:58:24 +0200
Dear Steven,

Thanks for your quick and helpful comments!

We will discuss the lifetime and APC issues internally. Perhaps I will come
back to them later...


Maybe some of my questions are learner questions, but it seems to me that
the best way to make progress in survival analysis is to ask them;-):

	1.) If I use semiparametric analysis, do I have to pay attention to
frailty as well, or only when performing a fully parametric analysis?

	2.) The pgmhaz8 and hshaz commands you mentioned: I can use these
only in a fully parametric approach. Is this right? I don't want to make
any assumptions about the functional form of my hazard. So, what should I do?

	3.) Regarding the grouped data issue: can this also be handled in a
semiparametric/Cox survival model or a cloglog regression?

	4.) What exactly is a finite mixture model, and how can I fit one
in Stata?

	5.) How could I weight risk by exposure time in practice? What
would I have to do with my data to achieve this?

	6.) If I understand correctly, there is no direct method or
approach that could be applied to the kind of irregular interval data I
am working with?
	So, do you know anybody doing basic research on such econometric/
methodological questions?


Best regards,
Oliver



-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Steve Samuels
Sent: Monday, 31 May 2010 13:52
To: [email protected]; [email protected]
Subject: Re: st: discrete hazard modells: irregular time intervals

Oliver privately sent Bob and me the email below. I've removed the
figures that he mentions.

You have defined failure as the "last year" that the company is
mentioned in your source. But companies can (temporarily) disappear
and reappear for the reasons you mention. What is the maximum time
that a company would have to be absent before you believe that the
disappearance is permanent? Say, for the sake of argument, it is six
years. Then I suggest that you end your observations in 1994 and
define a disappearance as any company that has disappeared before
then, because you cannot be sure that a company that disappears in
1995-2000 will not reappear. I also suggest that you use the time to
the last disappearance defined this way as your outcome and ignore the
prior gaps. This might be your best analysis under the circumstances.
Also, investigate the age-period-cohort approach.
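
A minimal sketch of this rule, written in Python rather than Stata and meant only as an illustration. It assumes a source that ends in 2000 and a six-year maximum absence, as in the example above. The function name `classify_exit` and its arguments are hypothetical, and treating a last mention in the cutoff year itself as censored is an assumption, since the text only says "before then".

```python
def classify_exit(mention_years, last_issue_year=2000, max_absence=6):
    """Return (exit_year, failed) for one company, or None if it enters after the cutoff.

    mention_years: calendar years in which the company appears in the address books.
    """
    cutoff = last_issue_year - max_absence  # 1994 with the defaults
    first_seen, last_seen = min(mention_years), max(mention_years)
    if first_seen > cutoff:
        return None  # first observed after the cutoff: outside the analysis window
    if last_seen < cutoff:
        return last_seen, True  # last mention before the cutoff: counted as a permanent disappearance
    return cutoff, False  # still listed at or after the cutoff: censored at the cutoff


# Earlier gaps are ignored; only the last mention matters.
print(classify_exit([1950, 1952, 1960]))
print(classify_exit([1990, 1998]))
```

The returned pair maps naturally onto an exit time and a failure indicator, which is the information a survival-time declaration needs for each company.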



Some other comments:
1. If the address books are published every two years, then a
"permanent" disappearance of the kind I suggest means that you do
indeed have interval censoring (in the two years since the last
publication). You cannot detect disappearances and reappearances
within the two-year publishing interval. Therefore, I think that you
will have to assume that the company existed during the entire
interval.
2. I have no solution to your left-side problem.
3. "Discrete" and "grouped" analysis will be the same only if the
grouping intervals are of equal size. See Stephen Jenkins's text
Survival Analysis Using Stata:
http://www.iser.essex.ac.uk/survival-analysis
4. The definition of "lifetime" is an issue because of the temporary
disappearances and reappearances.
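
Point 3 can be illustrated with a small calculation. The sketch below is in Python, not Stata, and assumes a constant continuous-time hazard purely for illustration. The rate and the interval widths are made-up numbers, not values from Oliver's data, and the function names are hypothetical.

```python
import math


def interval_hazard(rate, width):
    """Probability of failing within an interval of the given width,
    conditional on being alive at its start, under a constant hazard `rate`."""
    return 1.0 - math.exp(-rate * width)


def cloglog(p):
    """Complementary log-log transform, log(-log(1 - p))."""
    return math.log(-math.log(1.0 - p))


rate = 0.05          # hypothetical yearly hazard
widths = [2, 2, 6]   # hypothetical interval widths in years (one long gap)

for w in widths:
    h = interval_hazard(rate, w)
    # On the cloglog scale the width enters additively as log(width),
    # so subtracting log(width) recovers log(rate) for every interval.
    print(w, round(h, 4), round(cloglog(h) - math.log(w), 4))
```

The same underlying hazard gives a larger failure probability for the six-year interval than for the two-year ones, so treating all intervals as equal "discrete" periods would mix interval length into the estimated hazard. Under this constant-hazard assumption, the interval width appears as an additive log(width) term on the complementary log-log scale.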

Good luck.

Steven

--
Steven Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783







On Mon, May 31, 2010 at 6:43 AM, Oliver Eger <[email protected]> wrote:
>
> Dear Robert,
>
> Dear Steven,
>
>
>
> First of all, I want to thank you for your feedback, your detailed advice,
> and the time you spent on it!
>
> I would like to respond to your suggestions (and ask some questions ;-))
> in order and in as structured a way as possible:
>
>
>
>
>
> 1.)   As Robert suggested, I checked chapter 5 in Singer/Willett,
> Applied Longitudinal Data Analysis. If I understood correctly, those
> examples contain additional time information (e.g. table 5.1: AGE, AGEGRP
> & WAVE). You propose using such additional information to handle my
> problem of irregular time intervals.
>
> Before I continue, I should give some more insight into my data, to
> avoid misunderstandings. I am also sending two graphs to your email
> addresses as an illustration. Graphs say more than words.
>
> In fact, I don't have much information about my firms. All I have are old
> address books. The issues were normally published every two years. But
> as you can see in figure 1, due to war and irregularities in publishing,
> there are several gaps. Let's call this the "systematic error", because it
> is related to the source.
>
> In addition, not every company is listed in every available issue, or in
> other words: not every company is observable at every "point of
> observability". The reasons for this are unknown. One could imagine that
> this is due to a company's careless registration with the address book,
> or that the company was not working in the particular submarket I am
> interested in. That would mean that the company probably existed during
> the period(s) in question, but in the "main market" or in other related
> submarkets in which I am not interested. Whatever the reason, the company
> is in fact not observed. Let's call this the "object error", because it
> is related to the object/company.
>
>
>
> I then did the following:
>
> -    (i)   First of all, I brought my data into long (multiple-record-per-id)
> form.
>
> -    (ii)  Every company gets as many observations as there are years in
> which it is mentioned in my address books.
>
> -    (iii) Each of these observations is defined to start on 1 January
> and end on 31 December. That means: if a company is mentioned in one
> issue of the address book, it is assumed to operate in my submarket for
> one complete year. I think that this is a reasonable assumption.
>
> -    (iv)  If a systematic or an object error occurs, Stata now sees this
> as a gap, because of my definitions via stset and the multiple-record
> form. The reasons for the gap are ultimately unknown; no reliable
> information is available. I thought that this was the best way to
> handle it.
>
> -    (v)   I defined the entry (first observation) of each company as
> the first year in which it is mentioned in my address books; exit/failure
> is the last year in which it is mentioned. Gaps are included as described
> above.
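
Steps (i) to (v) can be sketched as follows. This is an illustration in Python, not the Stata code Oliver actually ran. The function name `build_spells` and the column names are hypothetical, and analysis time is assumed to be measured in years from the first mention.

```python
def build_spells(company_id, mention_years):
    """One row per year of mention, in analysis time measured from the first mention."""
    years = sorted(set(mention_years))
    first, last = years[0], years[-1]
    rows = []
    previous = None
    for year in years:
        rows.append({
            "id": company_id,
            "year": year,
            "t0": year - first,          # spell start (1 January), years since entry
            "t1": year - first + 1,      # spell end (31 December)
            "gap_before": 0 if previous is None else year - previous - 1,
            "fail": int(year == last),   # exit/failure = last year of mention
        })
        previous = year
    return rows


for row in build_spells("A", [1900, 1902, 1908]):
    print(row)
```

Each row is a one-year spell. Years without a mention produce no row, so they show up only as gaps between consecutive spells of the same company.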
>
>
>
> So, what I want to say is: I am not sure whether I can proceed as proposed
> in chapter 5 of Singer/Willett, because I don't have such additional time
> information in my data. Or is there a misunderstanding on my side?
>
>
>
>
>
> 2.)  Steven is definitely right when he says that the data are not left
> truncated or censored. This is because I defined the first entry/first
> observation as just described above (see (v)). So it is also true that I
> have a mix of brand-new and already existing companies in my first year(s).
> The two are indistinguishable to me, because the only information I have
> is the mention in 1900, or 1902, or 1904 etc., and this definitely does
> not allow me to conclude that this is also the time of their first market
> entry. The company could also have been active in the market in 1898 or
> 1896 etc. I just don't know.
>
>
>
>
>
> 3.) Steve said that, given 2.), there could be a bias. That's
> definitely correct. What I would argue is the following: my database covers
> around 100 years. This is a long period. It includes nearly 7000 companies.
> Period 1 contains around 80 companies, period 2 around 350. The historical
> literature describing my submarket says that the start of my database
> coincides with the advent of the market/industry. So I think that, from a
> practical point of view, the bias should not be very strong. In fact,
> there was not much industry activity before the beginning of my database;
> in other words, my database covers nearly the whole industry cycle.
>
> Nevertheless, I would like to handle my data in a formally,
> mathematically and methodologically correct way, at least to get a clear
> picture of how much influence this bias has.
>
> The question regarding 2.) and 3.) is: are there any methods available
> for such a data situation, particularly concerning the left side of the
> data?
>
>
>
>
>
> 4.) Irregular time intervals: if you look at figure 1, these are my gaps
> in calendar time. In survival analysis, all companies are independent of
> calendar time; in fact, they all start at their market entry. If I order
> the companies in this way, I get something like figure 2. That means that
> I get a situation like the one described in section 5.2.1 of
> Singer/Willett, "Analyzing Data Sets in Which the Number of Waves per
> Person Varies". In my data, this situation is aggravated because the
> "systematic error" and the "object error" (see above) occur together. In
> addition, I have quite a large number of companies/objects (~7000). This
> should lead to the errors more or less canceling each other out.
>
> Nevertheless: are there any formal methods applied in survival analysis
> and/or implemented in Stata for such irregular intervals?
>
>
>
>
>
> 5.) Grouped data methods:
>
> I installed the -pgmhaz8- and -hshaz- routines and read their help files.
>
> I have the following questions about them:
>
> (i)     Is "grouped data" just a synonym for "discrete time"?
>
> (ii)    If yes, could I also use a cloglog model/regression instead of
> -pgmhaz8- or -hshaz-?
>
> (iii)   Both routines are for fully parametric use. I just want to perform
> a semiparametric analysis. Are there any alternatives?
>
> (iv)    If I use semiparametric analysis, do I have to pay attention to
> frailty as well, or only when performing a fully parametric analysis?
>
>
>
>
>
> 6.) What exactly is a finite mixture model, and how can I fit one in
> Stata?
>
> 7.) How could I weight risk by exposure time in practice? What would I
> have to do with my data to achieve this?
>
> [Figure 1: sketch of the points of observability in calendar time (figure removed)]
>
> [Figure 2: selected age groups of companies, all starting at t=0, and their
> respective points of observability (figure removed)]
>
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Steve Samuels
> Sent: Friday, 28 May 2010 01:02
> To: [email protected]
> Subject: Re: st: discrete hazard modells: irregular time intervals
>
>
>
> Oliver-
>
>
>
> Welcome to Statalist! The -search- command is a very useful tool for
> finding Stata resources.
> "search interval censoring, all" turns up -intcens-. I also agree with
> Robert about trying to simplify your data. If you can locate your
> events to single years or even decades, then you could create grouped
> follow-up intervals (they need not be of equal length) and use the
> grouped data methods in -pgmhaz8- and -hshaz-, programs contributed by
> Stephen Jenkins and downloadable with -ssc-.
>
> I don't see that you can use methods for left-truncation, because you
> lack the information on the start date for companies that were in
> existence prior to 1900. At time "zero" your companies will be a
> mixture of brand-new and established companies. I think this risks
> serious bias. I strongly suggest that you analyze only companies whose
> start dates you know.
>
> As Robert said, disentangling age, period, and cohort effects can be
> challenging. Take a look at the contributed -apc- command, also
> available from -ssc-.
>
>
>
> -Steve
>
>
>
> Steven J. Samuels
>
> [email protected]
>
> 18 Cantine's Island
>
> Saugerties NY 12477
>
> USA
>
> Voice: 845-246-0774
>
> Fax: 206-202-4783
>
>
>
>
>
>
>
> >> ----- Original Message -----
>
> >> From: Oliver Eger <[email protected]>
>
> >> Date: Wednesday, May 26, 2010 8:14 am
>
> >> Subject: st: discrete hazard modells: irregular time intervals
>
> >> To: [email protected]
>
> >>
>
> >>
>
> >> > Dear Stata users,
> >> >
> >> > I am new to Stata and to the newsgroup. I would kindly ask for
> >> > advice concerning my work on survival analysis.
> >> >
> >> > I collected company data. My data are interval censored and interval
> >> > truncated. The intervals are of irregular length in calendar time.
> >> >
> >> > Are there any methods to deal with these irregularities or at least to
> >> > estimate their influence on my survival regressions?
> >> >
> >> > Best regards,
> >> > Oliver
> >> >
>
>

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

