Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: discrete hazard modells: irregular time intervals

 From Steve Samuels To statalist@hsphsun2.harvard.edu Subject Re: st: discrete hazard modells: irregular time intervals Date Tue, 1 Jun 2010 14:16:45 -0400

```
Oliver, some of these do not call for a short answer, so I'll just
refer you to the references.

On Tue, Jun 1, 2010 at 12:58 PM, Oliver Eger <oliver.eger@yahoo.de> wrote:
> Dear Steven,
>
>
> We will discuss the lifetime and APC issue internally! Perhaps I will come
> back to this later...
>
>
> Maybe some of my questions are beginner's questions, but it seems to me that
> the best way to make progress in survival analysis is to ask them ;-):
>
>        1.) If I use semi-parametric analysis: do I have to pay attention to
> frailty too, or only when performing fully parametric analysis?

Answer: It depends on what you mean by "semi-parametric".  Cox models
with frailty in Stata require a group whose members share a common
frailty.

>
>        2.) The pgmhaz8 or hshaz commands you talked about: I can use these
> only in a fully parametric approach. Is this right?  But I don't want to make
> any assumptions about the functional form of my hazard. So, what to do?

Answer: These models don't impose a functional form for the hazard,
just for the hazard ratio. You might consider them "parametric", but I
think they are semi-parametric.

>        3.) Because of the grouped data issue: can this also be done in a
> semiparametric/ Cox survival or a cloglog model regression?
Answer: -cloglog- has no -frailty- option, and the frailty option of
-stcox- probably doesn't apply to your data.

>        4.) What exactly is a finite mixture model and how can I perform it
> in Stata?

Answer: A finite mixture model is one in which each data point can arise
from any one of K distributions, with distribution "k" chosen with
probability p_k. "findit mixture" or "search mixture, all" will find
several finite mixture programs for Stata.

>        5.) How could I weight risk by the exposure time in practice? What
> do I have to do therefore with my data?

Answer: I'm not sure what you mean here. The survival programs do this
automatically.
>
>        6.) If I understand correctly: there is no direct method/
> approach that could be applied to the kind of irregular interval data I
> have built?

Yes, but the obstacle is not your irregular intervals. The -intcens-
program can handle interval censoring with irregular intervals.  The
obstacle is that "lifetime" is not well defined: you want to exclude time
during which a company was in existence but not in your market.  There is
no solution to the "left-hand" problem.

>        So, do you know anybody, doing basic research on such econometric/
> methodical questions?

I don't, but I'm not an econometrician.  I'm not sure that this is
even a survival problem per se.  There might be some suggestions in
the biostatistical literature on the appearance and recurrence of
disease, with imperfect periodic screening. A Bayesian version might
help cope with some of the uncertainties. I don't think that Stata
has anything to offer in this area.

Good luck!

Steve

Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steve Samuels
> Sent: Monday, 31 May 2010 13:52
> To: statalist@hsphsun2.harvard.edu; bob.yaffee@nyu.edu
> Subject: Re: st: discrete hazard modells: irregular time intervals
>
> Oliver privately  sent Bob and me the email below.  I've removed the
> figures that he mentions.
>
> You have defined failure as the "last year" that the company is
> mentioned in your source.  But companies can (temporarily) disappear
> and re-appear  for the reasons you mention.  What is the maximum time
> that a company would have to be absent before you believe that the
> disappearance is permanent?  Say, for the sake of argument, it is six
> years.  Then I suggest that you end your observations in 1994, and
> count as a disappearance any company that has disappeared before
> then, because you cannot be sure that a company that disappears in
> 1995-2000 will not reappear.  I also suggest that you use the time to
> the last
> disappearance defined this way as your outcome and ignore the prior
> gaps.  This might be your best analysis under the circumstances.
> Also, investigate the age-period-cohort approach.
>
>
>
> 1. If the address books are published every two years, then a
> "permanent" disappearance of the kind I suggest means that you do
> indeed have interval censoring (in the two years since the last
> publication).  You cannot detect disappearances and re-appearances
> within the two year publishing interval.  Therefore, I think that you
> will have to assume that the company existed during the entire
> interval.
> 2. I have no solution to your left-side problem.
> 3. "Discrete" and "grouped" analysis will be the same only if the
> grouping intervals are  of equal size.  See: Stephen Jenkins's text
> Survival Analysis using Stata:
> http://www.iser.essex.ac.uk/survival-analysis
> 4. Definition of "lifetime" is an issue because of the temporary
> disappearances and re-appearances.
>
> Good luck.
>
> Steven
>
> -
>
> On Mon, May 31, 2010 at 6:43 AM, Oliver Eger <oliver.eger@yahoo.de> wrote:
>>
>> Dear Robert,
>>
>> Dear Steven,
>>
>>
>>
>> first of all I want to thank you for your feedback, the detailed advice
>> and the time you spent on it!
>>
>>
>>
> structured as possible:
>>
>>
>>
>>
>>
>> 1.)   As Robert suggested, I checked chapter 5 in Singer/ Willett,
>> Applied Longitudinal Data Analysis. If I understood correctly, these
>> examples contain additional time information (e.g. table 5.1: AGE, AGEGRP
>> & WAVE). You propose using such additional information to handle my
>> problem of irregular time intervals.
>>
>>
>>
>> Before I continue, I should give some more insight into my data, just to
>> avoid misunderstandings. I have also sent two graphs to your email, just to
>> illustrate. Graphs say more than words.
>>
>>
>>
>> In fact, I don't have much information about my firms. All I have are old
>> address books. The issues were regularly published every two years. But
>> as you can see in figure 1, due to war and irregularities in publishing,
>> there are several gaps. Let's call this the "systematic error", because it
>> is related to the source.
>>
>>
>>
>> In addition, not every company appears in all possible issues, or in
>> other words: not every company is observable at every "point of
>> observability". The reasons for this are unknown. One could imagine that
>> this is due to a careless registration policy of the company concerning the
>> address book, or that the company was not working in the particular submarket
>> I am interested in. That means that the company is likely to have existed
>> during this period (or periods), but on the "main market" or on other related
>> submarkets in which I am not interested. However that may be, the company is
>> in fact not observed. Let's call this the "object error", because it is
>> related to the object/ company.
>>
>>
>>
>> I now did the following:
>>
>> -    (i)   First of all, I brought my data into the long or multi-id form.
>>
>> -    (ii)  Every company gets as many observations as the number of times/
>> years it is mentioned in my address books.
>>
>> -    (iii) Each of these observations is defined to start on the 1st of
>> January and end on the 31st of December. That means: if a company is mentioned
>> in one issue of the address book, it is assumed to operate in my submarket
>> for one complete year. I think that this is a reasonable assumption.
>>
>> -    (iv)  If a systematic or an object error occurs, for Stata this is now
>> a gap, because of my definitions via stset and the multi-id form. The
>> reasons for the gap are in the end unknown. No certain information is
>> available. I thought that this is the best way to handle this.
>>
>> -    (v)  I defined the first entry/ first observation of each company as
>> the first year of mentioning in my address book; exit/ failure is the last
>> year of mentioning. Gaps are included as described above.
>>
>>
>>
>> So, what I want to say is: I am not sure whether I can proceed as proposed
>> in chapter 5 of Singer/ Willett, because I don't have such additional time
>> information in my data. Or is there a misunderstanding on my side?
>>
>>
>>
>>
>>
>> 2.)  Steven is definitely right when he says that the data are not left
>> truncated or censored. This is because I defined the first entry/ first
>> observation as just described above (see (v)). So it is also true that I
>> have a mix of brand new and already existing companies in my first year(s).
>> The two are indistinguishable to me, because the only information I have
>> is the mentioning in 1900, or 1902, or 1904 etc., and this definitely does
>> not allow me to conclude that this is also the time point of their first
>> market entry. The company could also have been active on the market in
>> 1898 or 1896 etc. I just don't know.
>>
>>
>>
>>
>>
>> 3.) Steve said that, according to 2.), there could be a bias. That's
>> definitely correct. What I would argue is the following: my database covers
>> around 100 years. This is a long period. It includes nearly 7000 companies.
>> Period 1 contains around 80 companies, period 2 around 350. The historical
>> literature describing my submarket says that the beginning of my
>> database coincides with the advent of the market/ industry. So I think
>> that, from a practical point of view, the bias should not be very strong. In
>> fact, there was not much industry activity before the beginning of my
>> database, or in other words, my database covers nearly the whole industry
>> cycle.
>>
>>
>>
>> Nevertheless I would like to handle my data in a formally and
>> methodologically correct way, at least to get a clear picture of
>> the size of this bias.
>>
>>
>>
>> The question to 2.) and 3.) is: Are there any methods available for such a
> data situation, particularly concerning the left side of data?
>>
>>
>>
>>
>>
>> 4.) Irregular time intervals: if you look at figure 1, these are my gaps
>> in calendar time. In survival analysis, all companies are independent of
>> calendar time; in fact they all start at their market entry. If I order the
>> companies in this way, I get something like figure 2. That means that I get
>> a situation like the one described in chapter 5.2.1 of Singer/ Willett,
>> "Analyzing Data Sets in Which the Number of Waves per Person Varies". In my
>> data, this situation is aggravated because the "systematic error" and the
>> "object error" (see above) come together. In addition, I have quite a large
>> number of companies/ objects (~ 7000). This should lead to the errors
>> more or less canceling each other out.
>>
>>
>>
>> Nevertheless: are there any formal methods for such irregular intervals,
>> applied to survival analysis and/or implemented in Stata?
>>
>>
>>
>>
>>
>> 5.) Grouped data methods:
>>
>> I installed the -pgmhaz8- and -hshaz- routines and read their help files.
>>
>>
>>
>> Following questions with this:
>>
>> (i)     Is grouped data just a synonym for discrete time?
>>
>> (ii)    If yes, could I also use a cloglog model/ regression instead of
>> pgmhaz8 or hshaz?
>>
>> (iii)  Both routines are for fully parametric use. I just want to perform a
>> semi-parametric analysis. Are there any alternatives?
>>
>> (iv)    If I use semi-parametric analysis: do I have to pay attention to
>> frailty too, or only when performing fully parametric analysis?
>>
>>
>>
>>
>>
>> 6.) What exactly is a finite mixture model and how can I perform it in
> Stata?
>>
>>
>>
>>
>>
>> 7.) How could I weight risk by the exposure time in practice? What do I
> have to do therefore with my data?
>>
>> Figure 1: sketch of the points of observability in calendar time
>> [figure not included]
>>
>> Figure 2: selected age groups of companies, all starting at t=0, and their
>> respective points of observability
>> [figure not included]
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu
>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Steve Samuels
>> Sent: Friday, 28 May 2010 01:02
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: discrete hazard modells: irregular time intervals
>>
>>
>>
>> Oliver-
>>
>> Welcome to Statalist! The -search- command is a very useful tool for
>> finding Stata resources.
>> "search interval censoring, all" turns up -intcens-. I also agree with
>> Robert about trying to simplify your data. If you can locate your
>> events to single years or even decades, then you could create grouped
>> follow-up intervals (they need not be of equal length) use the grouped
>> data methods of and use -pgmhaz8- and -hshaz-, programs contributed by
>>
>> I don't see that you can use methods for left-truncation, because you
>> lack the information on the start date for companies that were in
>> existence prior to 1900. At time "zero" your companies will be a
>> mixture of brand-new and established companies. I think this risks
>> serious bias. I strongly suggest that you analyze only companies whose
>> start dates you know.
>>
>> As Robert said, disentangling age, period, and cohort effects can be
>> challenging. Take a look at the contributed -apc- command, also
>> available from -ssc-
>>
>>
>>
>> -Steve
>>
>> >> ----- Original Message -----
>> >> From: Oliver Eger <oliver.eger@yahoo.de>
>> >> Date: Wednesday, May 26, 2010 8:14 am
>> >> Subject: st: discrete hazard modells: irregular time intervals
>> >> To: statalist@hsphsun2.harvard.edu
>> >>
>> >> > Dear Stata users,
>> >> >
>> >> > I am new to Stata and to the newsgroup. I would kindly ask for
>> >> > concerning my work on survival analysis.
>> >> >
>> >> > I collected company data. My data are interval censored and interval
>> >> > truncated. The intervals are of irregular length in calendar time.
>> >> >
>> >> > Are there any methods to deal with these irregularities or at least to
>> >> > estimate their influence on my survival regressions?
>> >> >
>> >> > Best regards,
>> >> > Oliver
>> >> >
>>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
>

```
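As a small illustration of the finite mixture definition given in the answer to question 4 (each data point arises from one of K distributions, with distribution k chosen with probability p_k), here is a minimal Python sketch. It is not Stata code and not one of the mixture programs that "findit mixture" locates. The two normal components, their weights, and the function names `mixture_pdf` and `draw_from_mixture` are assumptions made up for the example.

```python
import math
import random


def mixture_pdf(x, weights, means, sds):
    """Density of a K-component normal mixture: sum over k of p_k * N(x; mean_k, sd_k)."""
    return sum(
        p * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for p, m, s in zip(weights, means, sds)
    )


def draw_from_mixture(rng, weights, means, sds):
    """Pick component k with probability p_k, then draw one value from that component."""
    k = rng.choices(range(len(weights)), weights=weights)[0]
    return k, rng.gauss(means[k], sds[k])


# Hypothetical two-component mixture (K = 2); the p_k must sum to 1.
weights = [0.3, 0.7]
means = [0.0, 5.0]
sds = [1.0, 2.0]

rng = random.Random(2010)  # fixed seed so the run is reproducible
draws = [draw_from_mixture(rng, weights, means, sds) for _ in range(10000)]

# Share of draws that came from the first component; should be near p_1 = 0.3.
share_first = sum(1 for k, _ in draws if k == 0) / len(draws)
print("share from component 1:", round(share_first, 3))
print("mixture density at x = 0:", mixture_pdf(0.0, weights, means, sds))
```

In real data the component label k is not observed; only the drawn value is. A finite mixture program estimates the p_k and the component parameters from those values alone, which is what makes the model useful for unobserved heterogeneity.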