Re: st: discrete hazard modells: irregular time intervals

From	Steve Samuels <[email protected]>
To	[email protected], [email protected]
Subject	Re: st: discrete hazard modells: irregular time intervals
Date	Mon, 31 May 2010 07:52:02 -0400

Oliver privately sent Bob and me the email below. I've removed the
figures that he mentions.

You have defined failure as the "last year" that the company is
mentioned in your source. But companies can (temporarily) disappear
and re-appear for the reasons you mention. What is the maximum time
that a company would have to be absent before you believe that the
disappearance is permanent? Say, for the sake of argument, it is six
years. Then I suggest that you end your observations in 1994, and
count as a disappearance only a company that has disappeared before
then, because you cannot be sure that a company that disappears in
1995-2000 will not reappear. I also suggest that you use the time to
the last disappearance defined this way as your outcome and ignore the
prior gaps. This might be your best analysis under the circumstances.
Also, investigate the age-period-cohort approach.
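
The rule above can be sketched as follows. This is a minimal illustration in Python rather than Stata code; the 1994 cutoff comes from the six-year example, and the function name, argument names, and the choice to treat a company last listed exactly in the cutoff year as censored are assumptions made for the sketch.

```python
def permanent_exit(listing_years, cutoff=1994):
    """Return (entry_year, exit_year, failed) for one company, or None.

    listing_years: years in which the company appears in an address book.
    cutoff: assumed last year of observation (illustrative value).
    """
    observed = sorted(y for y in set(listing_years) if y <= cutoff)
    if not observed:
        # Company first appears after the cutoff: left out of this analysis.
        return None
    entry = observed[0]
    last_seen = max(listing_years)
    if last_seen < cutoff:
        # Last listing is before the cutoff: treated as a permanent
        # disappearance. Earlier gaps between listings are ignored.
        return entry, last_seen, True
    # Still listed in or after the cutoff year: censored at the cutoff.
    return entry, cutoff, False


# Hypothetical listing histories, one per company.
examples = {
    "A": [1900, 1902, 1910, 1912],  # gap 1904-1908 is ignored
    "B": [1980, 1990, 1998],        # reappears after the cutoff
    "C": [1996, 1998],              # first seen after the cutoff
}
results = {name: permanent_exit(years) for name, years in examples.items()}
```

Company B shows why the cutoff matters: judged only on the listings up to 1994 it would look like an exit in 1990, but it reappears in 1998, so it is treated as censored.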



Some other comments:
1. If the address books are published every two years, then a
"permanent" disappearance of the kind I suggest means that you do
indeed have interval censoring (in the two years since the last
publication). You cannot detect disappearances and re-appearances
within the two-year publishing interval. Therefore, I think that you
will have to assume that the company existed during the entire
interval.
2. I have no solution to your left-side problem.
3. "Discrete" and "grouped" analysis will be the same only if the
grouping intervals are of equal size. See Stephen Jenkins's text
Survival Analysis using Stata:
http://www.iser.essex.ac.uk/survival-analysis
4. Definition of "lifetime" is an issue because of the temporary
disappearances and re-appearances.
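
One way to see comment 3 numerically: under a constant continuous-time hazard, the probability of failing within an interval, given survival to its start, depends on the interval's length. The sketch below is in Python, not Stata, and is only an illustration; the hazard rate and the interval widths are made-up numbers, and the constant-hazard assumption is mine, not something taken from Jenkins's text.

```python
import math


def interval_hazard(rate, width):
    """P(fail within an interval of the given width | alive at its start),
    assuming a constant continuous-time hazard `rate` per unit time."""
    return 1.0 - math.exp(-rate * width)


rate = 0.05             # hypothetical hazard per year
widths = [2, 2, 6, 2]   # hypothetical gaps between issues, in years
hazards = [interval_hazard(rate, w) for w in widths]

# The complementary log-log of the interval hazard equals
# log(rate) + log(width), so unequal widths shift the interval terms.
cloglog = [math.log(-math.log(1.0 - h)) for h in hazards]
```

With equal widths the log(width) term is the same in every interval and is absorbed into the baseline. With unequal widths it differs by interval, so a model that treats the periods as interchangeable discrete steps no longer matches the grouped continuous-time model.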

Good luck.

Steven

--
Steven Samuels
[email protected]
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax:    206-202-4783

On Mon, May 31, 2010 at 6:43 AM, Oliver Eger <[email protected]> wrote:
>
> Dear Robert, dear Steven,
>
> First of all, I want to thank you for your feedback, the detailed advice, and the time you spent on it!
>
> I would like to respond to your suggestions (and ask some questions ;-)) in order and in as structured a way as possible:
>
> 1.) As Robert suggested, I checked chapter 5 in Singer/Willett, Applied Longitudinal Data Analysis. If I understood correctly, the examples there have additional time information (e.g. table 5.1: AGE, AGEGRP & WAVE). You propose using such additional information to handle my problem of irregular time intervals.
>
> Before I continue, I should give some more insight into my data, to avoid misunderstandings. I am also sending two graphs to your email addresses to illustrate; graphs say more than words.
>
> In fact, I don’t have much information about my firms. All I have are old address books. The issues are regularly published every two years. But as you can see in figure 1, due to war and irregularities in publishing, there are several gaps. Let’s call this the “systematic error”, because it is related to the source.
>
> In addition, not every company appears in every issue, or in other words: not every company is observable at every “point of observability”. The reasons for this are unknown. One could imagine that it is due to the company’s careless registration policy towards the address book, or that the company was not operating in the particular submarket I am interested in. That means the company is likely to have existed during the period(s), but on the “main market” or other related submarkets in which I am not interested. Whatever the reason, the company is in fact not observed. Let’s call this the “object error”, because it is related to the object/company.
>
> I then did the following:
>
> -    (i)   First, I put my data in long (multiple-record-per-id) form.
> -    (ii)  Every company gets as many observations as the number of years in which it is mentioned in my address books.
> -    (iii) Each of these observations is defined to start on the 1st of January and end on the 31st of December. That means: if a company is mentioned in one issue of the address book, it is assumed to operate in my submarket for one complete year. I think that this is a reasonable assumption.
> -    (iv)  If a systematic or an object error occurs, Stata now sees this as a gap, because of my definitions via stset and the multiple-record form. The reasons for the gap are ultimately unknown; no certain information is available. I thought that this was the best way to handle it.
> -    (v)   I defined the first entry/first observation of each company as the first year in which it is mentioned in my address books; exit/failure is the last year in which it is mentioned. Gaps are included as described above.
>
> So, what I want to say is: I am not sure whether I can proceed as proposed in chapter 5 of Singer/Willett, because I don’t have such additional time information in my data. Or is there a misunderstanding on my side?
>
> 2.) Steven is definitely right when he says that the data are not left-truncated or censored. This is because I defined the first entry/first observation as just described (see (v)). So it is also true that I have a mix of brand-new and already existing companies in my first year(s). The two are indistinguishable to me, because the only information I have is the mention in 1900, or 1902, or 1904, etc., and this definitely does not allow me to conclude that this is also the time of their first market entry. The company could also have been active on the market in 1898 or 1896, etc. I just don’t know.
>
> 3.) Steve said that, given 2.), there could be a bias. That’s definitely correct. What I would argue is the following: my database covers around 100 years, which is a long period, and includes nearly 7000 companies. Period 1 has around 80 companies, period 2 around 350. The historical literature describing my submarket says that the start of my database coincides with the advent of the market/industry. So I think that, from a practical point of view, the bias should not be very strong. In fact, there was not much industry activity before the beginning of my database; in other words, my database covers nearly the whole industry cycle.
>
> Nevertheless, I would like to handle my data in a formally and methodologically correct way, at least to get a clear picture of how much influence this bias has.
>
> The question on 2.) and 3.) is: are there any methods available for such a data situation, particularly concerning the left side of the data?
>
> 4.) Irregular time intervals: in figure 1, these are my gaps in calendar time. In survival analysis, companies are aligned independently of calendar time; in effect they all start at their market entry. If I order the companies in this way, I get something like figure 2. That means that I get a situation like the one described in section 5.2.1 of Singer/Willett, “Analyzing Data Sets in Which the Number of Waves per Person Varies”. In my data, this situation is aggravated because the “systematic error” and the “object error” (see above) come together. In addition, I have quite a large number of companies/objects (~7000). I would expect the errors to more or less cancel each other out.
>
> Nevertheless: are there any formal methods for such irregular intervals that apply to survival analysis and/or are implemented in Stata?
>
> 5.) Grouped data methods:
>
> I installed the -pgmhaz8- and -hshaz- routines and read the help files.
>
> I have the following questions:
>
> (i)     Is “grouped data” just a synonym for discrete time?
> (ii)    If yes, could I also use a cloglog model/regression instead of pgmhaz8 or hshaz?
> (iii)   Both routines are for fully parametric use. I just want to perform a semiparametric analysis. Are there any alternatives?
> (iv)    If I use semiparametric analysis, do I also have to pay attention to frailty, or only when performing fully parametric analysis?
>
> 6.) What exactly is a finite mixture model, and how can I fit one in Stata?
>
> 7.) How could I weight risk by exposure time in practice? What do I have to do with my data for this?
>
> Figure 1: sketch of the points of observability in calendar time
>
> Figure 2: selected age groups of companies, all starting at t=0 and their respective points of observability
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Steve Samuels
> Sent: Friday, May 28, 2010 01:02
> To: [email protected]
> Subject: Re: st: discrete hazard modells: irregular time intervals
>
> Oliver-
>
> Welcome to Statalist! The -search- command is a very useful tool for
> finding Stata resources.
> "search interval censoring, all" turns up -intcens-. I also agree with
> Robert about trying to simplify your data. If you can locate your
> events to single years or even decades, then you could create grouped
> follow-up intervals (they need not be of equal length) and use the
> grouped data methods in -pgmhaz8- and -hshaz-, programs contributed by
> Stephen Jenkins and downloadable with -ssc-.
>
> I don't see that you can use methods for left-truncation, because you
> lack the information on the start date for companies that were in
> existence prior to 1900. At time "zero" your companies will be a
> mixture of brand-new and established companies. I think this risks
> serious bias. I strongly suggest that you analyze only companies whose
> start dates you know.
>
> As Robert said, disentangling age, period, and cohort effects can be
> challenging. Take a look at the contributed -apc- command, also
> available from -ssc-.
>
> -Steve
>
> Steven J. Samuels
> [email protected]
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax: 206-202-4783
>
> >> ----- Original Message -----
> >> From: Oliver Eger <[email protected]>
> >> Date: Wednesday, May 26, 2010 8:14 am
> >> Subject: st: discrete hazard modells: irregular time intervals
> >> To: [email protected]
> >>
> >> > Dear Stata users,
> >> >
> >> > I am new to Stata and to the newsgroup. I would kindly ask for advice
> >> > concerning my work on survival analysis.
> >> >
> >> > I collected company data. My data are interval censored and interval
> >> > truncated. The intervals are of irregular length in calendar time.
> >> >
> >> > Are there any methods to deal with these irregularities or at least to
> >> > estimate their influence on my survival regressions?
> >> >
> >> > Best regards,
> >> > Oliver
