Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: "Oliver Eger" <oliver.eger@yahoo.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: AW: st: discrete hazard modells: irregular time intervals
Date: Tue, 1 Jun 2010 18:58:24 +0200

Dear Steven,

thanks for your helpful and quick comments! We will discuss the lifetime and apc issues internally. Perhaps I will come back to this later...

Maybe some of my questions are beginner's questions, but it seems to me that the best way to make progress in survival analysis is to ask them ;-):

1.) If I use semiparametric analysis: do I also have to pay attention to frailty, or only when performing fully parametric analysis?

2.) The -pgmhaz8- and -hshaz- commands you mentioned: I can use these only in a fully parametric approach. Is this right? But I don't want to make any assumptions about the functional form of my hazard. So, what should I do?

3.) Regarding the grouped data issue: can this also be handled in a semiparametric/Cox survival regression or a cloglog model?

4.) What exactly is a finite mixture model and how can I fit one in Stata?

5.) How could I weight risk by the exposure time in practice? What do I have to do with my data for this?

6.) If I understand correctly, there is no direct method/approach that could be applied to the kind of irregular interval data I have built? If so, do you know anybody doing basic research on such econometric/methodological questions?

Best regards,
Oliver

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steve Samuels
Sent: Monday, 31 May 2010 13:52
To: statalist@hsphsun2.harvard.edu; bob.yaffee@nyu.edu
Subject: Re: st: discrete hazard modells: irregular time intervals

Oliver privately sent Bob and me the email below. I've removed the figures that he mentions.

You have defined failure as the "last year" that the company is mentioned in your source. But companies can (temporarily) disappear and re-appear for the reasons you mention. What is the maximum time that a company would have to be absent before you believe that the disappearance is permanent? Say, for the sake of argument, it is six years.
Then I suggest that you end your observations in 1994, and define a disappearance as any company which has disappeared before then, because you cannot be sure that a company that disappears in 1995-2000 will not reappear. I also suggest that you use the time to the last disappearance defined this way as your outcome and ignore the prior gaps. This might be your best analysis under the circumstances. Also, investigate the age-period-cohort approach.

Some other comments:

1. If the address books are published every two years, then a "permanent" disappearance of the kind I suggest means that you do indeed have interval censoring (in the two years since the last publication). You cannot detect disappearances and re-appearances within the two-year publishing interval. Therefore, I think that you will have to assume that the company existed during the entire interval.

2. I have no solution to your left-side problem.

3. "Discrete" and "grouped" analysis will be the same only if the grouping intervals are of equal size. See Stephen Jenkins's text Survival Analysis using Stata: http://www.iser.essex.ac.uk/survival-analysis

4. Definition of "lifetime" is an issue because of the temporary disappearances and re-appearances.

Good luck.

Steven

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783

On Mon, May 31, 2010 at 6:43 AM, Oliver Eger <oliver.eger@yahoo.de> wrote:
> Dear Robert,
> Dear Steven,
>
> first of all I want to thank you for your feedback, the detailed advice and the time you spent on it!
>
> I would like to respond (& ask ;-)) to your suggestions chronologically and in as structured a way as possible:
>
> 1.) As Robert suggested, I checked chapter 5 in Singer/Willett, Applied Longitudinal Data Analysis. If I understood correctly, in these examples there is additional time information (e.g. table 5.1: AGE, AGEGRP & WAVE).
> You propose that such additional information should be used to handle my problem of irregular time intervals.
>
> Before I continue, I should give some more insight into my data, just to avoid misunderstandings. I am also sending you two graphs by email, just to illustrate. Graphs say more than words.
>
> In fact, I don't have much information about my firms. All I have are old address books. The issues are regularly published in two-year periods. But as you can see in figure 1, due to war and irregularities in publishing, there are several gaps. Let's call this the "systematic error", because it is related to the source.
>
> In addition, not every company is published in all possible issues, or in other words: not every company is observable at every "point of observability". The reasons for this are unknown. One could imagine that this is due to a careless registration policy of the company concerning the address book, or that the company is not working in the particular submarket I am interested in. That means that the company is likely to exist during this period (or periods), but on the "main market" or on other related submarkets in which I am not interested. Whatever the reason, the company is in fact not observed. Let's call this the "object error", because it is related to the object/company.
>
> I now did the following:
> - (i) First of all, I brought my data into the long or multi-id form.
> - (ii) Every company gets as many observations as there are times/years it is mentioned in my address books.
> - (iii) Each of these observations is defined to start on 1 January and end on 31 December. That means: if a company is mentioned in one issue of the address book, it is assumed to operate in my submarket for one complete year. I think that this is a reasonable assumption.
> - (iv) If a systematic or an object error occurs, for Stata this is now a gap, because of my definitions via stset and the multi-id form.
> The reasons for the gap are in the end unknown. No certain information is available. I thought that this is the best way to handle this.
> - (v) I defined the first entry/first observation of each company as the first year of mention in my address book; exit/failure is the last year of mention. Gaps are included as described above.
>
> So, what I want to say is: I am not sure if I could proceed as proposed in chapter 5 of Singer/Willett, because I don't have such additional time information in my data. Or is there a misunderstanding on my side?
>
> 2.) Steven is definitely right when he says that the data are not left truncated or censored. This is because I defined the first entry/first observation as just described above (see (v)). So it is also true that I have a mix of brand-new and already existing companies in my first year(s). Both are indistinguishable to me. They are indistinguishable because the only information I have is the mention in 1900, or 1902, or 1904 etc., and this definitely does not allow me to conclude that this is also the time of their first market entry. The company could also have been active on the market in 1898 or 1896 etc. I just don't know.
>
> 3.) Steve said that according to 2.) there could be a bias. That's definitely correct. What I would argue is the following: My database covers around 100 years. This is a long period. It includes nearly 7000 companies. Period 1 contains around 80 companies, period 2 around 350. The historical literature describing my submarket says that the start of my database coincides with the advent of the market/industry. So I think that, from a practical point of view, the bias should not be very strong. In fact, there was not much industry activity before the beginning of my database, or in other words, my database covers nearly the whole industry cycle.
>
> Nevertheless I would appreciate handling my data in a formally correct mathematical/methodological way, at least to get a clear picture of the size of this bias.
>
> The question regarding 2.) and 3.) is: Are there any methods available for such a data situation, particularly concerning the left side of the data?
>
> 4.) Irregular time intervals: if you look at figure 1, these are my gaps in calendar time. In survival analysis, all companies are independent of calendar time; in fact they all start at their market entry. If I order the companies in this way, I get something like figure 2. That means that I get a situation like the one described in chapter 5.2.1 of Singer/Willett, "Analyzing Data Sets in Which the Number of Waves per Person Varies". In my data, this situation is amplified because the "systematic error" and the "object error" (see above) come together. In addition, I have quite a large number of companies/objects (~7000). This should lead to the errors more or less canceling each other out.
>
> Nevertheless: are there any formal methods for such irregular intervals that are applied in survival analysis and/or implemented in Stata?
>
> 5.) Grouped data methods:
> I installed the -pgmhaz8- and -hshaz- routines and read the help files.
>
> I have the following questions about this:
> (i) Is grouped data just a synonym for discrete time?
> (ii) If yes, could I also use a cloglog model/regression instead of -pgmhaz8- or -hshaz-?
> (iii) Both routines are for fully parametric use. I just want to perform a semiparametric analysis. Are there any alternatives?
> (iv) If I use semiparametric analysis: do I also have to pay attention to frailty, or only when performing fully parametric analysis?
>
> 6.) What exactly is a finite mixture model and how can I fit one in Stata?
>
> 7.) How could I weight risk by the exposure time in practice? What do I have to do with my data for this?
>
> Figure 1: sketch of the points of observability in calendar time
>
> Figure 2: selected age groups of companies, all starting at t=0 and their respective points of observability
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steve Samuels
> Sent: Friday, 28 May 2010 01:02
> To: statalist@hsphsun2.harvard.edu
> Subject: Re: st: discrete hazard modells: irregular time intervals
>
> Oliver-
>
> Welcome to Statalist! The -search- command is a very useful tool for finding Stata resources. "search interval censoring, all" turns up -intcens-. I also agree with Robert about trying to simplify your data. If you can locate your events to single years or even decades, then you could create grouped follow-up intervals (they need not be of equal length), use the grouped data methods, and use -pgmhaz8- and -hshaz-, programs contributed by Stephen Jenkins and downloadable with -ssc-.
>
> I don't see that you can use methods for left-truncation, because you lack the information on the start date for companies that were in existence prior to 1900. At time "zero" your companies will be a mixture of brand-new and established companies. I think this risks serious bias. I strongly suggest that you analyze only companies whose start dates you know.
>
> As Robert said, disentangling age, period, and cohort effects can be challenging. Take a look at the contributed -apc- command, also available from -ssc-.
>
> -Steve
>
> Steven J. Samuels
> sjsamuels@gmail.com
> 18 Cantine's Island
> Saugerties NY 12477
> USA
> Voice: 845-246-0774
> Fax: 206-202-4783
>
> >> ----- Original Message -----
> >> From: Oliver Eger <oliver.eger@yahoo.de>
> >> Date: Wednesday, May 26, 2010 8:14 am
> >> Subject: st: discrete hazard modells: irregular time intervals
> >> To: statalist@hsphsun2.harvard.edu
> >>
> >> > Dear Stata users,
> >> >
> >> > I am new to Stata and to the newsgroup. I would kindly ask for advice concerning my work on survival analysis.
> >> >
> >> > I collected company data. My data are interval censored and interval truncated. The intervals are of irregular length in calendar time.
> >> >
> >> > Are there any methods to deal with these irregularities or at least to estimate their influence on my survival regressions?
> >> >
> >> > Best regards,
> >> > Oliver

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
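Two ideas raised in this thread, grouped follow-up intervals of unequal length and weighting by exposure time, can be illustrated with a small sketch. It expands each company's spell into one record per follow-up interval and then computes, per interval, the discrete hazard (events divided by companies at risk) and an occurrence/exposure rate (events divided by years at risk). It is written in Python only to be self-contained; the cut points, the toy spells, and the function names are hypothetical, and reading "weight risk by exposure time" as events per year at risk is an assumption rather than something the thread confirms. Companies censored inside an interval are counted as at risk for that whole interval in the hazard, which is a simplification.

```python
from collections import defaultdict

# Hypothetical follow-up intervals in years since first mention: (start, end].
# The widths are deliberately unequal (2, 2, 6 and 10 years).
CUTS = [0, 2, 4, 10, 20]

# Hypothetical toy data: (company id, years observed, 1 = disappeared, 0 = censored).
SPELLS = [("A", 3, 1), ("B", 8, 0), ("C", 1, 1), ("D", 15, 1), ("E", 20, 0)]


def person_period(spells, cuts):
    """Expand each spell into one record per interval the company enters."""
    rows = []
    for cid, duration, failed in spells:
        for j in range(len(cuts) - 1):
            lo, hi = cuts[j], cuts[j + 1]
            if duration <= lo:
                break  # the company left the risk set before this interval
            exposure = min(duration, hi) - lo             # years at risk in interval j
            event = int(bool(failed) and duration <= hi)  # failure falls inside interval j
            rows.append({"id": cid, "interval": j, "exposure": exposure, "event": event})
    return rows


def interval_summary(rows):
    """Per interval: number at risk, events, exposure, discrete hazard and rate."""
    agg = defaultdict(lambda: {"at_risk": 0, "events": 0, "exposure": 0})
    for r in rows:
        a = agg[r["interval"]]
        a["at_risk"] += 1
        a["events"] += r["event"]
        a["exposure"] += r["exposure"]
    out = {}
    for j, a in sorted(agg.items()):
        out[j] = {
            **a,
            "hazard": a["events"] / a["at_risk"],  # share of companies at risk that fail in interval j
            "rate": a["events"] / a["exposure"],   # failures per year at risk
        }
    return out


rows = person_period(SPELLS, CUTS)
summary = interval_summary(rows)
for j, s in summary.items():
    print(j, s)
```

With unequal interval widths the two columns diverge: the hazard is a probability per interval, whatever its length, while the rate is per year at risk. A long-format dataset of this shape, with one indicator per interval, is also the usual starting point for discrete-time models such as a cloglog regression.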

