

From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: discrete hazard modells: irregular time intervals
Date: Tue, 1 Jun 2010 14:16:45 -0400

Oliver, some of these do not call for a short answer, so I'll just refer you to the references.

On Tue, Jun 1, 2010 at 12:58 PM, Oliver Eger <oliver.eger@yahoo.de> wrote:
> Dear Steven,
>
> thanks for your helpful and quick comments!
>
> We will discuss the lifetime and APC issue internally! Perhaps I will come back to this later...
>
> Maybe some of my questions are learner questions, but it seems to me that the best way to make progress in survival analysis is to ask them ;-):
>
> 1.) If I use semi-parametric analysis: do I have to pay attention to frailty also, or just when performing full parametric analysis?

Answer: It depends on what you mean by "semi-parametric". Cox models in Stata require a group whose members share a common frailty.

> 2.) The pgmhaz8 or hshaz commands you talked about: I can use these only in a fully parametric approach. Is this right? But I don't want to make any assumptions about the functional form of my hazard. So, what to do?

Answer: These models don't impose a functional form for the hazard, just for the hazard ratio. You might consider them "parametric", but I think they are semi-parametric.

> 3.) Because of the grouped data issue: can this also be done in a semiparametric/Cox survival or a cloglog model regression?

Answer: -cloglog- has no frailty option, and the frailty option of -stcox- probably doesn't apply to your data.

> 4.) What exactly is a finite mixture model and how can I perform it in Stata?

Answer: A mixture model is one in which each data point can arise from any one of K distributions, with distribution "k" chosen with probability p_k. "findit mixture" or "search mixture, all" will find several finite mixture programs in Stata.

> 5.) How could I weight risk by the exposure time in practice? What do I have to do with my data for that?

Answer: I'm not sure what you mean here. The survival programs do this automatically.

> 6.) If I understand it correctly: there is no direct method/approach that could be applied to the kind of irregular interval data I have built?

Yes, but the problem is not your irregular intervals: the -intcens- program can handle interval censoring with irregular intervals. The problem is the lack of a definition of what you mean by "lifetime": you want to exclude time that a company was in existence, but not in your market. There is no solution to the "left-hand" problem.

> So, do you know anybody doing basic research on such econometric/methodological questions?

I don't, but I'm not an econometrician. I'm not sure that this is even a survival problem per se. There might be some suggestions in the biostatistical literature on the appearance and recurrence of disease with imperfect periodic screening. A Bayesian version might help cope with some of the uncertainties. I don't think that Stata has anything to offer in this area.

Good luck!

Steve

Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783

> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steve Samuels
> Sent: Monday, 31 May 2010 13:52
> To: statalist@hsphsun2.harvard.edu; bob.yaffee@nyu.edu
> Subject: Re: st: discrete hazard modells: irregular time intervals
>
> Oliver privately sent Bob and me the email below. I've removed the figures that he mentions.
>
> You have defined failure as the "last year" that the company is mentioned in your source. But companies can (temporarily) disappear and re-appear for the reasons you mention. What is the maximum time that a company would have to be absent before you believe that the disappearance is permanent? Say, for the sake of argument, it is six years. Then I suggest that you end your observations in 1994, and define a disappearance as any company which has disappeared before then, because you cannot be sure that a company that disappears in 1995-2000 will not reappear. I also suggest that you use the time to the last disappearance defined this way as your outcome and ignore the prior gaps. This might be your best analysis under the circumstances. Also, investigate the age-period-cohort approach.
>
> Some other comments:
> 1. If the address books are published every two years, then a "permanent" disappearance of the kind I suggest means that you do indeed have interval censoring (in the two years since the last publication). You cannot detect disappearances and re-appearances within the two-year publishing interval. Therefore, I think that you will have to assume that the company existed during the entire interval.
> 2. I have no solution to your left-side problem.
> 3. "Discrete" and "grouped" analysis will be the same only if the grouping intervals are of equal size. See Stephen Jenkins's text Survival Analysis using Stata: http://www.iser.essex.ac.uk/survival-analysis
> 4. Definition of "lifetime" is an issue because of the temporary disappearances and re-appearances.
>
> Good luck.
>
> Steven
>
> On Mon, May 31, 2010 at 6:43 AM, Oliver Eger <oliver.eger@yahoo.de> wrote:
>> Dear Robert,
>> Dear Steven,
>>
>> first of all I want to thank you for your feedback, the detailed advice and the time you spent on it!
>>
>> I would like to answer (& ask ;-)) your suggestions chronologically and in as structured a way as possible:
>>
>> 1.) As Robert suggested, I checked chapter 5 in Singer/Willett, Applied Longitudinal Data Analysis. If I understood correctly, in these examples there is additional time information (e.g. table 5.1: AGE, AGEGRP & WAVE). Such additional information, you propose, should be used to handle my problem of irregular time intervals.
>>
>> Before I continue, I should give some more insight into my data, just to avoid misunderstandings. I am also sending two graphs to your email, just to illustrate. Graphs say more than words.
>>
>> In fact, I don't have much information about my firms. All I have are old address books. The issues are normally published every two years. But as you can see in figure 1, due to war and irregularities in publishing, there are several gaps. Let's call this the "systematic error", because it is related to the source.
>>
>> In addition, not every company is published in all possible issues, or in other words: not every company is observable at every "point of observability". The reasons for this are unknown. One could imagine that this is due to a careless registration policy of the company concerning the address book, or that the company is not working in the particular submarket I am interested in. That means that the company is likely to exist during this period (or periods), but on the "main market" or other related submarkets in which I am not interested. However it is, the company is in fact not observed. Let's call this the "object error", because it is related to the object/company.
>>
>> I then did the following:
>>
>> - (i) First of all, I brought my data into the long or multi-id form.
>> - (ii) Every company gets as many observations as times/years of mention in my address books.
>> - (iii) Each of these observations is defined to start on the 1st of January and end on the 31st of December. That means: if a company is mentioned in one issue of the address book, it is assumed to operate on my submarket for one complete year. I think that this is a reasonable assumption.
>> - (iv) If a systematic or an object error occurs, for Stata this is now a gap, because of my definitions via stset and the multi-id form. The reasons for the gap are in the end unknown. No certain information is available. I thought that this is the best way to handle this.
>> - (v) I defined the first entry/first observation of each company as the first year of mention in my address book; exit/failure is the last year of mention. Gaps are included as described above.
>>
>> So, what I want to say is: I am not sure if I could proceed as proposed in chapter 5 of Singer/Willett, because I don't have such additional time information in my data. Or is there a misunderstanding on my side?
>>
>> 2.) Steven is definitely right when he says that the data are not left truncated or censored. This is because I defined the first entry/first observation as just described above (see (v)). So it is also true that I have a mix of brand new and already existing companies in my first year(s). Both are indistinguishable to me. They are indistinguishable because the only information I have is the mention in 1900 or 1902, or 1904 etc., but this definitely does not allow me to conclude that this is also the time point of their first market entry. The company could also have been active on the market in 1898 or 1896 etc. I just don't know.
>>
>> 3.) Steve said that according to 2.) there could be a bias. That's definitely correct. What I would argue is the following: My database covers around 100 years. This is a long period. It includes nearly 7000 companies. Period 1 measures around 80 companies, period two around 350. The historical literature describing my submarket says that the beginning of my database coincides with the advent of the market/industry. So I think that, from a practical point of view, the bias should not be very strong. In fact, there was not too much industry activity before the beginning of my database, or in other words, my database covers nearly the whole industry cycle.
>>
>> Nevertheless I would appreciate handling my data in a formally correct mathematical/methodological way, at least to get a clear picture of the amount of influence of this bias.
>>
>> The question to 2.) and 3.) is: Are there any methods available for such a data situation, particularly concerning the left side of the data?
>>
>> 4.) Irregular time intervals: if you look at figure 1, these are my gaps in calendar time. In survival analysis, all companies are independent of calendar time; in fact they all start at their market entry. If I order the companies in this way, I get something like figure 2. That means that I get a situation as described in chapter 5.2.1 in Singer/Willett, "Analyzing data set in Which the Number of Waves per Person Varies". In my data, this situation is intensified because the "systematic error" and the "object error" (see above) come together. In addition, I have quite a large number of companies/objects (~ 7000). This should lead to the errors more or less canceling each other out.
>>
>> Nevertheless: are there any formal methods applied to survival analysis and/or implemented in Stata for such irregular intervals?
>>
>> 5.) Grouped data methods:
>>
>> I installed the -pgmhaz8- and -hshaz- routines and read the help.
>>
>> Following questions with this:
>>
>> (i) Is grouped data just a synonym for discrete time?
>> (ii) If yes, could I also use a cloglog model/regression instead of pgmhaz8 or hshaz?
>> (iii) Both routines are for fully parametric use. I just want to perform a semi-parametric analysis. Are there any alternatives?
>> (iv) If I use semi-parametric analysis: do I have to pay attention to frailty also, or just when performing full parametric analysis?
>>
>> 6.) What exactly is a finite mixture model and how can I perform it in Stata?
>>
>> 7.) How could I weight risk by the exposure time in practice? What do I have to do with my data for that?
>>
>> Figure 1: sketch of the points of observability in calendar time
>>
>> Figure 2: selected age groups of companies, all starting at t=0, and their respective points of observability
>>
>> -----Original Message-----
>> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On behalf of Steve Samuels
>> Sent: Friday, 28 May 2010 01:02
>> To: statalist@hsphsun2.harvard.edu
>> Subject: Re: st: discrete hazard modells: irregular time intervals
>>
>> Oliver-
>>
>> Welcome to Statalist! The -search- command is a very useful tool for finding Stata resources. "search interval censoring, all" turns up -intcens-. I also agree with Robert about trying to simplify your data. If you can locate your events to single years or even decades, then you could create grouped follow-up intervals (they need not be of equal length) and use the grouped-data methods in -pgmhaz8- and -hshaz-, programs contributed by Stephen Jenkins and downloadable with -ssc-.
>>
>> I don't see that you can use methods for left-truncation, because you lack the information on the start date for companies that were in existence prior to 1900. At time "zero" your companies will be a mixture of brand-new and established companies. I think this risks serious bias. I strongly suggest that you analyze only companies whose start dates you know.
>>
>> As Robert said, disentangling age, period, and cohort effects can be challenging. Take a look at the contributed -apc- command, also available from -ssc-.
>>
>> -Steve
>>
>> Steven J. Samuels
>> sjsamuels@gmail.com
>> 18 Cantine's Island
>> Saugerties NY 12477
>> USA
>> Voice: 845-246-0774
>> Fax: 206-202-4783
>>
>> ----- Original Message -----
>> From: Oliver Eger <oliver.eger@yahoo.de>
>> Date: Wednesday, May 26, 2010 8:14 am
>> Subject: st: discrete hazard modells: irregular time intervals
>> To: statalist@hsphsun2.harvard.edu
>>
>> > Dear Stata users,
>> >
>> > I am new to Stata and to the newsgroup. I would kindly ask for advice concerning my work on survival analysis.
>> >
>> > I collected company data. My data are interval censored and interval truncated. The intervals are of irregular length in calendar time.
>> >
>> > Are there any methods to deal with these irregularities or at least to estimate their influence on my survival regressions?
>> >
>> > Best regards,
>> > Oliver

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
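
Editorial illustration of the finite mixture definition given in answer 4 above (each data point arises from one of K distributions, with distribution k chosen with probability p_k). This is only a minimal sketch in Python, not Stata code and not part of the original reply. The two normal components, their weights, and every function name are assumptions chosen purely for illustration.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, weights, means, sds):
    """Mixture density: sum over k of p_k times the k-th component density."""
    return sum(p * normal_pdf(x, m, s) for p, m, s in zip(weights, means, sds))


def draw_from_mixture(rng, weights, means, sds):
    """Pick component k with probability p_k, then draw from that component."""
    k = rng.choices(range(len(weights)), weights=weights, k=1)[0]
    return k, rng.gauss(means[k], sds[k])


# Hypothetical K = 2 mixture; the values below are made up for the example.
weights = [0.3, 0.7]   # p_1, p_2 (must sum to 1)
means = [2.0, 10.0]
sds = [1.0, 3.0]

rng = random.Random(12345)
draws = [draw_from_mixture(rng, weights, means, sds) for _ in range(10000)]

# Share of draws that came from the first component; should be close to p_1.
share_first = sum(1 for k, _ in draws if k == 0) / len(draws)
print(share_first)
```

In real data the component label k is not observed. A finite mixture program estimates the p_k and the component parameters from the observed values alone.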

**References**:
- **AW: st: discrete hazard modells: irregular time intervals**
  *From:* "Oliver Eger" <oliver.eger@yahoo.de>
