
st: Clustering and other issues with survival data for firms

From   Ioannis Ioannou
Subject   st: Clustering and other issues with survival data for firms
Date   Fri, 1 Aug 2008 13:21:56 -0400

Dear Stata Users,

I am very new to survival analysis, and I had a couple of questions
regarding piecewise-constant exponential models. (I am using the
stpiece command)

My data set is basically the history of an industry since its
beginning. An observation is at the firm-year level. I am trying to
estimate the impact of an event on the hazard of exit of the firms
that do suffer the event, as opposed to the ones that do not. For each
firm, I have entry and exit (death) year and I can observe the year
when the event happens.

The issues that I am facing are the following:

a. I have a large number of firms that exist only for one year. When I
use the "stset" command to declare the dataset for survival analysis,
those observations are dropped (i.e. _st==0). This is an issue because
these are more than 1000 firms in a sample of 2806. Note that the
event of interest does *not* happen for any of the firms that are not
used in the analysis. So basically, these are the complete failures of
the industry.
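
A minimal sketch of one possible workaround for (a), assuming the records
are dropped because entry and exit fall in the same year, so that the
spell has zero length (stset excludes records that end on or before the
time they begin). The variable names firm_id, year, entry_year and died
are hypothetical, and treating each firm-year as covering the interval
(year, year + 1] is an assumption, not something the data confirm:

```stata
* Hypothetical variables: firm_id, year, entry_year, died (1 in the exit year)
* Assumption: each firm-year record covers the interval (year, year + 1]
generate t_end = year + 1

* Analysis time starts at the firm's entry year, so a firm that enters and
* exits in the same year contributes one record running from 0 to 1
stset t_end, id(firm_id) failure(died == 1) origin(time entry_year)

* Check how many records are still excluded from the analysis
count if _st == 0
```

Under this convention a one-year firm is counted as having survived the
whole year, which probably overstates its time at risk.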

b. The second problem I have is that some firms suffer two events in
the same year. This is an important issue because the two events
differ in their characteristics, so it is not clear which one to keep
and which one to drop for a given year when there are two of them.

c. My last question concerns the standard errors. I usually work with
panel data techniques, where errors are clustered at the firm level.
Would this be appropriate clustering for the survival analysis as well?
I could alternatively cluster on founding state or founding city. I
guess my question is whether the clustering reasoning for panel data
transfers to survival analysis when we are talking about firms.
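
To make question (c) concrete, here is a sketch of the two clustering
choices. It uses the built-in stsplit and streg commands rather than
stpiece, as one way to fit a piecewise-constant exponential model. The
cut points 5, 10 and 20 and the variable names event, firm_id and
state_id are hypothetical, and the data are assumed to be already stset
with id(firm_id):

```stata
* Assumes the data are already stset with id(firm_id)
* Split analysis time at hypothetical cut points to define the pieces
stsplit period, at(5 10 20)

* Piecewise-constant exponential hazard: one baseline level per period.
* event is a hypothetical indicator equal to 1 once the event has occurred
streg event i.period, distribution(exponential) vce(cluster firm_id)

* Alternative: cluster at a higher level, here the founding state
streg event i.period, distribution(exponential) vce(cluster state_id)
```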

As a solution to problems a. and b. above, I considered transforming
the data to semi-annual (rather than annual) observations. For problem
(a) above, this would mean that I will assume that firms that failed
within a year did survive for one semester of that year. For problem
(b) I can make the simplifying assumption that one event occurs in the
first 6-month period of the year, and the second event occurs in the
second 6-month period of the year, and thus allow for two events
within one year. Thus,

d. Is the procedure I am proposing (i.e. transforming annual data to
semi-annual data) inherently wrong? Is there any econometric reason why
this should not be done? Do the assumptions that I am making seem too strong?
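
The transformation described in (d) could be sketched roughly as follows.
The variable names firm_id, year, entry_year and died are hypothetical,
and placing every exit at the end of the first half of the exit year is
an assumption made only for illustration. Assigning the two events to
separate halves, as in (b), is not handled here:

```stata
* Hypothetical variables: firm_id, year, entry_year, died (1 in the exit year)
* Duplicate each firm-year record to obtain two half-year records
expand 2
bysort firm_id year: generate half = _n

* End of each half-year interval: year + 0.5 and year + 1
generate t_half = year + half/2

* Assumption: in the exit year the firm survives the first half only
drop if died == 1 & half == 2

stset t_half, id(firm_id) failure(died == 1) origin(time entry_year)
```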

e. Is there any other way to fix issues (a) and (b) above?

Thank you in advance for your help.
