Survival models concern time-to-event outcomes. The outcomes can be anything: death, myopia, employment, etc. The outcomes can be good or bad, such as recovery or relapse, or marriage or divorce, which is worth mentioning because the jargon of survival analysis suggests the outcomes are unpleasant. The word survival itself suggests time until death.
The data on which survival models are fit are often right-censored. Data are collected for a while and, as of some date, data collection ends before everyone has "failed".
Two types of survival models are popular: semiparametric and parametric. Semiparametric means Cox proportional hazards. Parametric means a distributional assumption is made, typically exponential, Weibull, lognormal, loglogistic, etc.
Stata has a command for fitting parametric survival models with panel data. Panel data concern repeated observations of the primary analysis unit. For instance, let's assume we are analyzing data on individuals. Obviously, in survival data, we have repeated observations on the same person because we observed them over a period of time, from onset of risk until failure or the end of data collection. Sometimes the multiple observations on a person are explicit; the data themselves contain multiple observations for some or all of the individuals. That happens when covariates change over time. Other times, the multiple observations on the individuals are implicit; there is only one physical observation for each, but still that observation records a span of time.
Those kinds of repeated observations have nothing to do with panel data. Panel data arise, for instance, when individuals are from different countries and it is believed that country affects survival. In that case, in a panel-data model, there would be a random effect or, if you prefer, an unobserved latent effect for each country.
We can, however, write models in which the random effect occurs at the individual level if we have repeated failure events for them.
Panel-data random effects are similar to frailty, a survival-analysis concept. In frailty, related observations (individuals) are grouped and viewed as sharing a latent component. Stata allows for frailty; see the manual entries [ST] streg and [ST] stcox.
Panel-data random effects are assumed to be normally distributed and that is a selling point of this model. Frailty is assumed to be gamma distributed, and that is mainly for computational rather than substantive reasons. Panel-data's normal random effects are a more plausible assumption. They are equivalent to lognormal frailties, if you care.
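The equivalence with lognormal frailty can be sketched in a proportional-hazards parameterization. The notation here is introduced only for illustration and does not appear in the original: h_0 is the baseline hazard, x_{ij} the covariates for observation j in group i, u_i the group's random effect, and alpha_i the implied multiplicative frailty.

```latex
\begin{aligned}
h(t \mid x_{ij}, u_i) &= h_0(t)\,\exp(x_{ij}\beta + u_i), \qquad u_i \sim N(0, \sigma_u^2) \\
                      &= \alpha_i\, h_0(t)\,\exp(x_{ij}\beta), \qquad \alpha_i = \exp(u_i) \sim \mathrm{Lognormal}(0, \sigma_u^2)
\end{aligned}
```

A normal random effect added to the linear predictor is therefore the same thing as a lognormal frailty multiplying the hazard.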
Panel-data normally distributed random effects are available only with the parametric survival estimators.
Gamma distribution frailty is available with parametric and semiparametric models.
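As a rough sketch of what the gamma-frailty alternatives look like, the commands below use hypothetical variable names (x1, x2, and country are not from this page's example) and assume the data have already been stset.

```stata
* Hypothetical variables: x1 and x2 are covariates, country is the grouping variable.
* The data are assumed to have been stset already.

* Parametric (Weibull) model with gamma frailty shared within country
streg x1 x2, distribution(weibull) frailty(gamma) shared(country)

* Semiparametric (Cox) model with gamma frailty shared within country
stcox x1 x2, shared(country)
```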
Stata provides panel-data parametric survival models.
Examples of survival outcomes in panel data are the number of years until a new recession occurs for a group of countries that belong to different regions, or weeks unemployed for individuals who might experience multiple unemployment episodes.
We want to study the duration of job position for a group of 201 people. We have 600 observations in our data, meaning roughly three job positions per person. In these data, the end of a job position could mean the end of employment, but usually it means moving to a new job, whether in the same firm or a new firm. Our outcome is time to the "end" of a job (variable tend), and variable failure indicates whether that time corresponds to censoring or the job position having ended. These are real data.
To use Stata's xtstreg, we must first stset and xtset our data because xtstreg is both an st and xt command.
. stset tend, origin(tstart) failure(failure)

Survival-time data settings

         Failure event: failure!=0 & failure<.
Observed time interval: (origin, tend]
     Exit on or before: failure
     Time for analysis: (time-origin)
                Origin: time tstart

--------------------------------------------------------------------------
        600  total observations
--------------------------------------------------------------------------
        600  observations remaining, representing
        458  failures in single-record/single-failure data
     40,782  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =       428
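Only the stset step is shown above. The xtset step would presumably look like the line below; the panel variable id is an assumption inferred from the "Group variable: id" line in the estimation output.

```stata
* Declare the panel (group) variable; id is assumed from "Group variable: id"
xtset id
```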
We model the time to end of job position as being determined by highest level of education attained, whether college degree was attained, number of previous jobs or job positions, prestige of the job, and gender. We use a Weibull distribution for survival times.
. xtstreg education njobs prestige female college, distribution(weibull)

        failure _d: failure
  analysis time _t: (tend-origin)
            origin: time tstart

Random-effects Weibull PH regression            Number of obs    =        600
Group variable: id                              Number of groups =        201

                                                Obs per group:
                                                             min =          1
                                                             avg =        3.0
                                                             max =          9

Integration method: mvaghermite                 Integration pts. =         12

                                                Wald chi2(5)     =     229.16
Log likelihood = -744.15593                     Prob > chi2      =     0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |   1.008175   .0357436     0.23   0.818      .940498    1.080723
       njobs |   .9010315   .0449329    -2.09   0.037     .8171315    .9935459
    prestige |    .968806   .0063893    -4.81   0.000     .9563637    .9814101
      female |   2.683059   .4265417     6.21   0.000     1.964761     3.66396
     college |   3.470637   .3097446    13.94   0.000      2.91368    4.134058
       _cons |   .0020674   .0010752   -11.89   0.000      .000746    .0057295
-------------+----------------------------------------------------------------
       /ln_p |   .2425708   .0454668     5.34   0.000     .1534575     .331684
-------------+----------------------------------------------------------------
   /sigma2_u |   .4865297     .13979                      .2770395    .8544312
------------------------------------------------------------------------------
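The number of previous jobs and the prestige of the current job both have hazard ratios below 1, so they increase survival time in the current job or, said differently, reduce current job mobility. In addition, women and those with a college degree are more mobile. The level of education itself shows no detectable effect once college is in the model (hazard ratio 1.01, p = 0.818).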
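Hazard ratios are exponentiated coefficients, so a ratio below 1 corresponds to a negative coefficient and a lower hazard of leaving the job. One way to see the coefficients directly is to replay the results, assuming xtstreg accepts the nohr reporting option on replay as the other st estimation commands do.

```stata
* Replay the last xtstreg results as coefficients rather than hazard ratios
* (assumes the xtstreg model above is the most recent estimation)
xtstreg, nohr

* The same conversion by hand for njobs, using the reported hazard ratio
display ln(.9010315)
```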
The variance of the random effect reported is 0.49, and for your information, that variance leads to reasonably large changes in survival time.
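To put a number on "reasonably large", here is a back-of-the-envelope calculation using only values copied from the output above. It relies on the Weibull proportional-hazards form, in which multiplying the hazard by exp(u) multiplies survival time by exp(-u/p). The scalar names are made up for this sketch.

```stata
* Values copied from the xtstreg output above
scalar sd_u  = sqrt(.4865297)     // standard deviation of the random effect
scalar shape = exp(.2425708)      // Weibull shape parameter p, from /ln_p

* Hazard multiplier for a person one standard deviation above the mean
display exp(scalar(sd_u))

* Corresponding multiplier on survival time under a Weibull PH model
display exp(-scalar(sd_u)/scalar(shape))
```

By this calculation, a person whose random effect is one standard deviation above the mean has roughly twice the hazard, and so job durations roughly 40% shorter, than an otherwise identical person at the mean.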
Stata also has mestreg, which will fit the same models as the just demonstrated xtstreg, and more besides. Among the additional features, mestreg will allow more than one nesting level. Another additional feature is that it will fit random intercepts and random coefficients. The me part of mestreg stands for mixed effects.
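As a sketch of the nesting feature, suppose the data also recorded which firm each person worked for. The variable firm is hypothetical and is not part of the example dataset; the || syntax itself is explained below.

```stata
* Hypothetical three-level model: job spells within people (id), people within firms.
* The variable firm is NOT in the example data; it is assumed for illustration.
* Higher levels are listed first.
mestreg education njobs prestige female college || firm: || id:, distribution(weibull)
```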
Nothing is free; mestreg has a bit more syntax. To fit the same model we just fit, we would type
. mestreg education njobs prestige female college || id:, distribution(weibull)
The double bars followed by id: specify that the group level is variable id, meaning observations with the same value of id share a common effect. The default effect is a random intercept.
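One detail worth keeping in mind: the outputs on this page show 12 integration points for xtstreg and 7 for mestreg. If the aim is to match the xtstreg estimates as closely as possible, a reasonable sketch is to request the same number of points with the intpoints() option.

```stata
* Same random-intercept model, using 12 integration points to match
* the "Integration pts. = 12" line reported by xtstreg above
mestreg education njobs prestige female college || id:, distribution(weibull) intpoints(12)
```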
We could estimate a random coefficient in addition by typing
. mestreg education njobs prestige female college || id: college, distribution(weibull)
Adding a variable name after id: specifies that the variable is to have a random coefficient.
Let's fit that model:
. mestreg education njobs prestige i.female college || id: college, distribution(weibull)

        Failure _d: failure
  Analysis time _t: (tend-origin)
            Origin: time tstart

Mixed-effects Weibull PH regression             Number of obs    =        600
Group variable: id                              Number of groups =        201

                                                Obs per group:
                                                             min =          1
                                                             avg =        3.0
                                                             max =          9

Integration method: mvaghermite                 Integration pts. =          7

                                                Wald chi2(5)     =     214.80
Log likelihood = -743.87893                     Prob > chi2      =     0.0000

------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |   1.018655   .0400041     0.47   0.638       .94319    1.100158
       njobs |   .9059486   .0460996    -1.94   0.052     .8199545    1.000962
    prestige |   .9682886    .006471    -4.82   0.000     .9556883     .981055
      female |   2.793355   .4725027     6.07   0.000     2.005133    3.891427
     college |   3.504569   .3223497    13.63   0.000     2.926451    4.196894
       _cons |   .0017681   .0010096   -11.10   0.000     .0005774    .0054146
-------------+----------------------------------------------------------------
       /ln_p |   .2493795   .0465699     5.35   0.000     .1581041    .3406549
-------------+----------------------------------------------------------------
var(college) |   .0448831    .063607                      .0027912    .7217394
  var(_cons) |   .4024507   .1793122                       .168058    .9637538
------------------------------------------------------------------------------
We assumed a random coefficient for college, that is, the effect of college education on the decision of job mobility varies among individuals. The coefficient is assumed to be normally distributed and unrelated to the random effect for the intercept.
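The independence between the random coefficient and the random intercept is an assumption of the specification, not something the data force. If a correlation between the two seems plausible, one possible variation is to request an unstructured covariance, sketched below. No results are shown because this model was not fit here.

```stata
* Let the random coefficient on college and the random intercept be correlated
mestreg education njobs prestige female college || id: college, covariance(unstructured) distribution(weibull)
```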
Our mean estimate for this coefficient is ln(3.5) = 1.25, and the variance estimate for this random coefficient is .045. In short, according to the model, the coefficient for college is normally distributed with mean 1.25 and standard deviation sqrt(.045) = .21.
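To see what that spread means on the hazard-ratio scale, the sketch below computes the range that would cover the middle 95% of individual college effects, using only numbers copied from the mestreg output. The scalar names are made up for this illustration.

```stata
* Values copied from the mestreg output above
scalar b_college  = ln(3.504569)      // mean of the random coefficient
scalar sd_college = sqrt(.0448831)    // its standard deviation

* Hazard ratios at the 2.5th and 97.5th percentiles of the coefficient distribution
display exp(scalar(b_college) - invnormal(.975)*scalar(sd_college))
display exp(scalar(b_college) + invnormal(.975)*scalar(sd_college))
```

Under the fitted model, that range runs from roughly 2.3 to 5.3, so college raises the hazard of a job ending for essentially everyone, by amounts that differ from person to person. The variance estimate itself is imprecise, as its wide confidence interval shows.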
Read more about panel-data survival models in the Stata Longitudinal-Data/Panel-Data Reference Manual; see [XT] xtstreg.