Panel-data survival models

Highlights

  • Survival outcomes

  • Random intercepts and random coefficients

  • Multilevel models

  • Right-censoring

  • Exponential, loglogistic, Weibull, lognormal, and gamma survival distributions

  • Sampling weights

  • Graphs of marginal survivor, cumulative hazard, and hazard functions

  • Fully integrated with stset

  • Fully integrated with xtset

Survival models concern time-to-event outcomes. The outcomes can be anything: death, myopia, employment, etc. The outcomes can be good or bad, such as recovery or relapse, or marriage or divorce, which is worth mentioning because the jargon of survival analysis suggests the outcomes are unpleasant. The word survival itself suggests time until death.

The data on which survival models are fit are often right-censored. Data are collected for a while and, as of some date, data collection ends before everyone has "failed". Two types of survival models are popular for right-censored data: semiparametric and parametric. Semiparametric means Cox proportional hazards. Parametric means a distributional assumption is made, typically exponential, Weibull, lognormal, loglogistic, etc.
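How a parametric model uses right-censored records can be sketched with the usual likelihood contribution. The symbols here are introduced for illustration only and are not from the text: f is the assumed density, S the survivor function, h the hazard, t_i the observed time for subject i, and d_i an indicator that equals 1 for a failure and 0 for a censored record. The sketch assumes censoring is unrelated to the failure process.

```latex
L_i = f(t_i)^{d_i}\, S(t_i)^{1-d_i} = h(t_i)^{d_i}\, S(t_i),
\qquad h(t) = \frac{f(t)}{S(t)}
```

A failure contributes the density at its failure time, whereas a censored record contributes only the probability of surviving past the time at which observation stopped.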

Panel data concerns repeated observations of the primary analysis unit. For instance, let's assume we are analyzing data on individuals. Obviously, in survival data, we have repeated observations on the same person because we observe them over a period of time, from onset of risk until failure or the end of the data collection effort. Sometimes the multiple observations on a person are explicit; the data themselves contain multiple observations for some or all of the individuals. That happens when covariates change over time. Other times, the multiple observations on the individuals are implicit; there is only one physical observation for each, but that observation still records a span of time.

Those kinds of repeated observations have nothing to do with panel data. Panel data arises, for instance, when individuals are from different countries and it is believed that country affects survival. In that case, in a panel-data model, there would be a random effect or, if you prefer, an unobserved latent effect for each country.

We can, however, write models in which the random effect occurs at the individual level if we have repeated failure events for them.

Panel-data random effects are similar to frailty, a survival-analysis concept. In frailty, related observations (individuals) are grouped and viewed as sharing a latent component. Stata offers gamma- or inverse-Gaussian-distributed frailty for parametric models, and gamma-distributed frailty for semiparametric models; see the manual entries [ST] streg and [ST] stcox. Panel-data random effects are assumed to be normally distributed and are available with parametric survival models. Frailty is assumed to be gamma- or inverse-Gaussian distributed, and that is mainly for computational rather than substantive reasons. Panel-data's normal random effects are a more plausible assumption. They are equivalent to lognormal frailties.
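The equivalence with lognormal frailty can be seen by writing out the hazard. The notation below is introduced here for illustration: h_0 is the baseline hazard, x_ij the covariates for observation i in group j, u_j the random effect of group j, and alpha_j the corresponding frailty. It is a sketch for the proportional-hazards parameterization, not a statement of how every distribution is parameterized.

```latex
h(t \mid x_{ij}, u_j) = h_0(t)\,\exp(x_{ij}\beta + u_j)
                      = \alpha_j\, h_0(t)\,\exp(x_{ij}\beta),
\qquad \alpha_j = \exp(u_j), \quad u_j \sim N(0, \sigma_u^2)
```

Because u_j is normal, alpha_j = exp(u_j) is lognormal, so a normal random intercept acts as a multiplicative lognormal frailty on the hazard.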

Stata provides two commands, xtstreg and mestreg, for fitting parametric survival models with panel data. Examples of survival outcomes in panel data are the number of years until a new recession occurs for a group of countries that belong to different regions, or weeks of unemployment for individuals who might experience multiple unemployment episodes.

Let's see it work

We want to study the duration of job position for a group of 201 people. We have 600 observations in our data, meaning roughly three job positions per person. In these data, the end of a job position could mean the end of employment, but usually it means moving to a new job, whether in the same firm or a new firm. Our outcome is time to the "end" of a job (variable tend), and variable failure indicates whether that time corresponds to censoring or the job position having ended. These are real data.

To use Stata's xtstreg, we must first stset and xtset our data because xtstreg is both an st and xt command.

We type

. stset tend, origin(tstart) failure(failure)

Survival-time data settings

         Failure event: failure!=0 & failure<.
Observed time interval: (origin, tend]
     Exit on or before: failure
     Time for analysis: (time-origin)
                Origin: time tstart

600 total observations
0 exclusions
600 observations remaining, representing
458 failures in single-record/single-failure data
40,782 total analysis time at risk and under observation
At risk from t = 0
Earliest observed entry t = 0
Last observed exit t = 428

. xtset id

Panel variable: id (unbalanced)

We model the time to end of job position as being determined by highest level of education attained (education), number of previous jobs or job positions (njobs), prestige of the job (prestige), gender (female), and whether a college degree was attained (college). We use a Weibull distribution for survival times.

. xtstreg education njobs prestige female college, distribution(weibull)

        Failure _d: failure
  Analysis time _t: (tend-origin)
            Origin: time tstart

Random-effects Weibull PH regression            Number of obs     =        600
Group variable: id                              Number of groups  =        201

                                                Obs per group:
                                                              min =          1
                                                              avg =        3.0
                                                              max =          9

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(5)      =     229.16
Log likelihood = -2320.2079                     Prob > chi2       =     0.0000
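------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |   1.008176   .0357434     0.23   0.818     .9404984    1.080723
       njobs |   .9010326   .0449328    -2.09   0.037     .8171328    .9935468
    prestige |   .9688059   .0063893    -4.81   0.000     .9563637      .98141
      female |   2.683054   .4265383     6.21   0.000     1.964761    3.663947
     college |   3.470632   .3097432    13.94   0.000     2.913677     4.13405
       _cons |   .0020674   .0010752   -11.89   0.000      .000746    .0057296
-------------+----------------------------------------------------------------
       /ln_p |   .2425694   .0454666                      .1534565    .3316824
-------------+----------------------------------------------------------------
   /sigma2_u |   .4865182   .1397864                      .2770332    .8544101
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation to hazard ratios.
Note: _cons estimates baseline hazard (conditional on zero random effects).
LR test vs. Weibull model: chibar2(01) = 30.74         Prob >= chibar2 = 0.0000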

The number of previous jobs and the prestige of the current job both have hazard ratios below 1: they increase survival time in the current job or, said differently, reduce current job mobility. In addition, women and those who obtained a college degree are more mobile.

The estimated variance of the random effect is 0.49, which is large enough to produce reasonably large differences in survival time between individuals.
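To put numbers on that statement, the ancillary estimates can be back-transformed by hand. The following is a minimal sketch that simply retypes the values from the output above into display; it assumes only the proportional-hazards parameterization named in the table header.

```stata
* Weibull shape parameter p, back-transformed from /ln_p
display exp(.2425694)

* Standard deviation of the random intercept, from /sigma2_u
display sqrt(.4865182)

* Hazard multiplier for a person whose random effect is one
* standard deviation above zero, relative to a person at zero
display exp(sqrt(.4865182))
```

The shape parameter comes out at about 1.27, so the hazard of a job ending rises with time in the job. The standard deviation is about 0.70 on the log-hazard scale, so a person one standard deviation above zero has roughly twice the hazard of an otherwise identical person at zero.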

Let's see it work with random coefficients

Stata also has mestreg, which will fit the same models as the xtstreg command just demonstrated, and more besides. Among the additional features, mestreg allows more than one nesting level. It also fits random coefficients in addition to random intercepts. The me part of mestreg stands for mixed effects.

Nothing is free; mestreg has a bit more syntax. To obtain the same results we just obtained, we would type

. mestreg education njobs prestige female college || id:,
     distribution(weibull)

The double bars followed by id: specify that the group level is variable id, meaning observations with the same value of id share a common effect. The default effect is a random intercept.
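Additional nesting levels are written as further double-bar blocks, ordered from the highest level down. As a minimal sketch, assume a hypothetical variable region that records where each person lives; it is not part of the data used on this page, and the model is shown only to illustrate the syntax. The data are assumed to have been stset as above.

```stata
* Hypothetical three-level model: job spells within people (id)
* within regions. region is an assumed variable that is NOT in
* the data analyzed on this page.
mestreg education njobs prestige female college || region: || id:, distribution(weibull)
```

Each level here receives its own random intercept, so people in the same region share one latent effect and job spells of the same person share another.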

We could estimate a random coefficient in addition by typing

. mestreg education njobs prestige female college || id: college,
     distribution(weibull)

Adding a variable name after id: specifies that the variable is to have a random coefficient.

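Let's fit that model. Here female is entered as the factor variable i.female, so it is labeled 1.female in the output; for a 0/1 indicator, the fitted model is the same.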

. mestreg education njobs prestige i.female college || id: college,
     distribution(weibull)

        Failure _d: failure
  Analysis time _t: (tend-origin)
            Origin: time tstart

Mixed-effects Weibull PH regression             Number of obs     =        600
Group variable: id                              Number of groups  =        201

                                                Obs per group:
                                                              min =          1
                                                              avg =        3.0
                                                              max =          9

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(5)      =     214.80
Log likelihood = -2319.931                      Prob > chi2       =     0.0000
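------------------------------------------------------------------------------
          _t | Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]
-------------+----------------------------------------------------------------
   education |   1.018654   .0400041     0.47   0.638     .9431887    1.100157
       njobs |   .9059465   .0460997    -1.94   0.052     .8199523    1.000959
    prestige |   .9682887    .006471    -4.82   0.000     .9556884    .9810551
    1.female |   2.793345   .4725018     6.07   0.000     2.005126    3.891416
     college |   3.504567   .3223489    13.63   0.000     2.926451    4.196891
       _cons |   .0017682   .0010096   -11.10   0.000     .0005774    .0054147
-------------+----------------------------------------------------------------
       /ln_p |   .2493801   .0465699                      .1581048    .3406554
-------------+----------------------------------------------------------------
id           |
 var(college)|   .0448765   .0636064                      .0027897    .7219117
   var(_cons)|   .4024694   .1793156                      .1680698    .9637757
------------------------------------------------------------------------------
Note: Estimates are transformed only in the first equation to hazard ratios.
Note: _cons estimates baseline hazard (conditional on zero random effects).
LR test vs. Weibull model: chi2(2) = 31.29                Prob > chi2 = 0.0000
Note: LR test is conservative and provided only for reference.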

We assumed a random coefficient for college; that is, the effect of college education on job mobility varies among individuals. The coefficient is assumed to be normally distributed and independent of the random intercept.

Our mean estimate for this coefficient is ln(3.50) = 1.25, and the variance estimate for this random coefficient is .045. In short, according to the model, the coefficient for college is normally distributed with mean 1.25 and standard deviation sqrt(.045) = .21.
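The same arithmetic can be reproduced with display, and extended to show what the estimated distribution implies on the hazard-ratio scale. This is a minimal sketch that retypes the estimates from the output above; the 1.96 multiplier is the usual normal approximation for a central 95% range and is an addition for illustration.

```stata
* Mean of the random coefficient on the log-hazard scale
display ln(3.504567)

* Standard deviation of the random coefficient
display sqrt(.0448765)

* Central 95% range of individual-level hazard ratios for college
display exp(ln(3.504567) - 1.96*sqrt(.0448765))
display exp(ln(3.504567) + 1.96*sqrt(.0448765))
```

Under the fitted model, individual hazard ratios for college range from roughly 2.3 to 5.3. The confidence interval for var(college) in the output is wide, so this range should be read as indicative only.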

Tell me more

Read more about panel-data survival models in the Stata Longitudinal-Data/Panel-Data Reference Manual; see [XT] xtstreg.

You can also read more about multilevel survival models in the Stata Multilevel Mixed-Effects Reference Manual; see [ME] mestreg.