
# Panel-data survival models

## Highlights

• Survival outcomes
• Random effects (intercepts)
• Random coefficients
• Multilevel models
• Right-censoring
• Exponential, loglogistic, Weibull, lognormal, and gamma survival distributions
• Sampling weights
• Graphs of marginal survivor, cumulative hazard, and hazard functions
• Fully integrated with stset
• Fully integrated with xtset

Survival models concern time-to-event outcomes. The outcomes can be anything: death, myopia, employment, etc. The outcomes can be good or bad, such as recovery or relapse, or marriage or divorce, which is worth mentioning because the jargon of survival analysis suggests the outcomes are unpleasant. The word survival itself suggests time until death.

The data on which survival models are fit are often right-censored. Data are collected for a while and, as of some date, data collection ends before everyone has "failed".

Two types of survival models are popular: semiparametric and parametric. Semiparametric means Cox proportional hazards. Parametric means a distributional assumption is made, typically exponential, Weibull, lognormal, loglogistic, etc.
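To make "a distributional assumption" concrete, here is the Weibull case written in proportional-hazards form, together with the likelihood contribution of a possibly right-censored observation. The symbols ($\lambda_i$, $p$, $x_i$, $\beta$, $d_i$) are notation introduced for this sketch and do not appear elsewhere on this page; $d_i$ is 1 for an observed failure and 0 for a censored observation.

```latex
\begin{aligned}
h(t \mid x_i) &= p\,\lambda_i\,t^{\,p-1}, \qquad \lambda_i = \exp(x_i\beta),\\
S(t \mid x_i) &= \exp\!\left(-\lambda_i\,t^{\,p}\right),\\
L_i &= h(t_i \mid x_i)^{d_i}\; S(t_i \mid x_i).
\end{aligned}
```

A censored observation ($d_i = 0$) contributes only the probability of surviving past $t_i$, which is how right-censoring enters the estimation.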

Stata has a new command for fitting parametric survival models with panel data. Panel data concern repeated observations of the primary analysis unit. For instance, let's assume we are analyzing data on individuals. In survival data, we obviously have repeated observations on the same person because we observe each person over a period of time, from onset of risk until failure or the end of data collection. Sometimes the multiple observations on a person are explicit; the data themselves contain multiple observations for some or all of the individuals. That happens when covariates change over time. Other times, the multiple observations on the individuals are implicit; there is only one physical observation for each, but that observation still records a span of time.

Those kinds of repeated observations have nothing to do with panel data. Panel data arise, for instance, when individuals are from different countries and it is believed that country affects survival. In that case, a panel-data model would include a random effect or, if you prefer, an unobserved latent effect for each country.

We can, however, write models in which the random effect occurs at the individual level if we have repeated failure events for them.

Panel-data random effects are similar to frailty, a survival-analysis concept. In frailty, related observations (individuals) are grouped and viewed as sharing a latent component. Stata allows for frailty; see the manual entries [ST] streg and [ST] stcox.

Panel-data random effects are assumed to be normally distributed and that is a selling point of this model. Frailty is assumed to be gamma distributed, and that is mainly for computational rather than substantive reasons. Panel-data's normal random effects are a more plausible assumption. They are equivalent to lognormal frailties, if you care.
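The link between a normal random effect and a lognormal frailty can be written out for a proportional-hazards parameterization. This is a sketch in notation introduced here ($i$ indexes panels, $j$ observations within a panel, $h_0$ is the baseline hazard, $u_i$ the random effect, $\alpha_i$ the frailty).

```latex
\begin{aligned}
h(t_{ij} \mid u_i) &= h_0(t_{ij})\,\exp(x_{ij}\beta + u_i), \qquad u_i \sim N(0,\sigma_u^2),\\
                   &= \alpha_i\,h_0(t_{ij})\,\exp(x_{ij}\beta), \qquad \alpha_i = \exp(u_i).
\end{aligned}
```

Because $u_i$ is normal, the multiplicative term $\alpha_i$ is lognormal, which is the sense in which the two formulations are equivalent.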

Panel-data normally distributed random effects are available only with the parametric survival estimators.

Gamma distribution frailty is available with parametric and semiparametric models.

Stata 14 now provides panel-data parametric survival models.

Examples of survival outcomes in panel data are the number of years until a new recession occurs for a group of countries that belong to different regions, or weeks unemployed for individuals who might experience multiple unemployment episodes.

## Let's see it work

We want to study the duration of job position for a group of 201 people. We have 600 observations in our data, meaning roughly three job positions per person. In these data, the end of a job position could mean the end of employment, but usually it means moving to a new job, whether in the same firm or a new firm. Our outcome is time to the "end" of a job (variable tend), and variable failure indicates whether that time corresponds to censoring or the job position having ended. These are real data.

To use Stata's new xtstreg, we must first stset and xtset our data because xtstreg is both an st and xt command.

We type

. stset tend, origin(tstart) failure(failure)

failure event:  failure != 0 & failure < .
obs. time interval:  (origin, tend]
exit on or before:  failure
t for analysis:  (time-origin)
origin:  time tstart

600  total observations
0  exclusions

600  observations remaining, representing
458  failures in single-record/single-failure data
40782  total analysis time at risk and under observation
at risk from t =         0
earliest observed entry t =         0
last observed exit t =       428

. xtset id
panel variable:  id (unbalanced)
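Before fitting anything, it can help to confirm that both declarations took effect. The following is a minimal sketch, assuming the data are in memory and have just been stset and xtset as shown; the variable names are those used on this page, and _t and _d are the analysis-time and failure variables that stset creates.

```stata
* Sketch: check the survival-time and panel declarations before estimation
stset                // with no arguments, redisplays the current st settings
xtset                // with no arguments, redisplays the current panel settings
stdescribe           // summarizes records, time at risk, and failures

* Inspect a few records alongside the variables created by stset
list id tstart tend failure _t _d in 1/5
```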


We model the time to end of job position as being determined by highest level of education attained, whether college degree was attained, number of previous jobs or job positions, prestige of the job, and gender. We use a Weibull distribution for survival times.

. xtstreg education njobs prestige female college, distribution(weibull)

failure _d:  failure
analysis time _t:  (tend-origin)
origin:  time tstart

Random-effects Weibull regression               Number of obs     =        600
Group variable:              id                 Number of groups  =        201
Obs per group:
min =          1
avg =        3.0
max =          9

Integration method: mvaghermite                 Integration pts.  =         12

Wald chi2(5)      =     229.16
Log likelihood = -744.15593                     Prob > chi2       =     0.0000

_t          Haz. Ratio   Std. Err.        z    P>|z|     [95% Conf. Interval]

education     1.008175   .0357436      0.23    0.818      .940498    1.080723
njobs         .9010315   .0449329     -2.09    0.037     .8171315    .9935459
prestige       .968806   .0063893     -4.81    0.000     .9563637    .9814101
female        2.683059   .4265417      6.21    0.000     1.964761     3.66396
college       3.470637   .3097446     13.94    0.000      2.91368    4.134058
_cons         .0020674   .0010752    -11.89    0.000      .000746    .0057295

/ln_p         .2425708   .0454668      5.34    0.000     .1534575     .331684

/sigma2_u     .4865297     .13979                        .2770395    .8544312

LR test vs. Weibull model: chibar2(01) = 30.74        Prob >= chibar2 = 0.0000


The number of previous jobs and the prestige of the current job both have hazard ratios below 1, so they increase survival time in the current job or, said differently, reduce current job mobility. In addition, women and college graduates are more mobile, while level of education has no detectable effect beyond the college indicator.

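The estimated variance of the random effect is 0.49, which is a standard deviation of about 0.70 on the log-hazard scale. Two otherwise identical people whose random effects differ by one standard deviation therefore have hazards that differ by a factor of about 2, a reasonably large change in expected job duration.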
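A few follow-up commands help with reading this output. This is a sketch rather than a recipe: it assumes the xtstreg fit above is still the active estimation result, that nohr is accepted when redisplaying results (as it is for streg), and that stcurve is available after xtstreg, as the highlights list suggests. The number passed to exp() is copied from the /ln_p row of the table.

```stata
* Sketch: postestimation after the xtstreg fit shown above
xtstreg, nohr            // redisplay as coefficients (log hazard ratios)

* Weibull shape parameter p = exp(ln_p); about 1.27 here.
* A value above 1 means the hazard rises with time spent in the job.
display exp(.2425708)

* Plot the fitted survivor and hazard functions
stcurve, survival
stcurve, hazard
```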

## Let's see it work with random coefficients

Also new to Stata 14 is mestreg, which will fit the same models as the new and just demonstrated xtstreg, and more besides. Among the additional features, mestreg will allow more than one nesting level. Another additional feature is that it will fit random intercepts and random coefficients. The me part of mestreg stands for mixed effects.

Nothing is free; mestreg has a bit more syntax. To obtain the same results we just obtained, we would type

. mestreg education njobs prestige female college || id:, distribution(weibull)


The double bars followed by id: specify that the group level is variable id, meaning observations with the same value of id share a common effect. The default effect is a random intercept.
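The same notation extends to more than one nesting level by adding further double-bar equations, listed from the highest level down. The sketch below is hypothetical: firm is an assumed identifier for the employer and is not a variable in the example dataset.

```stata
* Hypothetical three-level sketch: job spells within people within firms.
* firm is NOT in the example data; it is assumed here for illustration only.
mestreg education njobs prestige female college || firm: || id:, distribution(weibull)
```

Each level gets its own random intercept by default, so this model would report one variance for firm and one for id nested within firm.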

We could estimate a random coefficient in addition by typing

. mestreg education njobs prestige female college || id: college, distribution(weibull)


Adding a variable name after id: specifies that the variable is to have a random coefficient.

Let's fit that model

. mestreg education njobs prestige female college || id: college, distribution(weibull)

failure _d:  failure
analysis time _t:  (tend-origin)
origin:  time tstart

Mixed-effects Weibull regression                Number of obs     =        600
Group variable:              id                 Number of groups  =        201

Obs per group:
min =          1
avg =        3.0
max =          9

Integration method: mvaghermite                 Integration pts.  =          7

Wald chi2(5)      =     214.80
Log likelihood = -743.87893                     Prob > chi2       =     0.0000

_t             Haz. Ratio   Std. Err.        z    P>|z|     [95% Conf. Interval]

education        1.018655   .0400041      0.47    0.638       .94319    1.100158
njobs            .9059486   .0460996     -1.94    0.052     .8199545    1.000962
prestige         .9682886    .006471     -4.82    0.000     .9556883     .981055
female           2.793355   .4725027      6.07    0.000     2.005133    3.891427
college          3.504569   .3223497     13.63    0.000     2.926451    4.196894
_cons            .0017681   .0010096    -11.10    0.000     .0005774    .0054146

/ln_p            .2493795   .0465699      5.35    0.000     .1581041    .3406549

id
var(college)     .0448831    .063607                        .0027912    .7217394
var(_cons)       .4024507   .1793122                         .168058    .9637538

LR test vs. Weibull model: chi2(2) = 31.29                Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.


The table reports the effect of college as a hazard ratio of 3.5, which corresponds to a mean coefficient of ln(3.5) = 1.25 on the log-hazard scale. The random coefficient has standard deviation 0.21 on that same scale (obtained by taking the square root of 0.045). The coefficient is assumed to be normally distributed and uncorrelated with the random intercept. For roughly 95% of individuals, the coefficient therefore lies between 0.83 and 1.68, which we obtained by calculating 1.25 plus or minus 2*0.21; exponentiating gives hazard ratios between about 2.3 and 5.4.
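The arithmetic for the random coefficient can be reproduced with display. This sketch works on the log-hazard scale, on the assumption that mestreg reports variance components on the scale of the linear predictor while showing the fixed part as hazard ratios; the numbers are copied from the output above. The last command is an optional variation that relaxes the independence assumption with the covariance(unstructured) option.

```stata
* Sketch: spread of the college effect across individuals
display sqrt(.0448831)                          // SD of the random coefficient, about 0.21
display ln(3.504569)                            // mean coefficient (log hazard ratio), about 1.25
display exp(ln(3.504569) - 2*sqrt(.0448831))    // lower hazard ratio, about 2.3
display exp(ln(3.504569) + 2*sqrt(.0448831))    // upper hazard ratio, about 5.4

* Optional: let the random intercept and random coefficient be correlated
mestreg education njobs prestige female college || id: college, covariance(unstructured) distribution(weibull)
```

With the unstructured covariance, the output would add an estimated covariance between the two random effects to the variance table.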

## Tell me more

Read more about panel-data survival models in the Stata Longitudinal-Data/Panel-Data Reference Manual; see [XT] xtstreg.

You can also read more about multilevel survival models in the Stata 14 announcement or in the Stata Multilevel Mixed-Effects Reference Manual; see [ME] mestreg.