Panel-data survival models

Highlights

Survival outcomes
Random intercepts and random coefficients
Multilevel models
Right-censoring
Exponential, loglogistic, Weibull, lognormal, and gamma survival distributions
Sampling weights
Graphs of marginal survivor, cumulative hazard, and hazard functions
Fully integrated with stset
Fully integrated with xtset

Survival models concern time-to-event outcomes. The outcomes can be anything: death, myopia, employment, etc. The outcomes can be good or bad, such as recovery or relapse, or marriage or divorce, which is worth mentioning because the jargon of survival analysis suggests the outcomes are unpleasant. The word survival itself suggests time until death.

The data on which survival models are fit are often right-censored. Data are collected for a while and, as of some date, data collection ends before everyone has "failed". Two types of survival models are popular for right-censored data: semiparametric and parametric. Semiparametric means Cox proportional hazards. Parametric means a distributional assumption is made, typically exponential, Weibull, lognormal, conditional log log, etc.

Panel data concerns repeated observations of the primary analysis unit. For instance, let's assume we are analyzing data on individuals. Obviously, in survival data, we have repeated observations on the same person because we observed them over a period of time, from onset of risk until failure or the end of data collection. Sometimes the multiple observations on a person are explicit; the data themselves contain multiple observations for some or all of the individuals. That happens when covariates change over time. Other times, the multiple observations on the individuals are implicit; there is only one physical observation for each, but still that observation records a span of time.
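As a minimal sketch of the two layouts just described, consider the toy data below. The data are invented for illustration, and the variable names id, t0, t1, died, and treated are hypothetical. Person 1 has two records because the covariate treated changes at time 4; persons 2 and 3 have one record each, and person 2 is right-censored at time 6.

```stata
clear
input id t0 t1 died treated
1  0  4  0  0
1  4  9  1  1
2  0  6  0  0
3  0  3  1  0
end

* id() links the records of one person; time0() gives each record's start
stset t1, id(id) time0(t0) failure(died)

* _t0 and _t hold the span of each record; _d marks whether it ends in failure
list id t0 t1 died treated _t0 _t _d, sepby(id)
```

Whether a person contributes one record or several, stset describes every record by the same three variables, _t0, _t, and _d.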

Those kinds of repeated observations have nothing to do with panel data. Panel data arises, for instance, when individuals are from different countries and it is believed that country affects survival. In that case, in a panel-data model, there would be a random effect or, if you prefer, an unobserved latent effect for each country.

We can, however, write models in which the random effect occurs at the individual level if we have repeated failure events for them.

Panel-data random effects are similar to frailty, a survival-analysis concept. In frailty, related observations (individuals) are grouped and viewed as sharing a latent component. Stata offers gamma- or inverse-Gaussian-distributed frailty for parametric models, and gamma-distributed frailty for semiparametric models; see the manual entries [ST] streg and [ST] stcox. Panel-data random effects are assumed to be normally distributed and are available with parametric survival models. Frailty is assumed to be gamma- or inverse-Gaussian distributed, and that is mainly for computational rather than substantive reasons. Panel-data's normal random effects are a more plausible assumption. They are equivalent to lognormal frailties.
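The equivalence can be written out. With a random intercept u_i, the hazard is h(t) = h0(t) exp(xb + u_i) = exp(u_i) h0(t) exp(xb), so exp(u_i) acts as a frailty, and it is lognormal when u_i is normal. The sketch below fits both formulations side by side. It assumes the catheter example dataset from Stata's documentation (already stset, with patient as the grouping variable) rather than the job data used later on this page, and webuse needs internet access.

```stata
webuse catheter, clear

* Shared gamma frailty: records from the same patient share one
* latent multiplicative effect on the hazard
streg age female, distribution(weibull) frailty(gamma) shared(patient)

* Normal random intercept for each patient: the panel-data formulation
xtset patient
xtstreg age female, distribution(weibull)
```

The two fits differ only in the distribution assumed for the shared latent effect, so the hazard ratios for age and female can be compared directly.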

Stata provides two commands, xtstreg and mestreg, for fitting parametric survival models with panel data. Examples of survival outcomes in panel data are the number of years until a new recession occurs for a group of countries that belong to different regions, or weeks of unemployment for individuals who might experience multiple unemployment episodes.

Let's see it work

We want to study the duration of job position for a group of 201 people. We have 600 observations in our data, meaning roughly three job positions per person. In these data, the end of a job position could mean the end of employment, but usually it means moving to a new job, whether in the same firm or a new firm. Our outcome is time to the "end" of a job (variable tend), and variable failure indicates whether that time corresponds to censoring or the job position having ended. These are real data.

To use Stata's xtstreg, we must first stset and xtset our data because xtstreg is both an st and xt command.

We type

. stset tend, origin(tstart) failure(failure)

Survival-time data settings

         Failure event: failure!=0 & failure<.
Observed time interval: (origin, tend]
     Exit on or before: failure
     Time for analysis: (time-origin)
                Origin: time tstart


 
        600  total observations
          0  exclusions
 
        600  observations remaining, representing
        458  failures in single-record/single-failure data
     40,782  total analysis time at risk and under observation
                                                At risk from t =         0
                                     Earliest observed entry t =         0
                                          Last observed exit t =       428


. xtset id

Panel variable: id (unbalanced)

We model the time to end of job position as being determined by highest level of education attained (education), number of previous jobs or job positions (njobs), prestige of the job (prestige), gender (female), and whether a college degree was attained (college). We use a Weibull distribution for survival times.

. xtstreg education njobs prestige female college, distribution(weibull)

        Failure _d: failure
  Analysis time _t: (tend-origin)
            Origin: time tstart

Random-effects Weibull PH regression            Number of obs     =        600
Group variable: id                              Number of groups  =        201

                                                Obs per group:
                                                              min =          1
                                                              avg =        3.0
                                                              max =          9

Integration method: mvaghermite                 Integration pts.  =         12

                                                Wald chi2(5)      =     229.16
Log likelihood = -2320.2079                     Prob > chi2       =     0.0000


          _t   Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]

   education    1.008176   .0357434     0.23   0.818     .9404984    1.080723
       njobs    .9010326   .0449328    -2.09   0.037     .8171328    .9935468
    prestige    .9688059   .0063893    -4.81   0.000     .9563637      .98141
      female    2.683054   .4265383     6.21   0.000     1.964761    3.663947
     college    3.470632   .3097432    13.94   0.000     2.913677     4.13405
       _cons    .0020674   .0010752   -11.89   0.000      .000746    .0057296

       /ln_p    .2425694   .0454666                      .1534565    .3316824

   /sigma2_u    .4865182   .1397864                      .2770332    .8544101


Note: Estimates are transformed only in the first equation to hazard ratios.
Note: _cons estimates baseline hazard (conditional on zero random effects).
LR test vs. Weibull model: chibar2(01) = 30.74        Prob >= chibar2 = 0.0000

The number of previous jobs and the prestige of the current job both increase survival time in the current job or, said differently, reduce current job mobility. In addition, women and those who obtained a college degree are more mobile.
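These readings follow from simple arithmetic on the reported hazard ratios. The sketch below copies three hazard ratios and /ln_p from the xtstreg output above into scalars (the scalar names are ours) and converts them: each additional previous job lowers the hazard by about 10%, each unit of prestige lowers it by about 3%, and the hazard for women is about 168% higher. The Weibull shape parameter is about 1.27.

```stata
* Hazard ratios and ln(p) copied from the xtstreg output above
scalar hr_njobs    = .9010326
scalar hr_prestige = .9688059
scalar hr_female   = 2.683054
scalar ln_p        = .2425694

* A hazard ratio is exp(coefficient), so the coefficient is its log
display "coefficient on njobs:            " %7.4f ln(hr_njobs)

* Percent change in the hazard per one-unit increase in the covariate
display "pct. change in hazard, njobs:    " %7.2f 100*(hr_njobs - 1)
display "pct. change in hazard, prestige: " %7.2f 100*(hr_prestige - 1)
display "pct. change in hazard, female:   " %7.2f 100*(hr_female - 1)

* Weibull shape parameter; p > 1 means the hazard rises with time in the job
display "Weibull shape p:                 " %7.4f exp(ln_p)
```

All of these statements are conditional on the random effect, that is, they compare job spells of the same person or of people with the same latent effect.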

The estimated variance of the random effect is 0.49, which is large enough to produce reasonably large differences in survival time across individuals.
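To put a number on "reasonably large", here is a small worked calculation using only the estimates in the output above (the scalar names are ours). A variance of 0.49 corresponds to a standard deviation of about 0.70, so a person whose random effect is one standard deviation above zero has roughly twice the hazard of an otherwise identical person with a random effect of zero. Under the Weibull proportional-hazards form, that person's job durations are scaled by a factor of about 0.58.

```stata
* Estimates copied from the xtstreg output above
scalar sigma2_u = .4865182
scalar p        = exp(.2425694)      // Weibull shape parameter

scalar sd_u = sqrt(sigma2_u)

* Hazard multiplier for a person one standard deviation above zero
display "sd of random intercept:      " %6.3f sd_u
display "hazard multiplier at +1 sd:  " %6.3f exp(sd_u)

* In a Weibull PH model, multiplying the hazard by exp(u) rescales
* survival time by exp(-u/p)
display "time multiplier at +1 sd:    " %6.3f exp(-sd_u/p)
```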

Let's see it work with random coefficients

Stata also has mestreg, which will fit the same models as the just-demonstrated xtstreg, and more besides. Among the additional features, mestreg will allow more than one nesting level. Another additional feature is that it will fit random intercepts and random coefficients. The me part of mestreg stands for mixed effects.
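As an illustration of more than one nesting level, the sketch below simulates a small three-level dataset and fits it with mestreg. Everything in it is hypothetical: the variables region, id, and x, the sample sizes, and the parameter values are invented for the example and have nothing to do with the job data.

```stata
clear
set seed 12345

* Level 3: 30 hypothetical regions, each with its own random intercept
set obs 30
generate region   = _n
generate u_region = rnormal(0, 0.5)

* Level 2: 10 people per region, each with a person-level random intercept
expand 10
generate id   = _n
generate u_id = rnormal(0, 0.7)

* Level 1: 3 spells per person, with one spell-level covariate
expand 3
generate x = rnormal()

* Weibull PH survival times with shape 1.3: S(t) = exp(-lambda*t^1.3)
generate lambda = exp(-2 + 0.5*x + u_region + u_id)
generate t      = (-ln(runiform())/lambda)^(1/1.3)

* Right-censor every spell still running at time 10
generate byte fail = t < 10
replace t = 10 if !fail

stset t, failure(fail)

* Random intercepts for regions and for people nested within regions
mestreg x || region: || id:, distribution(weibull)
```

Levels are listed from the highest to the lowest, each introduced by its own double bars.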

Nothing is free; mestreg has a bit more syntax. To obtain the same results we just obtained, we would type

. mestreg education njobs prestige female college || id:, distribution(weibull)

The double bars followed by id: specify that the group level is variable id, meaning observations with the same value of id share a common effect. The default effect is a random intercept.
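One detail is worth a quick check. The outputs on this page report 12 integration points for xtstreg and 7 for mestreg, so the two commands start from different defaults, and the results agree closely rather than digit for digit unless the number of points is matched. The sketch below makes that comparison. It assumes the catheter example dataset from Stata's documentation instead of the job data, and webuse needs internet access.

```stata
webuse catheter, clear
xtset patient

* Random-intercept Weibull model with xtstreg
xtstreg age female, distribution(weibull)
scalar ll_xt = e(ll)

* The same model with mestreg, matching the number of integration points
mestreg age female || patient:, distribution(weibull) intpoints(12)
scalar ll_me = e(ll)

display "log likelihood, xtstreg: " %10.4f ll_xt
display "log likelihood, mestreg: " %10.4f ll_me
```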

We could estimate a random coefficient in addition by typing

. mestreg education njobs prestige female college || id: college, distribution(weibull)

Adding a variable name after id: specifies that the variable is to have a random coefficient.

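Let's fit that model. Here we write female as i.female, factor-variable notation that gives the same fit for a 0/1 indicator.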

. mestreg education njobs prestige i.female college || id: college, distribution(weibull)

        Failure _d: failure
  Analysis time _t: (tend-origin)
            Origin: time tstart

Mixed-effects Weibull PH regression             Number of obs     =        600
Group variable: id                              Number of groups  =        201

                                                Obs per group:
                                                              min =          1
                                                              avg =        3.0
                                                              max =          9

Integration method: mvaghermite                 Integration pts.  =          7

                                                Wald chi2(5)      =     214.80
Log likelihood = -2319.931                      Prob > chi2       =     0.0000


          _t   Haz. ratio   Std. err.      z    P>|z|     [95% conf. interval]

   education    1.018654   .0400041     0.47   0.638     .9431887    1.100157
       njobs    .9059465   .0460997    -1.94   0.052     .8199523    1.000959
    prestige    .9682887    .006471    -4.82   0.000     .9556884    .9810551
    1.female    2.793345   .4725018     6.07   0.000     2.005126    3.891416
     college    3.504567   .3223489    13.63   0.000     2.926451    4.196891
       _cons    .0017682   .0010096   -11.10   0.000     .0005774    .0054147

       /ln_p    .2493801   .0465699                      .1581048    .3406554

id           
 var(college)    .0448765   .0636064                      .0027897    .7219117
   var(_cons)    .4024694   .1793156                      .1680698    .9637757


Note: Estimates are transformed only in the first equation to hazard ratios.
Note: _cons estimates baseline hazard (conditional on zero random effects).
LR test vs. Weibull model: chi2(2) = 31.29                Prob > chi2 = 0.0000

Note: LR test is conservative and provided only for reference.

We assumed a random coefficient for college, that is, the effect of college education on the decision of job mobility varies among individuals. The coefficient is assumed to be normally distributed and unrelated to the random effect for the intercept.

Our mean estimate for this coefficient is ln(3.5) = 1.25, and the variance estimate for this random coefficient is .045. In short, according to the model, the coefficient for college is normally distributed with mean 1.25 and standard deviation sqrt(.045) = .21.
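The implied spread across individuals can be worked out from those two numbers. The sketch below uses the point estimates from the mestreg output above (the scalar names are ours) and takes them at face value. Under the normality assumption, the middle 95% of individual coefficients run from about 0.84 to 1.67, which corresponds to hazard ratios from roughly 2.3 to 5.3.

```stata
* Point estimates copied from the mestreg output above
scalar b_college  = ln(3.504567)     // mean of the random coefficient
scalar sd_college = sqrt(.0448765)   // its standard deviation
scalar z          = invnormal(.975)

scalar lo = b_college - z*sd_college
scalar hi = b_college + z*sd_college

display "mean and sd of coefficient: " %6.3f b_college "  " %6.3f sd_college
display "middle 95% of coefficients: " %6.3f lo "  to " %6.3f hi
display "as hazard ratios:           " %6.3f exp(lo) "  to " %6.3f exp(hi)
```

The confidence interval reported for var(college) is wide (.003 to .72), so this range is itself quite uncertain.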

Tell me more

Read more about panel-data survival models in the Stata Longitudinal-Data/Panel-Data Reference Manual; see [XT] xtstreg.

You can also read more about multilevel survival models in the Stata Multilevel Mixed-Effects Reference Manual; see [ME] mestreg.
