.-
help for ^stegen^, ^strepl^, ^stsplit^, and ^sttvc^              (STB-41: ssa11)
.-

Creation of time-varying covariates for survival-time data
----------------------------------------------------------

    ^stegen^ [type] newvar [^if^ exp] [^in^ range] ^, at(^exp-t^)^
             [ ^from(^exp-0^)^ ^to(^exp-1^)^ ^c^ensor ^nosh^ow ^p^reserve ]

    ^strepl^ oldvar [^if^ exp] [^in^ range]^,^ ^at(^exp-t^)^
             { ^from(^exp-0^)^ | ^to(^exp-1^)^ } [ ^c^ensor ^nosh^ow ^p^reserve ]

    ^stsplit^ [type] newvar [^if^ exp] [^in^ range]^,^
             { ^at(^numlist | exp-list^)^ | ^every(^# | exp^)^ }
             [ ^expr^ ^nosh^ow ^p^reserve ]

    ^sttvc^ [type] newvar ^=^ v(0) t(1) v(1) ... t(k) v(k) [^if^ exp] [^in^ range]
             [^, zero(^exp-t^)^ ^nosh^ow ^p^reserve ]

These commands are for use with survival-time data; see help @st@.  You must
have ^stset^ your data before using these commands; see help @stset@.

See ^help numlist^ for the definition of a numlist object.


Description
-----------

The commands facilitate the creation of time-varying covariates, represented
in multiple-episode form, by hiding the tricky data manipulation.

^stegen^ generates a new time-varying covariate that changes ^at^ time exp-t
^from^ value exp-0 ^to^ value exp-1.  ^stegen^ performs episode splitting if
necessary.

^strepl^ modifies an existing time-varying covariate so that it takes value
exp-0 for t<=exp-t or value exp-1 for t>exp-t.  ^strepl^ performs episode
splitting if necessary.

^stsplit^ splits the time axis at time points n1,n2,...,nk specified by a
numlist into k+1 intervals [0,n1], (n1,n2], ..., (nk,.), creating an indicator
variable that has value i on the i-th interval.  ^stsplit^ can be used to
create a "person period file".

^sttvc^ generates a time-varying covariate that is v(0) for t<=t(1), is v(1)
for t(1)<t<=t(2), ..., and is v(k) for t>t(k).  The list v(0) t(1) .. t(k) v(k)
consists of an odd number of expressions, separated by blanks.  The command
enforces that the transition times t(j) are strictly increasing, and that
t(j)==. implies that t(j')==. for all j'>j.

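As a hand-worked illustration of episode splitting (a sketch with made-up
values, not output of the commands; the covariate name x is hypothetical, and
the column labels are descriptive rather than the variable names that the
commands create), consider one case observed from time 0 to time 30 whose
covariate changes at time 12.  After

   . ^stegen x, at(12)^

the case would be represented by two episodes:

     entry    exit    x    failure
         0      12    0    0 (censored)
        12      30    1    as originally recorded

The covariate still has the from-value 0 at time 12 itself and takes the
to-value 1 only after time 12.
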
These commands can of course be invoked more than once in a job, in any
combination, to create and manipulate one or more time-varying covariates.
Episode splitting will always be performed transparently.

If the command ^parsoptp^ is located on the ado-path, the options ^at()^,
^from()^, ^to()^, ^every()^, and ^zero()^ may contain expressions with matching
parentheses.

If necessary, these commands generate a case identification variable, entry
times, and a failure/censor indicator.  See help @stset@ for details on these
variables, and help @st_aux@ for details on the names tried.

See ^stcoxtvc^ for the creation of continuously-varying covariates in a Cox
regression model.


Options ^stegen^ and ^strepl^
-------------------------

^at(^exp^)^ is not optional; it specifies the time, expressed as the elapsed
    time since time 0, at which the time-varying covariate (TVC) changes from
    exp-0 to exp-1.  The time should be non-missing.  A warning message is
    displayed if ^at^ is not positive.

^censor^ specifies that missing values of the at-expression indicate censoring,
    i.e., the event did not happen during (t0,t1), and thus the TVC should be
    set to the from-value (^stegen^) or left unchanged (^strepl^).

^from(^exp-0^)^ specifies the value of the TVC before the change (transition).
    In ^stegen^, ^from^ defaults to 0.  In ^strepl^ one should specify either
    ^from()^ or ^to()^.

^to(^exp-1^)^ specifies the value of the TVC after the change (transition).
    In ^stegen^, ^to^ defaults to 1.  In ^strepl^ one should specify either
    ^from()^ or ^to()^.


Options ^stsplit^
---------------

^at(^numlist|exp-list^)^ is required unless ^every()^ is specified; it specifies
    the time points, expressed as the elapsed time since time 0, at which the
    episodes are to be split.

    Example: ^at(5 20)^ performs episode splitting at times 5 and 20.

^every(^#|exp^)^ specifies that episodes are split at multiples of a (positive)
    constant # or of an expression that may vary between cases.

^expr^ specifies that ^at()^ should be interpreted as an expression-list rather
    than as a numeric list.


Options for sttvc
-----------------

^zero(^exp^)^ specifies a zero-point for the transition times t(j).  The
    expression should evaluate to non-missing.


Options (general)
-----------------

^preserve^ specifies that the data are preserved before data manipulation, so
    that the data will be restored in original form after pressing Break.

^show^ and ^noshow^ change whether the other st commands are to display the
    identities of the key st variables at the top of their output.


Remarks
-------

What does it mean that a transition occurs "at" some time t?  The programs
^stegen^, ^strepl^, ^stsplit^, and ^sttvc^ split at exactly the time t, and thereby
follow the Stata convention that transitions become effective right after t,
and not already at time t.  This detailed meaning of "at" is not important for
the continuous-time parametric survival-time models (^stereg^, ^stweib^), but it
is important for the discrete-time models and for Cox regression (^stcox^).  If
you want the new status to be effective already at time t, you can shift the
transition time somewhat earlier by specifying at(t-eps) rather than at(t), for
some eps that is smaller than the measurement unit of time in the data.


Examples
--------

/1/ We want to analyze the effect of marital status on the timing of the birth
of the first child.  The data are the time of dating (tdate), the time of
marriage (tmarried), the time of birth of the child (tbirth), and the time of
interview (tstudy), all expressed as calendar time in months since 1900.  Time
variables for respondents who have not yet experienced the respective events
are set to missing.  Then the following code fragment will produce the
required results.

   (1) . ^gen HasChild = tbirth!=.^
   (2) . ^gen t = cond(HasChild, tbirth-tdate, tstudy-tdate)^
   (3) . ^stset t HasChild^
   (4) . ^stegen mstatus, at(tmarried-tdate) censor^
   (5) . ^stcox mstatus education race ...^

Lines (1) and (2) generate the survival-time variables, with time expressed
as time-at-risk (time since dating).  Line (3) interfaces to the ^st^-package.
Line (4) defines the time-varying covariate marital status, which is 0 if the
respondent is not married and 1 if married.  ^stegen^ will silently have
expanded the data set, created a t0 variable, etc.  We used the option ^censor^
since missing values of ^tmarried^, and hence of tmarried-tdate, mean that the
event has not yet happened, so the TVC should take the from-value, which
defaults to 0.

Note that the time-varying covariate could also have been obtained by

   (4.a)  . ^stegen mstatus, at(tmarried-tdate) from(0) to(1) censor^

or as

   (4.b1) . ^gen mstatus = 0^
   (4.b2) . ^strepl mstatus, at(tmarried-tdate) to(1) censor^

or even without invoking the ^censor^ option as

   (4.c)  . ^stegen mstatus, at(cond(tmarried~=.,tmarried-tdate,tstudy-tdate+1))^
            ^.. from(0) to(1)^

Note that in (4.c) we specified an expression in the at() option that contains
parentheses.  This is normally not allowed in Stata.  See help @parsoptp@ for
how the problem was circumvented.

/2/ ^stegen^ can be used to easily perform a (partial) likelihood ratio test for
the proportional hazards assumption.  (See ^stphtest^ for a canned solution.)
Consider 3 time intervals [0,A], (A,B], and (B,.).  We test whether the beta
coefficients are the same on the three intervals.  We use the Kaplan-Meier
estimator to estimate A and B so that approximately S(A) = 2/3 and S(B) = 1/3.
In the example below, we use A=50 and B=100.  Then

   (1) . ^stset wtime died, id(id) t0(t0)^    (declare the data)
   (2) . ^stcox x1 x2 x3^                     (fit cox model)
   (3) . ^lrtest, saving(M0)^                 (save estimation information)
   (4) . ^stsplit D, at(50 100)^              (splits time at 50 100)
   (5) . ^xi: stcox i.D*x1 i.D*x2 i.D*x3^     (cox with interval-specific coef.)
   (6) . ^lrtest, saving(M1)^                 (save estimation information)
   (7) . ^lrtest, using(M1) model(M0)^        (test statistic, and p value)

Note that ^lrtest^ displays a warning message that the numbers of observations
in M0 and M1 are not the same.  This message is due to ^lrtest^'s failure to
deal properly with multi-record (linked) observations, and it can be ignored
here.

Note that the ^stsplit^ command in (4) would actually be equivalent to

   (4.1) . ^stegen D, at(50) from(0) to(1)^
   (4.2) . ^strepl D, at(100) to(2)^

/3/ You have data on the birth dates tbirth1..tbirth3 of up to 3 children of
women, with dates of "missing children" denoted by missing values.  You can
create an age-dependent time-varying covariate number-of-children, i.e.,
relative to the birth date wbirth of the woman, as

   . ^sttvc nchild = 0 tbirth1-wbirth 1 tbirth2-wbirth 2 tbirth3-wbirth 3^

or more concisely as

   . ^sttvc nchild = 0 tbirth1 1 tbirth2 2 tbirth3 3, zero(wbirth)^


Acknowledgments
---------------

This project was supported by grant PGS 50-370 of the Netherlands Organization
for Scientific Research.  Helpful suggestions by Wim Bernasco are gratefully
acknowledged.


Author
------

    Jeroen Weesie
    Utrecht University
    Netherlands
    weesie@@weesie.fsw.ruu.nl


Also See
--------

    STB:      STB-41 ssa11
    Manual:   [R] st
    On-line:  help for @st@; @stset@; @stcoxtvc@; @stphtest@; @numlist@; @parsoptp@.