Stata 15 help for stset

[ST] stset -- Declare data to be survival-time data

Syntax

Single-record-per-subject survival data

stset timevar [if] [weight] [, single_options]

streset [if] [weight] [, single_options]

st [, nocmd notable]

stset, clear

Multiple-record-per-subject survival data

stset timevar [if] [weight] , id(idvar) failure(failvar[==numlist]) [multiple_options]

streset [if] [weight] [, multiple_options]

streset, {past|future|past future}

st [, nocmd notable]

stset, clear

single_options                 Description
-------------------------------------------------------------------------
Main
  failure(failvar[==numlist])  failure event
  noshow                       prevent other st commands from showing st
                                 setting information

Options
  origin(time exp)             define when a subject becomes at risk
  scale(#)                     rescale time value
  enter(time exp)              specify when subject first enters study
  exit(time exp)               specify when subject exits study

Advanced
  if(exp)                      select records for which exp is true;
                                 recommended rather than if exp
  time0(varname)               mechanical aspect of interpretation about
                                 records in dataset; seldom used
-------------------------------------------------------------------------

multiple_options               Description
-------------------------------------------------------------------------
Main
* id(idvar)                    multiple-record ID variable
* failure(failvar[==numlist])  failure event
  noshow                       prevent other st commands from showing st
                                 setting information

Options
  origin([varname == numlist] time exp|min)
                               define when a subject becomes at risk
  scale(#)                     rescale time value
  enter([varname == numlist] time exp)
                               specify when subject first enters study
  exit(failure|[varname == numlist] time exp)
                               specify when subject exits study

Advanced
  if(exp)                      select records for which exp is true;
                                 recommended rather than if exp
  ever(exp)                    select subjects for which exp is ever true
  never(exp)                   select subjects for which exp is never true
  after(exp)                   select records within subject on or after
                                 the first time exp is true
  before(exp)                  select records within subject before the
                                 first time exp is true
  time0(varname)               mechanical aspect of interpretation about
                                 records in dataset; seldom used
-------------------------------------------------------------------------
* id() and failure() are required with stset for multiple-record-per-subject survival data.

fweights, iweights, and pweights are allowed; see weight.

Examples

Time measured from 0, all failed
    . stset ftime

Time measured from 0, censoring
    . stset ftime, failure(died)

Time measured from 0, censoring & ID
    . stset ftime, failure(died) id(id)

Time measured from 0, failure codes
    . stset ftime, failure(died==2,3)

Time measured from dob, censoring
    . stset ftime, failure(died) origin(time dob)

You cannot harm your data by using stset, so feel free to experiment.

Menu

Statistics > Survival analysis > Setup and utilities > Declare data to be survival-time data

Description

st refers to survival-time data, which are fully described below.

stset declares the data in memory to be st data, informing Stata of key variables and their roles in a survival-time analysis. When you stset your data, stset runs various data consistency checks to ensure that what you have declared makes sense. If the data are weighted, you specify the weights when you stset the data, not when you issue the individual st commands.

streset changes how the st dataset is declared. In multiple-record data, streset can also temporarily set the sample to include records from before the time at risk (called the past) and records after failure (called the future). Then typing streset without arguments resets the sample back to the analysis sample.

st displays how the dataset is currently declared.

Whenever you type stset or streset, Stata runs or reruns data consistency checks to ensure that what you are now declaring (or declared in the past) makes sense. Thus if you have made any changes to your data or simply wish to verify how things are, you can type streset with no options.

stset, clear is for use by programmers. It causes Stata to forget the st markers, making the data no longer st data to Stata. The data remain unchanged. It is not necessary to stset, clear before doing another stset.

Options for use with stset and streset

        +------+
    ----+ Main +-------------------------------------------------------------

id(idvar) specifies the subject-ID variable; observations with equal, nonmissing values of idvar are assumed to be the same subject. idvar may be string or numeric. Observations for which idvar is missing (. or "") are ignored.

When id() is not specified, each observation is assumed to represent a different subject and thus constitutes a one-record-per-subject survival dataset.

When you specify id(), the data are said to be multiple-record data, even if it turns out that there is only one record per subject. Perhaps they would better be called potentially multiple-record data.

If you specify id(), stset requires that you specify failure().

Specifying id() never hurts; we recommend it because a few st commands, such as stsplit, require an ID variable to have been specified when the dataset was stset.

failure(failvar[== numlist]) specifies the failure event.

If failure() is not specified, all records are assumed to end in failure. This is allowed with single-record data only.

If failure(failvar) is specified, failvar is interpreted as an indicator variable; 0 and missing mean censored, and all other values are interpreted as representing failure.

If failure(failvar == numlist) is specified, records with failvar taking on any of the values in numlist are assumed to end in failure, and all other records are assumed to be censored.
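The two interpretations of failure() can be compared on a small made-up dataset. This is only a sketch: the variable names ftime and status and the values are hypothetical. The variable _d is the failure indicator that stset creates.

```stata
* Hypothetical toy data; variable names and values are illustrative only
clear
input ftime status
5 0
8 1
3 2
6 3
9 .
end

* Indicator interpretation: 0 and missing are censored, any other value fails
stset ftime, failure(status)
list ftime status _d

* Explicit codes: only status 2 or 3 counts as failure; all else is censored
stset ftime, failure(status==2 3)
list ftime status _d
```

After the first stset, _d should be 1 for status 1, 2, and 3; after the second, only for status 2 and 3.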

noshow prevents other st commands from showing the key st variables at the top of their output.

        +---------+
    ----+ Options +----------------------------------------------------------

origin([varname == numlist] time exp | min) and scale(#) define analysis time; that is, origin() defines when a subject becomes at risk. Subjects become at risk when time = origin(). All analyses are performed in terms of time since becoming at risk, called analysis time.

Let us use the terms time for how time is recorded in the data and t for analysis time. Analysis time t is defined

            time - origin()
        t = ---------------
                scale()

t is time from origin in units of scale.

By default, origin(time 0) and scale(1) are assumed, meaning that t = time. Then you must ensure that time in your data is measured as time since becoming at risk. Subjects are exposed at t = time = 0 and later fail. Observations with t = time <= 0 are ignored because information before becoming at risk is irrelevant.

origin() determines when the clock starts ticking. scale() plays no substantive role, but it can be handy for making t units more readable (such as converting days to years).

origin(time exp) sets the origin to exp. For instance, if time were recorded as dates, such as 05jun1998, in your data and variable expdate recorded the date when subjects were exposed, you could specify origin(time expdate). If instead all subjects were exposed on 12nov1997, you could specify origin(time mdy(11,12,1997)).

origin(time exp) may be used with single- or multiple-record data.

origin(varname == numlist) is for use with multiple-record data; it specifies the origin indirectly. If time were recorded as dates in your data, variable obsdate recorded the (ending) date associated with each record, and subjects became at risk upon, say, having a certain operation -- and that operation were indicated by code==217 -- then you could specify origin(code==217). origin(code==217) would mean, for each subject, that the origin time is the earliest time at which code==217 is observed. Records before that would be ignored (because t < 0). Subjects who never had code==217 would be ignored entirely.
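The indirect origin can be sketched with a few made-up records. The dataset below is hypothetical (the values are invented, and died is an assumed failure indicator); only the names obsdate and code and the value 217 are taken from the text above.

```stata
* Hypothetical multiple-record data; values are illustrative only
clear
input id obsdate code died
1 100 100 0
1 150 217 0
1 300 999 1
2 120 100 0
2 400 999 1
end

* The clock starts, within subject, at the earliest obsdate with code==217
stset obsdate, id(id) failure(died) origin(code==217)

* Subject 1: origin is 150, so only the record ending at 300 is kept,
*            spanning analysis time (0, 150]
* Subject 2: never has code==217, so all of its records are ignored
list id obsdate code _t0 _t _d _st, sepby(id)
```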

origin(varname == numlist time exp) sets the origin to the later of the two times determined by varname==numlist and exp.

origin(min) sets origin to the earliest time observed, minus 1. This is an odd thing to do and is described in example 10 in [ST] stset.

origin() is an important concept; see Key concepts, Two concepts of time, and The substantive meaning of analysis time in [ST] stset.

scale() makes results more readable. If you have time recorded in days (such as Stata dates, which are really days since 01jan1960), specifying scale(365.25) will cause results to be reported in years.
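As a quick check of the formula for t, the analysis time that stset stores in _t can be recomputed by hand. The data below are a hypothetical sketch with times in days; the variable names exposed, ftime, and died are made up for illustration.

```stata
* Hypothetical toy data, times recorded in days
clear
input id exposed ftime died
1 100 465.25 1
2 200 930.5  0
end

stset ftime, failure(died) origin(time exposed) scale(365.25)

* t = (time - origin())/scale()
*   subject 1: (465.25 - 100)/365.25 = 1 year
*   subject 2: (930.5  - 200)/365.25 = 2 years
generate double t_by_hand = (ftime - exposed)/365.25
list id exposed ftime _t t_by_hand
```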

enter([varname == numlist] time exp) specifies when a subject first comes under observation, meaning that any failures, were they to occur, would be recorded in the data.

Do not confuse enter() and origin(). origin() specifies when a subject first becomes at risk. In many datasets, becoming at risk and coming under observation are coincident. Then it is sufficient to specify origin().

enter(time exp), enter(varname == numlist), and enter(varname == numlist time exp) follow the same syntax as origin(). In multiple-record data, both varname == numlist and time exp are interpreted as the earliest time implied, and if both are specified, the later of the two times is used.
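The difference between origin() and enter() shows up in _t0, the analysis time at which each record begins. The following is a sketch on made-up single-record data; the variable names mirror those used in the examples later in this entry, but the values are hypothetical.

```stata
* Hypothetical toy data, dates recorded in days; dob is set to 0 for simplicity
clear
input id dob doe dox fail
1 0 3652.5  7305    1
2 0 1826.25 5478.75 0
end

* At risk from birth, but under observation only from date of entry
stset dox, failure(fail) origin(time dob) enter(time doe) scale(365.25)

* Subject 1 enters at age 10 and exits at age 20
* Subject 2 enters at age  5 and exits at age 15
list id _t0 _t _d
```

Without enter(time doe), both subjects would be treated as observed from birth, so _t0 would be 0.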

exit(failure | [varname == numlist] time exp) specifies the latest time under which the subject is both under observation and at risk. The emphasis is on latest; obviously, subjects also exit the risk pool when their data run out.

exit(failure) is the default. When the first failure event occurs, the subject is removed from the analysis risk pool, even if the subject has subsequent records in the data and even if some of those subsequent records document other failure events. Specify exit(time .) if you wish to keep all records for a subject after failure. You want to do this if you have multiple-failure data.

exit(varname == numlist), exit(time exp), and exit(varname == numlist time exp) follow the same syntax as origin() and enter(). In multiple-record data, both varname == numlist and time exp are interpreted as the earliest time implied. exit() differs from origin() and enter() in that if both are specified, the earlier of the two times is used.
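The effect of the default exit(failure) versus exit(time .) can be seen by comparing _st, which marks the records that are in the analysis sample. This is a sketch on hypothetical multiple-failure data; the variable names t and event are made up.

```stata
* Hypothetical multiple-record data with records after the first failure
clear
input id t event
1 5  1
1 9  1
2 4  0
2 7  1
2 12 0
end

* Default exit(failure): records after a subject's first failure are excluded
stset t, id(id) failure(event)
list id t event _t0 _t _d _st, sepby(id)

* exit(time .): subjects stay in the risk pool until their data run out
stset t, id(id) failure(event) exit(time .)
list id t event _t0 _t _d _st, sepby(id)
```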

        +----------+
    ----+ Advanced +---------------------------------------------------------

if(exp), ever(exp), never(exp), after(exp), and before(exp) select relevant records.

if(exp) selects records for which exp is true. We strongly recommend specifying this if() option rather than if exp following stset or streset. They differ in that if exp removes the data from consideration before calculating beginning and ending times and other quantities. The if() option, on the other hand, sets the restriction after all derived variables are calculated. See if() versus if exp in [ST] stset.

if() may be specified with single- or multiple-record data. The remaining selection options are for use with multiple-record data only.
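The distinction between if exp and if() matters most when record beginning times depend on the records being removed. The following sketch uses two hypothetical records on one subject; the variable names t, x, and event are made up.

```stata
* Hypothetical data: one subject, two records
clear
input id t x event
1 5 0 0
1 9 1 1
end

* if exp: the x==0 record is removed before beginning times are computed,
* so the remaining record should be treated as spanning (0, 9]
stset t if x == 1, id(id) failure(event)
list id t x _t0 _t _st

* if(): beginning times are computed from all records first,
* so the remaining record should span (5, 9]
stset t, id(id) failure(event) if(x == 1)
list id t x _t0 _t _st
```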

ever(exp) selects only subjects for which exp is ever true.

never(exp) selects only subjects for which exp is never true.

after(exp) selects records within subject on or after the first time exp is true.

before(exp) selects records within subject before the first time exp is true.

time0(varname) is seldom specified because most datasets do not contain this information. time0() should be used exclusively with multiple-record data, and even then you should consider whether origin() or enter() would be more appropriate.

time0() specifies a mechanical aspect of interpretation about the records in the dataset, namely, the beginning of the period spanned by each record. See Intermediate exit and reentry times (gaps) in [ST] stset.
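As a sketch of when time0() applies, suppose each record carries its own beginning time and a subject is unobserved for part of the follow-up. The data and the variable names begin, end, and event below are hypothetical.

```stata
* Hypothetical data: subject 1 is unobserved between times 5 and 8 (a gap)
clear
input id begin end event
1 0 5  0
1 8 12 1
end

* time0() tells stset where each record's span begins
stset end, id(id) failure(event) time0(begin)

* _t0 follows begin, so the interval (5, 8] contributes no time at risk
list id begin end _t0 _t _d
```

Without time0(begin), the second record would be assumed to begin where the first one ended, at time 5.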

Options unique to streset

past expands the stset sample to include the entire recorded past of the relevant subjects, meaning that it includes observations before becoming at risk and those excluded because of after(), etc.

future expands the stset sample to include the records on the relevant subjects after the last record that previously was included, if any, which typically means to include all observations after failure or censoring.

past future expands the stset sample to include all records on the relevant subjects.

Typing streset without arguments resets the sample to the analysis sample. See Past and future records in [ST] stset for more information.
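A typical sequence is to widen the sample temporarily, inspect or use the extra records, and then reset. The following is only a sketch on hypothetical data (the variable names t and event are made up); it lists _st at each step rather than assuming particular values.

```stata
* Hypothetical multiple-record data with records after the first failure
clear
input id t event
1 5  1
1 9  0
2 4  0
2 7  1
2 12 0
end

stset t, id(id) failure(event)
list id t event _st, sepby(id)    // analysis sample

streset, future                   // temporarily add records after failure
list id t event _st, sepby(id)

streset                           // back to the analysis sample
list id t event _st, sepby(id)
```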

Options for use with st

nocmd suppresses displaying the last stset command.

notable suppresses displaying the table summarizing what has been stset.

Example: Single-record-per-subject

The failure-time (analysis-time) variable is failtime
    . webuse kva
    . stset failtime

Example: Single-record-per-subject with censoring

The analysis-time variable is studytime, the failure/censoring indicator variable is died with died==1 denoting a failure event
    . webuse drugtr
    . stset studytime, failure(died)

The analysis-time variable is dox, the failure event is any outcome category different from 0 (fail!=0). Subjects first become at risk and come under observation at time 0 (the default)
    . webuse diet
    . stset dox, failure(fail)

Subjects first become at risk at time 0 and come under observation at date of entry into the study recorded in variable doe
    . stset dox, failure(fail) enter(time doe)

Subjects first become at risk and come under observation at date of birth recorded in variable dob rather than at time 0
    . stset dox, failure(fail) origin(time dob)

Subjects first become at risk at date of birth and come under observation at date of entry into the study
    . stset dox, failure(fail) origin(time dob) enter(time doe)

Set the scale for time-since-birth (analysis time) to be measured in years
    . stset dox, failure(fail) origin(time dob) enter(time doe) scale(365.25)

Specify that subjects with dates of exposure dox after 01dec1970 be removed from the analysis risk pool
    . stset dox, failure(fail) origin(time dob) enter(time doe) scale(365.25) exit(time mdy(12,1,1970))

Example: Multiple-record-per-subject

The analysis-time variable is dox, the subject-identifier variable is id. Indicator variable allfail is equal to 1 in the last record of each subject, indicating that all subjects fail at the end of the study
    . webuse diet2
    . stset dox, id(id) failure(allfail)

Example: Multiple-record-per-subject with censoring

The analysis-time variable is t1, the failure/censoring indicator variable is died with died==1 denoting a failure event
    . webuse stan3
    . stset t1, id(id) failure(died)

The analysis-time variable is dox, the failure event is any outcome category different from 0 (fail!=0), and the subject-identifier variable is id. Subjects first become at risk and come under observation at time 0 (the default)
    . webuse diet2
    . stset dox, id(id) failure(fail)

Subjects first become at risk at time 0 and come under observation at date of entry into the study recorded in variable doe
    . stset dox, id(id) failure(fail) enter(time doe)

Subjects first become at risk and come under observation at date of birth recorded in variable dob
    . stset dox, id(id) failure(fail) origin(time dob)

Subjects first become at risk at date of birth and come under observation at date of entry into the study
    . stset dox, id(id) failure(fail) origin(time dob) enter(time doe)

Set the scale for time-since-birth (analysis time) to be measured in years
    . stset dox, id(id) failure(fail) origin(time dob) enter(time doe) scale(365.25)

Specify the outcome categories 1, 3, and 13 of fail to denote a failure event and the outcome category 5 to indicate that a subject must be removed from the analysis risk pool
    . stset dox, id(id) failure(fail==1 3 13) origin(time dob) enter(time doe) scale(365.25) exit(fail==1 3 13 5)

Video example

Learn how to set up your data for survival analysis


© Copyright 1996–2018 StataCorp LLC