help stset, help streset, help st
-------------------------------------------------------------------------------
Title
[ST] stset -- Declare data to be survival-time data
Syntax
Single-record-per-subject survival data
stset timevar [if] [weight] [, single_options]
streset [if] [weight] [, single_options]
st [, nocmd notable]
stset, clear
Multiple-record-per-subject survival data
stset timevar [if] [weight] [, id(idvar) failure(failvar[==numlist])
multiple_options]
streset [if] [weight] [, multiple_options]
streset, {past|future|past future}
st [, nocmd notable]
stset, clear
single_options                 description
-------------------------------------------------------------------------
Main
  failure(failvar[==numlist])  failure event
  noshow                       prevent other st commands from showing st
                                 setting information
Options
  origin(time exp)             define when a subject becomes at risk
  scale(#)                     rescale time value
  enter(time exp)              specify when subject first enters study
  exit(time exp)               specify when subject exits study
Advanced
  if(exp)                      select records for which exp is true;
                                 recommended rather than if exp
  time0(varname)               mechanical aspect of interpretation about
                                 records in dataset; seldom used
-------------------------------------------------------------------------
multiple_options                             description
-------------------------------------------------------------------------
Main
* id(idvar)                                  multiple-record ID variable
* failure(failvar[==numlist])                failure event
  noshow                                     prevent other st commands from showing
                                               st setting information
Options
  origin([varname==numlist] time exp|min)    define when a subject becomes at risk
  scale(#)                                   rescale time value
  enter([varname==numlist] time exp)         specify when subject first enters study
  exit(failure|[varname==numlist] time exp)  specify when subject exits study
Advanced
  if(exp)                                    select records for which exp is true;
                                               recommended rather than if exp
  ever(exp)                                  select subjects for which exp is ever true
  never(exp)                                 select subjects for which exp is never true
  after(exp)                                 select records within subject on or after
                                               the first time exp is true
  before(exp)                                select records within subject before the
                                               first time exp is true
  time0(varname)                             mechanical aspect of interpretation about
                                               records in dataset; seldom used
-------------------------------------------------------------------------
* id() and failure() are required when you stset
multiple-record-per-subject survival data.
fweights, iweights, and pweights are allowed; see weight.
Examples
Time measured from 0, all failed
. stset ftime
Time measured from 0, censoring
. stset ftime, failure(died)
Time measured from 0, censoring & ID
. stset ftime, failure(died) id(id)
Time measured from 0, failure codes
. stset ftime, failure(died==2,3)
Time measured from dob, censoring
. stset ftime, failure(died) origin(time dob)
You cannot harm your data by using stset, so feel free to experiment.
Menu
Statistics > Survival analysis > Setup and utilities > Declare data to be
survival-time data
Description
st refers to survival-time data, which are fully described below.
stset declares the data in memory to be st data, informing Stata of key
variables and their roles in a survival-time analysis. When you stset
your data, stset runs various data consistency checks to ensure that what
you have declared makes sense. If the data are weighted, you specify the
weights when you stset the data, not when you issue the individual st
commands.
streset changes how the st dataset is declared. In multiple-record data,
streset can also temporarily set the sample to include records from
before the time at risk (called the past) and records after failure
(called the future). Then typing streset without arguments resets the
sample back to the analysis sample.
st displays how the dataset is currently declared.
Whenever you type stset or streset, Stata runs or reruns data consistency
checks to ensure that what you are now declaring (or declared in the
past) makes sense. Thus if you have made any changes to your data or
simply wish to verify how things are, you can type streset with no
options.
stset, clear is for use by programmers. It causes Stata to forget the st
markers, making the data no longer st data to Stata. The data remain
unchanged. It is not necessary to stset, clear before doing another
stset.
Options for use with stset and streset
+------+
----+ Main +-------------------------------------------------------------
id(idvar) specifies the subject-ID variable; observations with equal,
nonmissing values of idvar are assumed to be the same subject. idvar
may be string or numeric. Observations for which idvar is missing (.
or "") are ignored.
When id() is not specified, each observation is assumed to represent
a different subject and thus constitutes a one-record-per-subject
survival dataset.
When you specify id(), the data are said to be multiple-record data,
even if it turns out that there is only one record per subject.
Perhaps they would better be called potentially multiple-record data.
If you specify id(), stset requires that you specify failure().
Specifying id() never hurts; we recommend it because a few st
commands, such as stsplit, require an ID variable to have been
specified when the dataset was stset.
failure(failvar[==numlist]) specifies the failure event.
If failure() is not specified, all records are assumed to end in
failure. This is allowed with single-record data only.
If failure(failvar) is specified, failvar is interpreted as an
indicator variable; 0 and missing mean censored, and all other values
are interpreted as representing failure.
If failure(failvar==numlist) is specified, records with failvar
taking on any of the values in numlist are assumed to end in failure,
and all other records are assumed to be censored.
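The two forms of failure() can be compared with a minimal sketch such as the following, which uses the diet dataset from the examples in this file. It relies on _d, the failure indicator that stset creates; _d is not described in the text above and is used here only to inspect the result.

```stata
* Sketch: compare failure(failvar) with failure(failvar==numlist)
webuse diet, clear

* Every nonzero, nonmissing value of fail is treated as failure
stset dox, failure(fail)
tabulate fail _d

* Only outcome categories 1, 3, and 13 are treated as failure;
* records with any other value of fail are treated as censored
stset dox, failure(fail==1 3 13)
tabulate fail _d
```

Cross-tabulating the original codes against _d is a quick way to confirm that the declaration matches the coding you intended.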
noshow prevents other st commands from showing the key st variables at
the top of their output.
+---------+
----+ Options +----------------------------------------------------------
origin([varname==numlist] time exp | min) and scale(#) define analysis
time; i.e., origin() defines when a subject becomes at risk.
Subjects become at risk when time = origin(). All analyses are
performed in terms of time since becoming at risk, called analysis
time.
Let us use the terms time for how time is recorded in the data and t
for analysis time. Analysis time t is defined
t = (time - origin()) / scale()
t is time from origin in units of scale.
By default, origin(time 0) and scale(1) are assumed, meaning that t =
time. Then you must ensure that time in your data is measured as
time since becoming at risk. Subjects are exposed at t = time = 0
and later fail. Observations with t = time < 0 are ignored because
information before becoming at risk is irrelevant.
origin() determines when the clock starts ticking. scale() plays no
substantive role, but it can be handy for making t units more
readable (such as converting days to years).
origin(time exp) sets the origin to exp. For instance, if time were
recorded as dates, such as 05jun1998, in your data and variable
expdate recorded the date when subjects were exposed, you could
specify origin(time expdate). If instead all subjects were exposed
on 12nov1997, you could specify origin(time mdy(11,12,1997)).
origin(time exp) may be used with single- or multiple-record data.
origin(varname==numlist) is for use with multiple-record data; it
specifies the origin indirectly. If time were recorded as dates in
your data, variable obsdate recorded the (ending) date associated
with each record, and subjects became at risk upon, say, having a
certain operation -- and that operation were indicated by code==217
-- then you could specify origin(code==217). origin(code==217) would
mean, for each subject, that the origin time is the earliest time at
which code==217 is observed. Records before that would be ignored
(because t < 0). Subjects who never had code==217 would be ignored
entirely.
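The behavior of origin(varname==numlist) described above can be tried on a small made-up dataset. Everything in this sketch is hypothetical: the variable names id, obsdate, code, and died and all data values are invented for illustration, and the variables _st, _t0, _t, and _d are the ones stset creates.

```stata
* Sketch: origin(varname==numlist) on hypothetical multiple-record data
clear
input id obsdate code died
1 10 100 0
1 20 217 0
1 35 300 1
2 12 100 0
2 30 300 1
end

* Subject 1 has code==217 on the record ending at obsdate 20;
* subject 2 never has code==217
stset obsdate, id(id) failure(died) origin(code==217)

* Inspect which records stset kept (_st) and their analysis-time spans
list id obsdate code died _st _t0 _t _d, sepby(id)
```

According to the description above, the records of subject 1 that end on or before the origin, and all records of subject 2, should be excluded from the analysis sample.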
origin(varname==numlist time exp) sets the origin to the later of the
two times determined by varname==numlist and exp.
origin(min) sets origin to the earliest time observed, minus 1. This
is an odd thing to do and is described in example 10.
origin() is an important concept; see Key concepts, Two concepts of
time, and The substantive meaning of analysis time in [ST] stset.
scale() makes results more readable. If you have time recorded in
days (such as Stata dates, which are really days since 01jan1960),
specifying scale(365.25) will cause results to be reported in years.
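The definition of analysis time can be checked by hand. The following is a minimal sketch that assumes the diet dataset used in the examples in this file; t_check is a hypothetical variable name, and _t and _st are variables created by stset that are not described in the text above.

```stata
* Sketch: recompute analysis time t = (time - origin())/scale() by hand
webuse diet, clear
stset dox, failure(fail) origin(time dob) scale(365.25)

* dox and dob are dates (days), so t_check is measured in years
generate double t_check = (dox - dob)/365.25

* _t holds analysis time at the end of each record; the two should
* agree up to storage precision for records in the analysis sample
assert reldif(_t, t_check) < 1e-6 if _st == 1
summarize _t t_check if _st == 1
```

The tolerance in the assert statement allows for rounding in how _t is stored; it is a choice made for this sketch, not something stset requires.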
enter([varname==numlist] time exp) specifies when a subject first comes
under observation, meaning that any failures, were they to occur,
would be recorded in the data.
Do not confuse enter() and origin(). origin() specifies when a
subject first becomes at risk. In many datasets, becoming at risk
and coming under observation are coincident. Then it is sufficient
to specify origin().
enter(time exp), enter(varname==numlist), and enter(varname==numlist
time exp) follow the same syntax as origin(). In multiple-record
data, both varname==numlist and time exp are interpreted as the
earliest time implied, and if both are specified, the later of the
two times is used.
exit(failure | [varname==numlist] time exp) specifies the latest time
at which the subject is both under observation and at risk. The
emphasis is on latest; obviously, subjects also exit the risk pool
when their data run out.
exit(failure) is the default. When the first failure event occurs,
the subject is removed from the analysis risk pool, even if the
subject has subsequent records in the data and even if some of those
subsequent records document other failure events. Specify exit(time
.) if you wish to keep all records for a subject after failure. You
want to do this if you have multiple-failure data.
exit(varname==numlist), exit(time exp), and exit(varname==numlist
time exp) follow the same syntax as origin() and enter(). In
multiple-record data, both varname==numlist and time exp are
interpreted as the earliest time implied. exit() differs from origin()
and enter() in that if both are specified, the earlier of the two
times is used.
+----------+
----+ Advanced +---------------------------------------------------------
if(exp), ever(exp), never(exp), after(exp), and before(exp) select
relevant records.
if(exp) selects records for which exp is true. We strongly recommend
specifying this if() option rather than if exp following stset or
streset. They differ in that if exp removes the data from
consideration before calculating beginning and ending times and other
quantities. The if() option, on the other hand, sets the restriction
after all derived variables are calculated.
if() may be specified with single- or multiple-record data. The
remaining selection options are for use with multiple-record data
only.
ever(exp) selects only subjects for which exp is ever true.
never(exp) selects only subjects for which exp is never true.
after(exp) selects records within subject on or after the first time
exp is true.
before(exp) selects records within subject before the first time exp
is true.
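The difference between if exp and the if() option can be seen in multiple-record data. The following is a minimal sketch that assumes the stan3 dataset used in the examples in this file contains an indicator named posttran that marks a subject's later records; that variable name is an assumption here, and _t0 and _st are variables created by stset.

```stata
* Sketch: if exp versus the if() option in multiple-record data
webuse stan3, clear

* if exp: records with posttran != 1 are removed before beginning
* and ending times are derived
stset t1 if posttran == 1, id(id) failure(died)
summarize _t0 if _st == 1

* if(): beginning and ending times are derived from all records,
* and the restriction is applied afterward
stset t1, id(id) failure(died) if(posttran == 1)
summarize _t0 if _st == 1
```

Comparing the two summaries of _t0 shows whether the retained records begin where the subject's previous record ended or are treated as if no earlier record existed.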
time0(varname) is seldom specified because most datasets do not contain
this information. time0() should be used exclusively with
multiple-record data, and even then you should consider whether
origin() or enter() would be more appropriate.
time0() specifies a mechanical aspect of interpretation about the
records in the dataset, namely, the beginning of the period spanned
by each record. This option should be restricted to
multiple-record-per-subject data, where it is used to indicate gaps.
Options unique to streset
past expands the stset sample to include the entire recorded past of the
relevant subjects, meaning that it includes observations before
becoming at risk and those excluded because of after(), etc.
future expands the stset sample to include the records on the relevant
subjects after the last record that previously was included, if any,
which typically means to include all observations after failure or
censoring.
past future expands the stset sample to include all records on the
relevant subjects.
Typing streset without arguments resets the sample to the analysis
sample.
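A typical sequence for temporarily widening and then restoring the sample might look like the following minimal sketch. It assumes the stan3 dataset from the examples in this file and an indicator named posttran in that dataset; _st is the sample marker that stset creates and is not described in the text above.

```stata
* Sketch: temporarily include the past, then return to the analysis sample
webuse stan3, clear
stset t1, id(id) failure(died) after(posttran == 1)
count if _st == 1

* Include records excluded because of after() or because they
* precede the time at risk
streset, past
count if _st == 1

* Reset to the analysis sample declared by the original stset
streset
count if _st == 1
```

The three counts are shown only so that the change in the marked sample can be observed; no particular values are implied.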
Options for st
nocmd suppresses displaying the last stset command.
notable suppresses displaying the table summarizing what has been stset.
Example: Single-record-per-subject
The failure-time (analysis-time) variable is failtime
. webuse kva
. stset failtime
Example: Single-record-per-subject with censoring
The analysis-time variable is studytime, the failure/censoring indicator
variable is died with died==1 denoting a failure event
. webuse drugtr
. stset studytime, failure(died)
The analysis-time variable is dox, the failure event is any outcome
category different from 0 (fail!=0). Subjects first become at risk and
come under observation at time 0 (the default)
. webuse diet
. stset dox, failure(fail)
Subjects first become at risk at time 0 and come under observation at
date of entry into the study recorded in variable doe
. stset dox, failure(fail) enter(time doe)
Subjects first become at risk and come under observation at date of birth
recorded in variable dob rather than at time 0
. stset dox, failure(fail) origin(time dob)
Subjects first become at risk at date of birth and come under observation
at date of entry into the study
. stset dox, failure(fail) origin(time dob) enter(time doe)
Set the scale for time-since-birth (analysis time) to be measured in
years
. stset dox, failure(fail) origin(time dob) enter(time doe) scale(365.25)
Specify that subjects whose date dox is after 01dec1970 be removed from
the analysis risk pool on 01dec1970
. stset dox, failure(fail) origin(time dob) enter(time doe) scale(365.25) exit(time mdy(12,1,1970))
Example: Multiple-record-per-subject
The analysis-time variable is dox, the subject-identifier variable is id.
Indicator variable allfail is equal to 1 in the last record of each
subject, indicating that all subjects fail at the end of the study
. webuse diet2
. stset dox, id(id) failure(allfail)
Example: Multiple-record-per-subject with censoring
The analysis-time variable is t1, the failure/censoring indicator
variable is died with died==1 denoting a failure event
. webuse stan3
. stset t1, id(id) failure(died)
The analysis-time variable is dox, the failure event is any outcome
category different from 0 (fail!=0), and the subject-identifier variable
is id. Subjects first become at risk and come under observation at time
0 (the default)
. webuse diet2
. stset dox, id(id) failure(fail)
Subjects first become at risk at time 0 and come under observation at
date of entry into the study recorded in variable doe
. stset dox, id(id) failure(fail) enter(time doe)
Subjects first become at risk and come under observation at date of birth
recorded in variable dob
. stset dox, id(id) failure(fail) origin(time dob)
Subjects first become at risk at date of birth and come under observation
at date of entry into the study
. stset dox, id(id) failure(fail) origin(time dob) enter(time doe)
Set the scale for time-since-birth (analysis time) to be measured in
years
. stset dox, id(id) failure(fail) origin(time dob) enter(time doe) scale(365.25)
Specify the outcome categories 1, 3, and 13 of fail to denote a failure
event and the outcome category 5 to indicate that a subject must be
removed from the analysis risk pool
. stset dox, id(id) failure(fail==1 3 13) origin(time dob) enter(time doe) scale(365.25) exit(fail==1 3 13 5)
Also see
Manual: [ST] stset
Help: [ST] snapspan, [ST] stdescribe