


Re: st: data prep for survival analysis with time varying covariate

From   Steve Samuels
Subject   Re: st: data prep for survival analysis with time varying covariate
Date   Thu, 26 Jan 2012 09:43:40 -0500

The -stset- statement in my message quoted below omitted the comma. It should be:

stset study_time, fail(event=1) 


Kyleigh Schraeder sent me the following privately and I pointed her to the FAQ about private communications.

That said, I think that her problem is much simpler than she originally stated.

The crucial point: Kyleigh wants to study the effect of waiting time on the probability of her event (contacting agency 2). But waiting time is not a time-dependent covariate in her data; it is the fundamental time dimension for her survival analysis.

There are three ways that people might end observation for this analysis.
1  Contact Agency 2  (Event of Interest)
2  Receive services from Agency 1 (so no need to contact Agency 2)
3  Be on waiting list for Agency 1 at the time of data collection
(Possible, but not stated by her, is: 4  Drop off waiting list 1 for a reason other than receiving services)

She has coded the censored observations as zero.

She originally presented the following data structure,
ID      Observation Time     Event   Waittime
1       50                    0       40
2       73                    0       8
3       150                   1       100

She tells us that she has a variable for each day on the waiting list, called Wait1 Wait2....
This was completely unnecessary, as she would have known if she'd looked at the manual entry for -stset-.  For her analysis she needs only a single line for each person and two variables:

1. study_time: the maximum of her waiting times until the end of observation (this might be either her "Observation Time" or "waittime"; I'll take it as Observation Time in her example).
2. event indicator: 0 or 1 as above, but better would be 1,2,3.

Then the data will look like:

id  study_time event
1    50         0
2    73         0
3   150         1

stset study_time fail(event=1)   //study time to contact agency 2

The Kaplan-Meier failure curve will now estimate the proportion of people who had contacted agency 2 by the time they had been on agency 1's waiting list for t days. With -stcox- or other regression models, she can assess factors that make it more or less likely that people on the waiting list for agency 1 will contact agency 2.
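
As a rough sketch of the steps just described, the three example rows can be typed in directly and declared as survival data. This is only an illustration under stated assumptions: it uses the -stset- syntax with the comma, and child_age in the commented-out -stcox- line is a hypothetical covariate, not a variable mentioned in this thread.

```stata
* Minimal sketch using the three example rows shown above
clear
input id study_time event
1  50 0
2  73 0
3 150 1
end

* Declare survival data: time = days on agency 1's waiting list,
* failure = contact with agency 2 (event == 1); all other rows are censored
stset study_time, failure(event==1)

* Kaplan-Meier failure function (1 - survivor function), as a table and a plot
sts list, failure
sts graph, failure

* A Cox model needs at least one covariate; child_age is hypothetical
* stcox child_age
```

With only one failure among three rows, the output is not meaningful; the point is the order of the commands.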

If Kyleigh is not sure how to go from her current data structure to the one above, I suggest that she write to the list again. I suspect that all that is needed is some variant of "egen study_time = max(wait1 wait2....)".
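
One possible variant is sketched here, assuming (this is a guess about her data, not something stated in the thread) that the wide variables wait1, wait2, ... hold elapsed days and are missing after observation ends. Note that the -egen- function max() works down observations; the row-wise function across variables is rowmax(). The toy values are invented for illustration only.

```stata
* Hypothetical wide layout: wait1-wait3 hold elapsed days, missing once observation ends
clear
input id wait1 wait2 wait3 event
1 10  30  50 0
2 20  73   . 0
3 50 100 150 1
end

* Row-wise maximum across the wide variables; missing values are ignored
egen study_time = rowmax(wait1-wait3)

* Keep one line per person with only the variables needed for -stset-
keep id study_time event
list

stset study_time, failure(event==1)
```

If the wide variables are instead 0/1 daily indicators, a count such as egen's rownonmiss() or rowtotal() may be closer to what is needed; which one applies depends on how the columns were actually coded.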

I again suggested to Kyleigh that, if she is at a university, she consult a biostatistician there.


Hi Steve,

Thank you very much for your helpful reply.  Perhaps I need to 'flesh out' my research question to give you a better idea of my 'problem'.

In my study, families were asked some questions about their experiences in the children's mental health system over the past year.  All families were placed on a waiting list when they first contacted a service agency ('Agency 1').  My research question is: how does the duration of waiting time influence the probability that the family will contact another service agency (i.e., seeking help elsewhere, contacting an 'Agency 2').

My variables include: did family contact an Agency 2 (event), the time between when the family contacted Agency 1 and Agency 2 (time to event), and how long families have been waiting at Agency 1 in days (time varying covariate).  Families were asked how long they had been waiting at Agency 1.  If a family had received some help at Agency 1, they were asked how long they had to wait before they received help.

With regards to the survival analysis, the only event I am interested in is if/when the family contacted another agency (Agency 2).
This may be a bit confusing since a family could either still be waiting at Agency 1 when the event occurs OR a family may have received some form of 'help' in between contacting Agency 1 and 2 before the event occurred.  Thus, some families are technically 'not waiting' for a period of time before the event occurs.

So, I am interested in all participants regardless of whether or not they came off of the waiting list before the event occurred.  I am just unsure of how to go about coding the time varying covariate.
Right now, I have a specific number of columns (day1, day2, day3..) for each participant depending on how much time passed between contacting agency 1 and agency 2.  So someone that has been waiting for 40 days has 40 columns.  I also have a separate 40 columns to code their waiting status.  If the family is waiting, they are coded with '1' vs. '0' (not waiting).

I have taken a look at the Introduction to Survival Analysis using Stata book and I have found it very helpful.  In fact, there is a similar problem in the book for explaining tvc about employment for the unemployed: the time varying covariate is receiving unemployment benefits (the variable changes from 0 to 1 when the benefits run out), and the question is whether receiving unemployment benefits would affect chances of accepting employment.  However, it doesn't provide information on how to set up this problem in Stata.
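
For reference, a status that switches from 0 to 1 at a known time is usually set up in Stata with one row per person and -stsplit-, rather than one column per day. The following is only a sketch under assumptions: it reuses the variable names from the first message in this thread (id, timetoevent, event, waittime), treats waittime as the day a family came off the waiting list, and introduces a hypothetical indicator named offlist. Families still waiting at the end of observation would need separate handling.

```stata
* Toy data: one row per family, values from the example table in this thread
clear
input id timetoevent event waittime
1  50 0  40
2  73 0   8
3 150 1 100
end

* id() is required so that -stsplit- can create several records per family
stset timetoevent, id(id) failure(event==1)

* Split each family's record at the day they came off the waiting list
stsplit offlist, at(0) after(time = waittime)
replace offlist = offlist + 1      // 0 = still waiting, 1 = off the waiting list

* Inspect the resulting episodes: entry time, exit time, failure flag, status
list id _t0 _t _d offlist, sepby(id)

* With real data, the time-varying indicator could enter a Cox model
* stcox offlist
```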

I hope this has clarified my problem (I apologize if this was more confusing!).  I welcome any further guidance/advice you might have for me.

Kindest regards,

On Sun, Jan 22, 2012 at 4:04 PM, Steve Samuels wrote:

Dear Kyleigh,

I find your description very confusing. You refer to an "event" and also to people who come off a waiting list to "receive services", which is something apparently different.

Here's my attempt to decipher your problem:

1. You are interested in events only after people come off the waiting list.
2. But you want the time dimension to be time after entry into the study, which is the time they go onto the waiting list.
3. Your question is then: for people who are off the waiting list and at time "t" have not yet had the event, will the amount of time spent on the waiting list influence the probability of having the event at that time?

If I am correct, then events that occur while people are on the waiting list are not "censored", but they are to be ignored. Also, after people leave a waiting list, the time they were on it remains constant, so "waittime" is not a time-dependent variable. With this setup, you cannot estimate survival probabilities per se, but you can estimate the relative hazards of having events.

If my guess about your question is right, then your current data setup is OK, and here is some sample code. I'm not sure why you think a "wide format" is needed, or even where you got the idea. There is nothing like it in the Stata Survival Manual entry for -stset-.

stset timetoevent, id(id) failure(event==1) enter(time waittime)
stcox waittime

But: I am not confident that this code responds to your "real problem". It is very dangerous to attempt an analysis when you don't understand basic issues. And when your question is confused, you are apt to get wrong advice. I'm going to quote Mike Hanson's instructions to his advanced econometrics class, given in a Statalist post of May 8, 2009: "Never push a button or type a command you do not fully understand."

If you are a student at a university with a statistics program, I strongly suggest that you consult a faculty member who is expert in survival analysis. Also read a good text on the subject, such as An Introduction to Survival Analysis with Stata. The Stata manual contains many examples that might be helpful.

Finally, the proper spelling of the program we all use is "Stata", not "STATA". For the reason, see the last entry in the Statalist FAQ.


On Jan 18, 2012, at 6:19 PM, Kyleigh Schraeder wrote:

Hi Dr. Gagnon,

This is the first time I'm using a survival analysis so I hope my
questions make sense.

My variables are: id, timetoevent (in days), event, and waittime.

In my study, every patient has their own start time or time 0.  At
time 0, each patient is put on a wait-list.  Some patients may still
be waiting when the 'event' happens (they should be censored).
However, some patients may come off of the wait list and receive
services.  Thus, some patients are technically 'not waiting' for a
period of time before the event occurs.  So, as I understand,
'waittime' is a time-varying covariate.

I am interested in examining the effect of a patient's wait-time on
the outcome (whether the event occurs). In other words, what is the
probability of the event occurring for patient X, given their waiting
time. Right now my data is currently set up where each participant has
their own row of data.

ID      Timetoevent     Event   Waittime
1       50              0       40
2       73              0       8
3       150             1       100

I'm not sure how to best arrange the data in STATA since I'm confused
as to how to put this data in the wide-person format since I don't
have an Event1, Event2, Event3 or a Wait1 Wait2 Wait3..  I have tried
creating a vector using loop commands (to give me an Event1 Event2
Event3) but I need to specify the number of variables I create (max
390 days of observation) and this is different for each patient.

Any help or steps in the right direction would be appreciated! Thank you

