
From:    wgould@stata.com (William Gould, StataCorp LP)
To:      statalist@hsphsun2.harvard.edu
Subject: Re: st: Generating time-varying covariates in a multiple spell data
Date:    Tue, 08 Apr 2008 11:58:09 -0500

Marjo Pyy-Martikainen <Marjo.Pyy-Martikainen@stat.fi> writes,

> I have data containing multiple spells per person. The spells are measured
> in months. The data is in the following form:
>
>   PERSON   BEGIN   END   EVENT   DUR
>      1     (  0     1 ]    1      1
>      1     (  4    13 ]    1      9
>      2     ( 15    25 ]    0     10
>
> where variables BEGIN and END are measured in calendar time (1 refers to Jan
> 1995, 2 to Feb 1995 and so on until 60, Dec 1999).
>
> I stset the data in the following way:
>
> . stset end, failure(event) time0(begin) exit(time .) origin(time begin)
>
> which means I want to "set the clock to zero" at the start of each spell.
> Now I would like to include a dummy for December months 12, 24, 36 and 48.
> It is thus a time-varying variable getting value 1 for the December months
> and 0 for other months. A spell may include zero, one or many December
> months. I suppose I should use stsplit and do some kind of episode
> splitting, but could someone help me and give me advice on how I should do
> it with my data?

I have the solution.

Before starting, let's look and see what Marjo has already done. At first I
thought Marjo had made a mistake, but I was wrong. The -stset- command is just
complicated enough that theoretical examination does not work well; you can
check that you have the intended result by listing the _t0, _t, and _d
variables that -stset- creates. So I entered the data and typed the -stset-
command. Then I typed

. list _t0 _t _d

     +---------------+
     | _t0   _t   _d |
     |---------------|
  1. |   0    1    1 |
  2. |   0    9    1 |
  3. |   0   10    0 |
     +---------------+

Analysis time ranges over (0,1] and again over (0,9] for the first person.
That's what I thought would happen, and it looks like an error, but notice
that Marjo said, "which means I want to set the clock to zero at the start of
each spell". Okay, the command works exactly as Marjo said it would.

Marjo now wants to add a dummy variable equal to 1 every December. Without
explanation (that's coming), here's the solution:

. gen recid = _n                                                  (1)

. stset end, id(person) failure(event) enter(begin)    ///        (2)
             exit(time .) time0(begin)

. stsplit bot, at(12 13 24 25 36 37 48 49)                        (3)

. gen dummy = ( mod(bot,12)==0 & bot!=0 )                         (4)

. stset end, id(recid) failure(event) time0(begin)     ///        (5)
             exit(time .) origin(time begin)

I admit that the entire solution did not occur to me at the outset. In fact,
I went back and added the first line at the end, and modified the fifth.

Here is what did occur to me: We will have to use -stsplit-. -stsplit- wants
to split on analysis time, so we will first have to -stset- our data based on
calendar time, then -stsplit- the data, and finally we can -stset- our data
the way we really want it. The preliminary -stset- would allow us to generate
the dummy variable for December.

So let me explain. Ignore line (1); remember, it didn't even occur to me until
later.

Line (2) was the first line I wrote. It seemed the right way to -stset- the
data based on calendar time. I didn't get the command right the first time,
but after typing (2), I listed the data, saw what was wrong, and eventually
got (2) to work just as I wanted it. (What was wrong is that I forgot
exit(time .), which was needed because, it turned out, the data had to be
treated as multiple-failure data at this step.) When I say I listed the data,
I mean that I list _t0, _t, and _d, so I can see the time variables and
outcome that will be used in analysis. Here's what the data looked like
after (2):

. list person _t0 _t _d

     +------------------------+
     | person   _t0   _t   _d |
     |------------------------|
  1. |      1     0    1    1 |
  2. |      1     4   13    1 |
  3. |      2    15   25    0 |
     +------------------------+

Perfect; _t0 and _t correspond to the original month variables.

Now we can -stsplit-. We need to set the dummy to 1 for months 12, 24, 36,
and 48, which means we need to set it back to 0 for months 13, 25, 37, and 49.
So I -stsplit- the data at 12, 13, 24, 25, 36, 37, 48, and 49 and created the
dummy variable. I checked results after executing commands (3) and (4):

. list person dummy _t0 _t _d

     +--------------------------------+
     | person   dummy   _t0   _t   _d |
     |--------------------------------|
  1. |      1       0     0    1    1 |
  2. |      1       0     4   12    0 |
  3. |      1       1    12   13    1 |
  4. |      2       0    15   24    0 |
  5. |      2       1    24   25    0 |
     +--------------------------------+

Actually, I checked results after command (3), and I created the dummy more
inefficiently (using two commands) on my first take, but that's irrelevant.
We have what we want in terms of how the data are split.

Now we need to reset analysis time to be as we really want it. So first, I
just typed the original -stset- command Marjo supplied,

> . stset end, failure(event) time0(begin) exit(time .) origin(time begin)

I listed the data, but that didn't work. What I found was that the original
second record, calendar time (4,13] and desired analysis time (0,9], was now
itself split into two parts, and analysis time got reset on the second part.
Well, of course. Marjo was treating these data as single-record survival data,
but after the -stsplit-, the data were no longer single-record. So I went back
and added command (1), and then I could -stset- what were (but are no longer)
single records by specifying id(recid). That worked. Here was the final
result:

. list person dummy _t0 _t _d

     +--------------------------------+
     | person   dummy   _t0   _t   _d |
     |--------------------------------|
  1. |      1       0     0    1    1 |
  2. |      1       0     4   12    0 |
  3. |      1       1    12   13    1 |
  4. |      2       0    15   24    0 |
  5. |      2       1    24   25    0 |
     +--------------------------------+

I think that's what Marjo wants.

I admit that this was a conceptually difficult problem, so let me emphasize
two things:

First, to achieve a desired result, you can -stset- the data one way, and then
later -stset- the data differently for analysis. That was the insight that had
not occurred to Marjo. It is a trick worth remembering whenever working with
data where you want some variables defined on one time scale (say months) and
others on another (say analysis time).
-stset- based on calendar months, create what you want, and then -stset- the
data the way you really want it. The rest was just work.

Second, I admit that I seldom get an -stset- command right the first time. My
technique is to guess and list. Looking at the result, I go back and improve
my guess, and eventually I get it right.

-- Bill
wgould@stata.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
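The episode splitting done by commands (3) and (4) can be mimicked outside Stata to check the logic by hand. The Python sketch below is an illustration only and does not show how -stsplit- is implemented. It makes the following assumptions:

- Spells are intervals (begin, end] in calendar months.
- The cut points are those of command (3).
- `bot` is the largest cut point at or below a piece's start, or 0 if there is none. This mirrors how the post uses the -bot- variable in command (4).
- The failure indicator is kept only on the last piece of each spell.

The names `CUTS`, `Piece` and `split_spells` are hypothetical.

```python
from bisect import bisect_right
from typing import NamedTuple

# Cut points copied from command (3): start and end of each December band.
CUTS = [12, 13, 24, 25, 36, 37, 48, 49]


class Piece(NamedTuple):
    person: int
    dummy: int  # 1 for a December band, else 0
    t0: int     # start of the piece (exclusive), calendar months
    t: int      # end of the piece (inclusive), calendar months
    d: int      # failure indicator, kept only on the spell's last piece


def split_spells(spells, cuts=CUTS):
    """Split (person, begin, end, event) spells at interior cut points."""
    pieces = []
    for person, begin, end, event in spells:
        # Only cut points strictly inside (begin, end) split the spell.
        inner = [c for c in cuts if begin < c < end]
        bounds = [begin] + inner + [end]
        for i in range(len(bounds) - 1):
            t0, t = bounds[i], bounds[i + 1]
            # Analogue of -bot-: largest cut point <= t0, or 0 if none.
            k = bisect_right(cuts, t0)
            bot = cuts[k - 1] if k > 0 else 0
            # Same rule as command (4): positive multiple of 12.
            dummy = int(bot % 12 == 0 and bot != 0)
            last = i == len(bounds) - 2
            pieces.append(Piece(person, dummy, t0, t, event if last else 0))
    return pieces


spells = [(1, 0, 1, 1), (1, 4, 13, 1), (2, 15, 25, 0)]
for p in split_spells(spells):
    print(p.person, p.dummy, p.t0, p.t, p.d)
```

The sketch uses the three spells from Marjo's example. It prints the same five rows (person, dummy, _t0, _t, _d) as the listing after commands (3) and (4). A spell that crosses several Decembers simply produces one dummy=1 piece per December band.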

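The final re-stset with id(recid) amounts to measuring every piece from the start of the original spell it came from, instead of from the start of the piece itself. Below is a rough Python sketch of that arithmetic. It assumes each piece is tagged with the original record number, as `recid` from command (1) does. It also assumes the clock should read zero at each original spell's BEGIN, which is what Marjo asked for. It shows only the intended analysis times and is not a description of what -stset- does internally. The names `to_analysis_time` and `by_original_spell` are hypothetical.

```python
# Pieces after the split, in calendar months: (recid, dummy, t0, t, d).
# recid numbers the three original spells, as command (1) does before -stsplit-.
pieces = [
    (1, 0, 0, 1, 1),
    (2, 0, 4, 12, 0),
    (2, 1, 12, 13, 1),
    (3, 0, 15, 24, 0),
    (3, 1, 24, 25, 0),
]


def to_analysis_time(pieces, by_original_spell=True):
    """Shift each piece so the clock reads 0 at the start of its spell.

    With by_original_spell=False every piece is treated as a spell of its
    own, so the clock restarts at every cut point.
    """
    def key(i, recid):
        return recid if by_original_spell else i

    # Origin of each spell: the earliest start among its pieces.
    origin = {}
    for i, (recid, _dummy, t0, _t, _d) in enumerate(pieces):
        k = key(i, recid)
        origin[k] = min(origin.get(k, t0), t0)

    return [
        (recid, dummy, t0 - origin[key(i, recid)], t - origin[key(i, recid)], d)
        for i, (recid, dummy, t0, t, d) in enumerate(pieces)
    ]


for row in to_analysis_time(pieces):
    print(row)
```

Grouping by recid turns the original spell (4,13] into the pieces (0,8] and (8,9]. That is the desired (0,9] split at the December cut. With by_original_spell=False, the December piece instead restarts at (0,1], which is the problem that appears when Marjo's original -stset- is reused without an id() after the split.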