Statalist



Re: st: Generating time-varying covariates in a multiple spell data


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Generating time-varying covariates in a multiple spell data
Date   Tue, 08 Apr 2008 11:58:09 -0500

Marjo Pyy-Martikainen <Marjo.Pyy-Martikainen@stat.fi> writes, 

> I have a data containing multiple spells per person. The spells are measured
> in months.  The data is in the following form:
>
>       PERSON    BEGIN   END     EVENT     DUR
>            1   (  0      1 ]        1       1
>            1   (  4     13 ]        1       9
>            2   ( 15     25 ]        0      10
>
> where variables BEGIN and END are measured in calendar time (1 refers to Jan
> 1995, 2 to Feb 1995 and so on until 60, Dec 1999).
> 
> I stset the data in the following way:
> 
>   . stset end, failure(event) time0(begin) exit(time .) origin(time begin)
>
> which means I want to "set the clock to zero" at the start of each spell.
> Now I would like to include a dummy for December months 12, 24, 36 and 48.
> It is thus a time-varying variable getting value 1 for the December months
> and 0 for other months. A spell may include zero, one or many December
> months.  I suppose I should use stsplit and do some kind of episode
> splitting, but could someone help me and give me advice how I should do it
> with my data?

I have the solution.  Before starting, let's look and see what Marjo has
already done.  At first I thought Marjo had made a mistake, but I was wrong.
The -stset- command is just complicated enough that theoretical examination
does not work well; you can check that you have the intended result by listing
the _t0, _t, and _d variables that -stset- creates.  So I entered the data
and typed the -stset- command.  Then I typed

        . list _t0 _t _d

             +---------------+
             | _t0   _t   _d |
             |---------------|
          1. |   0    1    1 |
          2. |   0    9    1 |
          3. |   0   10    0 |
             +---------------+

Analysis time ranges over (0,1] and again over (0,9] for the first person.
That's what I thought would happen, and it looks like an error, but notice 
that Marjo said, "which means I want to set the clock to zero at the start of
each spell".  Okay, the command works exactly as Marjo said it would.
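For anyone who wants to reproduce this, here is a minimal do-file sketch that enters Marjo's three example spells and runs her -stset-, then uses -assert- to confirm what the listing shows: analysis time starts at 0 and ends at DUR for every spell.  It assumes the third spell ends in month 25, which is what DUR = 10 and the listing imply.

```stata
clear
input person begin end event dur
1  0  1 1  1
1  4 13 1  9
2 15 25 0 10
end

* Marjo's -stset-: no id(), so each record is its own subject, and
* origin(time begin) starts the clock at the beginning of each spell
stset end, failure(event) time0(begin) exit(time .) origin(time begin)

* Check the result rather than reasoning about it
list _t0 _t _d
assert _t0 == 0
assert _t == dur
assert _d == event
```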

Marjo now wants to add a dummy variable equal to 1 every December.
Without explanation (that's coming), here's the solution:

        . gen recid = _n                                            (1)

        . stset end, id(person) failure(event) enter(begin) ///     (2)
                     exit(time .) time0(begin)
        . stsplit bot, at(12 13  24 25  36 37  48 49)               (3)
        . gen dummy = ( mod(bot,12)==0 & bot!=0 )                   (4)

        . stset end, id(recid) failure(event) time0(begin) ///      (5)
                     exit(time .) origin(time begin)
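The list of split points in (3) is easy to mistype.  One option, offered as an illustration and not as part of the solution above, is to have -numlist- build the list: each December (12, 24, 36, 48) followed by the month after it.

```stata
* Cut points for -stsplit-: each December month and the month after it
numlist "12(12)48 13(12)49", sort
local cuts `r(numlist)'
display "`cuts'"

* With the data -stset- on calendar time as in (2), step (3) becomes
*     stsplit bot, at(`cuts')
```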

I admit that the entire solution did not occur to me at the outset.  In fact,
I went back and added the first line at the end, and modified the fifth.  Here
is what did occur to me:  We will have to use -stsplit-.  -stsplit- wants to
split on analysis time, so we will first have to -stset- our data based on
calendar time, then -stsplit- the data, and finally we can -stset- our data
the way we really want it.  The preliminary -stset- allows us to generate the
dummy variable for December.

So let me explain.
Ignore line (1); remember, it didn't even occur to me until later. 

Line (2) was the first line I wrote.  It seemed the right way to -stset- 
the data based on calendar time.  I didn't get the command right the 
first time, but after typing (2), I listed the data, saw what was wrong, 
and eventually got (2) to work just as I wanted it.  (What was wrong is that I
forgot exit(time .), which is needed because this data, it turned out, had to
be treated as multiple-failure data at this step.)  When I say I listed the
data, I mean that I list _t0, _t, and _d, so I can see the time variables and
outcome that will be used in analysis.  Here's what the data looked like after
(2):

        . list person _t0 _t _d

             +------------------------+
             | person   _t0   _t   _d |
             |------------------------|
          1. |      1     0    1    1 |
          2. |      1     4   13    1 |
          3. |      2    15   25    0 |
             +------------------------+
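The point about exit(time .) can be checked directly.  This sketch assumes the third spell ends in month 25 and, for simplicity, leaves out the enter() option used in (2); with time0() and id(), each record's start is already taken from begin.  Without exit(time .), a multiple-record subject leaves the risk set at its first failure, so person 1's second spell is dropped.

```stata
clear
input person begin end event dur
1  0  1 1  1
1  4 13 1  9
2 15 25 0 10
end

* Default exit is the first failure: person 1 exits at month 1, so the
* (4,13] spell is excluded from analysis (_st == 0)
stset end, id(person) failure(event) time0(begin)
assert _st == 0 if person == 1 & begin == 4

* With exit(time .), every record is kept and analysis time equals
* calendar time
stset end, id(person) failure(event) time0(begin) exit(time .)
list person _t0 _t _d
assert _st == 1
assert _t0 == begin & _t == end & _d == event
```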

Perfect; _t0 and _t correspond to the original month variables. 
Now we can -stsplit-.  We need to set the dummy to 1 for months 12, 24, 36, 
and 48, which means we need to set it back to 0 for months 13, 25, 37, and 
49.  So I -stsplit- the data at 12, 13, 24, 25, 36, 37, 48, and 49 and 
created the dummy variable.  I checked results after executing commands (3) 
and (4):

        . list person dummy _t0 _t _d

             +--------------------------------+
             | person   dummy   _t0   _t   _d |
             |--------------------------------|
          1. |      1       0     0    1    1 |
          2. |      1       0     4   12    0 |
          3. |      1       1    12   13    1 |
          4. |      2       0    15   24    0 |
          5. |      2       1    24   25    0 |
             +--------------------------------+

Actually, I checked results after command (3), and on my first take I
created the dummy less efficiently (using two commands), but that's
irrelevant.  We have what we want in terms of how the data are split.
Now we need to reset analysis time to be as we really want it.  So first, 
I just typed the original -stset- command Marjo supplied, 

>   . stset end, failure(event) time0(begin) exit(time .) origin(time begin)

I listed the data and saw that it didn't work.  What I found was that 
the original second record, calendar time (4,13] and desired analysis time 
(0,9], was now itself split into two parts, and analysis time got reset 
on the second part.  Well, of course.  Marjo was treating this data as 
single-record survival data, but after the -stsplit-, what was single-record 
data was no longer.  So I went back and added command (1), and then 
I could -stset- what were (but are no longer) single records by specifying 
id(recid).  That worked.  Here was the final result:

        . list person dummy _t0 _t _d

             +--------------------------------+
             | person   dummy   _t0   _t   _d |
             |--------------------------------|
          1. |      1       0     0    1    1 |
          2. |      1       0     0    8    0 |
          3. |      1       1     8    9    1 |
          4. |      2       0     0    9    0 |
          5. |      2       1     9   10    0 |
             +--------------------------------+

I think that's what Marjo wants.
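Putting (1) through (5) together, here is a self-contained do-file sketch that replaces listing-by-eye with -assert-: within each original spell, analysis time should run from 0 to DUR, and a failure should appear only on the spell's last piece.  It assumes the third spell ends in month 25 and leaves out the enter() option from (2); everything else follows the commands above.

```stata
clear
input person begin end event dur
1  0  1 1  1
1  4 13 1  9
2 15 25 0 10
end

gen recid = _n                                       // (1) one id per spell

* (2) calendar time
stset end, id(person) failure(event) time0(begin) exit(time .)

* (3)-(4) split at each December and the following month; flag December
stsplit bot, at(12 13 24 25 36 37 48 49)
gen dummy = (mod(bot,12)==0 & bot!=0)

* (5) the clock restarts at the beginning of each original spell
stset end, id(recid) failure(event) time0(begin) exit(time .) origin(time begin)
list person recid dummy _t0 _t _d, sepby(recid)

* Each original spell runs from 0 to its duration ...
bysort recid (_t0): assert _t0[1] == 0 & _t[_N] == dur
* ... and any failure is recorded only on its last piece
by recid: assert _d == 0 if _n < _N
```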

I admit that this was a conceptually difficult problem, so let me emphasize
two things:  First, to achieve a desired result, you can -stset- the data one
way, and then later -stset- the data differently for analysis.  That was the
insight that had not occurred to Marjo.  It is a trick worth remembering
whenever working with data where you want some variables defined on one 
time scale (say months) and others on another (say analysis time). 
-stset- based on calendar months, create what you want, and then -stset- 
the data the way you really want it.
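As a variant of the same trick, you can split at every calendar month with -stsplit-'s every() option instead of listing cut points.  The trade-off is size: with Marjo's three spells this sketch produces 20 records instead of 5, but there is no list to mistype.  It assumes the third spell ends in month 25 and omits the enter() option used in (2).

```stata
clear
input person begin end event dur
1  0  1 1  1
1  4 13 1  9
2 15 25 0 10
end
gen recid = _n

* -stset- on calendar months
stset end, id(person) failure(event) time0(begin) exit(time .)

* Create what you want: one record per month, bot = start of that month's
* interval, then the December dummy as in (4)
stsplit bot, every(1)
gen dummy = (mod(bot,12)==0 & bot!=0)

* -stset- the data the way you really want it for analysis
stset end, id(recid) failure(event) time0(begin) exit(time .) origin(time begin)
```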

The rest was just work.  I admit that I seldom get an -stset- command 
right the first time.  My technique is to guess and list.  Looking at 
the result, I go back and improve my guess, and eventually I get it 
right.

-- Bill
wgould@stata.com


