
RE: st: First instance of starting and stopping drug

From   "Nick Cox" <>
To   <>
Subject   RE: st: First instance of starting and stopping drug
Date   Fri, 14 Mar 2008 16:23:57 -0000

Modulo a reflection of the time axis, this is in part an FAQ. In fact 

. search first date 

in Stata yields this reference

FAQ     . . . . . . . . . . . . . . . . . . . . . . . Generating the last date
        4/05    How can I generate a variable containing the last of
                several dates?

which, despite its title, also treats the calculation of first (minimum)
dates.

FAQ or not, your question yields to -by:-. 

As you don't say here whether your dates are numeric or string
variables, I take the easier option and assume numeric. Then one way to
get the first start is through -egen-: 

. egen first_start = min(datestart), by(id) 

Another way is from first principles: 

. bysort id (datestart): gen first_start = datestart[1]
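
If the dates are in fact strings, they would need converting to numeric
daily dates first. A minimal sketch, assuming a hypothetical string
variable s_start holding dates such as "1jan2001" (strings with ordinal
suffixes such as "1stJan01" would need cleaning before this works):

```stata
* Sketch only: s_start is a hypothetical string variable, not one
* named in the original question.
clear
input id str9 s_start
1 "1jan2001"
2 "1feb2001"
2 "1apr2004"
end
* date() returns a numeric daily date; empty strings become missing
gen datestart = date(s_start, "DMY")
* show the numeric values as readable dates
format datestart %td
```

After that, -egen- or -by:- can be applied to datestart as above.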

The major wrinkle seems to be that your first stop cannot precede your
first start. 
Thus we clone the stops, but blank out any dates that precede the first
start:

. gen work = cond(datestop < first_start, . , datestop) 

And then proceed as before, either

. egen first_stop = min(work), by(id) 

or

. bysort id (work): gen first_stop = work[1] 
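
The two routes should agree because numeric missing values sort after
all nonmissing values, so the first observation within each sorted group
holds the smallest nonmissing value (or missing, if the group has
nothing else). A small check on made-up numbers, with first_stop2 a
hypothetical name used only to hold the second result:

```stata
* Made-up values of work; id 3 has no usable stop at all.
clear
input id work
1 .
1 20
2 10
2 .
2 30
3 .
end
* Route 1: egen ignores missings when taking the minimum
egen first_stop = min(work), by(id)
* Route 2: sort within id so that the smallest value comes first
bysort id (work): gen first_stop2 = work[1]
* Stops with an error if the two routes ever disagree
assert first_stop == first_stop2
```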

To keep just one observation for each -id-: 

. bysort id: keep if _n == 1 
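
Putting the steps together on toy data patterned on the layout quoted
below may help. This is a sketch under assumptions: the string variables
s_start and s_stop, the four-digit years, and the 14mar2008 censor date
for id 3 are all invented for illustration. It also adds a hypothetical
variant (work2, first_stop2) that discards stops falling on the first
start date as well, on the assumption that such a stop is the removal of
an earlier implant at the same visit.

```stata
* Toy data; the censor date for id 3 is made up.
clear
input id str9 s_start str9 s_stop
1 "1jan2001"  ""
1 ""          "12jan2002"
2 ""          "1feb2001"
2 "1feb2001"  ""
2 ""          "1apr2004"
2 "1apr2004"  ""
2 ""          "1jan2007"
3 "1jan2003"  ""
3 ""          "14mar2008"
end
gen datestart = date(s_start, "DMY")
gen datestop  = date(s_stop, "DMY")

egen first_start = min(datestart), by(id)

* Rule from the text: blank out stops before the first start
gen work = cond(datestop < first_start, ., datestop)
egen first_stop = min(work), by(id)

* Hypothetical variant: also blank out stops ON the first start date
gen work2 = cond(datestop <= first_start, ., datestop)
egen first_stop2 = min(work2), by(id)

format first_start first_stop first_stop2 %td
bysort id: keep if _n == 1
list id first_start first_stop first_stop2, noobs
```

For id 2 the strict rule returns 1 February 2001 as the first stop,
because the stop and the first start fall on the same day, while the
variant returns 1 April 2004. Which is right depends on how same-day
stops and starts should be read.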

For more on the power of -by:-, note that a leisurely tutorial in the
Stata Journal is now in the public domain: 

SJ-2-1  pr0004  . . . . . . . . . . Speaking Stata:  How to move step by: step
        Q1/02   SJ 2(1):86-102                                   (no commands)
        explains the use of the by varlist : construct to tackle
        a variety of problems with group structure, ranging from
        simple calculations for each of several groups to more
        advanced manipulations that use the built-in _n and _N


See the Stata Journal website for a .pdf version. 


Paul O'Brien

The data are longer than that Svend!

id  datestart  datestop
1   1stJan01
1              12thJan02
2              1stFeb01
2   1stFeb01
2              1stApr04
2   1stApr04
2              1stJan07
3   1stJan03
3              censordate

Two points:
the patient can start drug before she attends our clinic
the patient can stop and start on the same day (it is actually a  
hormonal implant, removed at end of life span and another inserted at  
same visit).

We want to measure the continuation rate for the first episode of  
implant use that we inserted ourselves. Data should look like this

id  datestart  datestop
1   1stJan01   12thJan02
2   1stFeb01   1stApr04
3   1stJan03   censordate

So, we want the first datestart on the same row as the next datestop.

On 3/13/08, Svend Juul <> wrote:
Paul wrote:

We have a database of patients on and off a drug in the long form,
some stopping before starting later. I want to do a survival analysis
on the first instance of starting and stopping use under our care, but
have difficulty isolating the first episode of use.


I assume that long form means something like this:

   input id timeon timeoff
   1 1 3
   1 6 7
   2 1 5
   2 6 9

You want to keep the first treatment period for each id:

   by id (timeon), sort: generate incl = _n==1
   keep if incl==1
   sort id timeon

        | id   timeon   timeoff   incl |
     1. |  1        1         3      1 |
     2. |  2        1         5      1 |
