[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: st: Using sts list results in do-files

From   James Harris <>
Subject   Re: st: Using sts list results in do-files
Date   Wed, 11 Jul 2007 16:27:58 +1200

James Harris wrote:
> I'm running some simple survival analyses using
>    sts list, failure at (1 12) by (gender ageband)
> My question is, can I access the results to use later in my do-file?
> For example, how would I pick out failure rate at time 12, and put it
> into a variable or matrix?
> Many thanks.

I'm replying to my own question, as all I got from StataList was "try
sts generate".  I had a useful correspondance with someone who'd asked
StataList this question earlier, and then in turn have been contacted by
others who are dealing with the same problem.

The solution I've found works in some  - but not all - cases. If anyone
is interested in generalising it, feel free!

- James.

The first method uses -tsset- to create the missing observations; then
carries forward the survival rates:

     I'm sorry to say that I did not [get an answer from StataList],
     which is unfortunate. I can tell you
     how I addressed the problem since I had to get an answer .. I figured
     that, since the failure function is not a model and is essentially
     calculated at each particular failure time, I assumed that the failure
     function didn't change between successive timepoints. So, essentially,
     if I had a time 2 with failure rate A, and at the next timepoint 8 I
     have A+x, I assume my failure rate is A for time 3,4,5,6, and 7. It is
     not until timepoint 8 that I had the failure change. Likewise, if all
     my obs for one group were censored at time 10, and for my other group
     they were censored at time 22, the best estimate for my first group's
     failure function remains constant between time 10 and 22.

     Whether or not this line of reasoning applies to your problem, I can't
     say. But that's what I did, because having talked with a few of my
     colleagues it seemed reasonable in my case.

     To implement this, I did a tsset and tsfill (to create an observation
     for all missing timepoints
     3-7, and filled the data accordingly, with a line something like the

     . bysort [grouping-variables] (_t): replace failure=failure[_n-1] if
     missing(failure) & ~missing(failure[_n-1])

The tsset method almost worked for me, except that it creates too many
observations: I am working with a million or so hospitalisation records,
that span 15 years, with a timescale of 1 day, and my aged computer
simply ran out of memory. But the basic idea of carrying forward the
survival rates to dummy records seemed like the way to go.

So instead I created a file of observations that holds 1 record for each
person for each time that I want the survival rate for:

// set up the demographics file, with entries at the times of interest

use ".\stata data\temp\demographics", clear
// a file with a record for each unique person (identified by mast_enc)

local BridgeDuration = 30
local DH_end = 366
local Yr2_end = 732 // two years

gen time1 = `BridgeDuration' // days
gen time2 = `DH_end'
gen time3 = `Yr2_end'

reshape long time, i(mast_enc) j(break)
save ".\stata data\temp\demographics30", replace

After the stset on my main dataset, I generate the survival parameter,
and the same time variable as in my 'demographics' file

sts generate survival = s
format survival %6.4g
gen time = _t

I append the 'demographics'  file to my main dataset, sort by time, then
carryforward the survival rates:

append using ".\stata data\temp\demographics30"

sort mast_enc time _t // the sort by _t ensures that eg 30.x days goes
after 30.0 days
// mast_enc is the unique person ID

// at 30 days, use earlier survival if there is one for this person
// also carry forward the _st flag
// and the ageband
// unless previous record is also for a t=30

foreach BreakPoint in `BridgeDuration' `DH_end' `Yr2_end' /// the three
breakpoints: end of bridge; rest of 1st year; second year
     bysort mast_enc (time): replace _st = _st[_n-1] ///
          if (missing(_st) & (time == `BreakPoint') & (time[_n-1] ~=
`BreakPoint') ///
                 & (mast_enc == mast_enc[_n-1]))

     bysort mast_enc (time): replace survival = survival[_n-1] ///
          if ( missing(survival) & (time == `BreakPoint') & (time[_n-1]
~= `BreakPoint') ///
                  & (mast_enc == mast_enc[_n-1]))

     //bysort mast_enc (time): replace ageband = ageband[_n-1] ///
     //     if ( (time == `BreakPoint') & (time[_n-1] ~= `BreakPoint') &
(mast_enc == mast_enc[_n-1]))

Each person in the dataset now has a survival rate at the three
specified times. It can be tabluated with something like

foreach BreakPoint in `BridgeDuration' `DH_end' `Yr2_end' {
     table gender ageband  if time == `BreakPoint', contents (min
survival freq)

if survival is decreasing over time.  The 'min' function is needed if
some people actually have an event at one of my three times of
interest.  Alternatively, you can create a dataset of the survival
values with something like

     collapse (min) survival if inlist(time, `BridgeDuration' ,`DH_end'
     ,`Yr2_end') ,  by (ageband gender time )

The good news is that this method works well when each person only
becomes at risk once - in my case, a chance of death after a heart attack.

The bad news is that it fails dismally when each person may have
multiple entries and failure events - eg when I  want the chance of a
heart attack given a previous heart attack. I've given up on solving
that version - if anyone has any suggestions I'd be very grateful.

*   For searches and help try:

© Copyright 1996–2017 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index