The Stata listserver

st: RE: Survival time to prevalence data - efficient code?


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   st: RE: Survival time to prevalence data - efficient code?
Date   Mon, 8 Sep 2003 23:37:33 +0100

Arne Kolstad
 
> I have survival time data about sickness spells, in the 
> following form:
> 
> personid     startdate     stopdate
> 1            01may1997    07dec1997
> 1            28jan2002    09feb2002
> 2            31jul1994    06mar1998
> .
> .
> N            31dec2002    (censored)
> 
> What I need is a table a) with prevalences for each day :
> 
> date             spersons
> 01jan1994            897
> 02jan1994            789
> .
> .
> 31dec2002            987
> 
> ---
> 
> and a table b) of person-days of sickness for each month 
> through the period
> of interest:
> 
> 
> month              pdays
> jan1994            22345
> feb1994            24567
> .
> .
> dec2002            26789
> 
> ---
> 
> 
> I believe I can produce table a) like this:
> 
> forvalues x=12419/15705 {
> quietly stdes if startdate<=`x' & stopdate>`x'
> di r(N_sub)
> }
> 
> So to the real problem: The data set has more than 5 
> million records.
> Looping through thousands of days is slow, partly because 
> stdes does a lot
> of work, and I need to repeat it a lot of times as 
> different versions of the
> data are produced. Is there a more efficient method?
> 
> From table a) to table b) should be straightforward, but 
> is there some really
> efficient code hidden somewhere among the st commands or elsewhere?

Don't loop! 
 
What a neat problem! I don't know about -st-, but here 
is one first principles attack: 

// get your data in long form and -sort-ed on date: 
// (personid repeats across spells, so -reshape- needs a unique spell id)

gen long spell = _n
rename startdate date1
rename stopdate date2
reshape long date , i(spell)
sort date 

// the number of persons who are sick increases by 1
// every time someone goes on sick leave and decreases
// by 1 every time someone stops 

gen spersons = sum((_j == 1) - (_j == 2)) 

// reduce to one observation daily 

bysort date : keep if _n == _N 

// fill in gaps 

gen lag = date[_n+1] - date 
expand lag
bysort date : replace date = date[_n-1] + 1 if _n > 1

// listing 
l date spersons 

// monthly summary: summing the daily counts gives person-days 
gen month = mofd(date) 
format month %tm
egen Spersons = sum(spersons), by(month) 
tabdisp month, c(Spersons) 
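
As a quick check of the running-sum idea, here is a minimal sketch on three made-up spells. The dates are plain integers and the identifier `spell` is hypothetical, neither comes from the original data. A person counts as sick on day x if startdate <= x and stopdate > x, as in the question.

```stata
* Toy check of the running-sum idea (made-up spells, not the poster's data)
clear
input spell date1 date2
1 10 13
2 11 12
3 12 15
end

reshape long date, i(spell)      // _j == 1 marks a start, _j == 2 a stop
sort date

* +1 at each start, -1 at each stop; sum() in -generate- is a running sum
gen long spersons = sum((_j == 1) - (_j == 2))

* the last observation of each day holds that day's end-of-day count
bysort date : keep if _n == _N

list date spersons, noobs
```

Days 10, 11, 12, 13 and 15 should come out with counts 1, 2, 2, 1 and 0. Day 12 has both a stop and a start, and keeping only the last observation of the day makes the order of those two events irrelevant.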

(plus some adjustment dependent on how censoring is 
done?) 
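
One possible adjustment, sketched under two assumptions that are not confirmed by the original post: a censored spell is stored with a missing stopdate, and such a person counts as sick through the last day of the observation window. The names `spell`, `gap` and `lastday`, and the integer dates, are made up for illustration.

```stata
* Sketch only: censored spells are assumed to have a missing stop date
clear
input spell date1 date2
1 10 13
2 11 .
3 12 15
end
local lastday = 16        // hypothetical last day of the observation window

reshape long date, i(spell)
drop if missing(date)     // a censored stop never decrements the count
sort date
gen long spersons = sum((_j == 1) - (_j == 2))
bysort date : keep if _n == _N

* fill gaps between event days and carry the last count forward to `lastday'
gen long gap = cond(_n < _N, date[_n+1] - date, `lastday' - date + 1)
expand gap
bysort date : replace date = date[_n-1] + 1 if _n > 1
sort date
list date spersons, noobs
```

With these toy spells the censored person (spell 2) is still counted on days 14 and 16. If censoring is coded some other way, for example with a separate indicator variable, the -drop- line would need to change accordingly.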
               
Nick 