
# st: sequential subscript processing

 From Rebecca Pope To statalist@hsphsun2.harvard.edu Subject st: sequential subscript processing Date Wed, 27 Mar 2013 09:04:16 -0500

```
This is a question about efficiency. The code I've written produces
the output I need; it just seems to me that it could be improved.

Rather than comparing obs[3] to obs[2], I need to compare obs[3] to
obs[2] _after_ obs[2] has been conditionally changed by the values in
obs[1]. For context, the goal is to "chain" prescription refills
together to calculate 180-day medication possession ratios. Everyone
in the data has at least one refill. For any of you who work with
MPRs, don't panic: this isn't the extent of the calculation or the
rules. I'm using "refill" loosely; it includes titrations. The goal
with this example was to capture the essential issue with the dates.

Definitions:
"dispensing date" - date the pharmacy provides the medication to the patient
"fill" - a distinct dispensing date+medication combination
"refill date" - when the medication is projected to be filled again
"days supply" - the number of days for which the prescription provides
medication (usually 30, 60, or 90)

The rules are:
1. If a patient's refill overlaps the previous fill by more than 20%
of the previous fill's days supply, replace the current observation's
dispensing date with the previous fill's dispensing date and adjust the
days supplied for the current observation to (days supplied(t-1) +
days supplied(t)) less the number of days of overlap, i.e., truncate
the previous fill's days supplied and assume use of the refill starts
on the day it is dispensed.

2. If a patient's refill overlaps the previous fill by <= 20% of the
previous fill's days supply, replace the current observation's
dispensing date with the previous fill's dispensing date and adjust the
days supplied for the current observation to (days supplied(t-1) +
days supplied(t)), i.e., shift the dispensing date of the refill to
the end of the previous fill.

I think I've got a good start on this with -forvalues- and -while-.
I've put a sample of the data below. As a note, this data has been
de-identified before posting. The dates have been jittered from the
real dates, but I've replicated all of the major features. The
variable "ptdrugid" was created from -egen ptdrugid = group(ptid
shortnm)-.

** begin code **
clear
input    ptdrugid   _dispdt   daysuppl
14     18000         30
14     18031         30
14     18128         30
15     16877         30
15     16903         30
15     16952         30
15     16987         30
15     17010         30
15     17047         30
15     17073         30
15     17093         30
15     17132         30
15     17165         30
15     17194         30
15     17224         30
15     17249         30
15     17286         30
15     17327         30
15     17357         30
15     17385         30
15     17413         30
15     17445         30
15     17474         30
15     17500         30
15     17534         30
15     17568         30
15     17597         30
15     17620         30
15     17645         30
15     17669         30
15     17702         30
15     17728         30
15     17758         30
15     17796         30
15     17818         30
15     17861         30
15     17898         30
15     17934         30
15     17934         10
15     17952         30
15     17971         30
15     18002         30
15     18032         30
15     18075         30
15     18096         30
15     18107         90
15     18190         90
end
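* last day covered by each fill (used as the refill date)
gen _refilldt = _dispdt+daysuppl-1
format _dispdt _refilldt %td
* working copies, overwritten as fills are chained
clonevar dispdt = _dispdt
clonevar refilldt = _refilldt
bys ptdrugid (_dispdt _refilldt): gen _seq = _n
* largest number of fills for any patient-drug
sum _seq, meanonly
local nmax = `r(max)'
gen chng = 0
* mdaysup accumulates days supply across chained fills
clonevar mdaysup = daysuppl
forvalues j = 2/`nmax' {
    * flag obs j if dispensed on or before the previous refill date
    by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
    * add the previous supply; subtract the overlap only if it
    * exceeds 20% of the previous (chained) supply: rule 1 vs. rule 2
    by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
        (dispdt-refilldt[_n-1])*(abs(dispdt-refilldt[_n-1]) > ///
        0.2*mdaysup[_n-1]) if chng
    * chained record takes the previous dispensing date
    by ptdrugid: replace dispdt = dispdt[_n-1] if chng
    replace refilldt = dispdt + mdaysup - 1
    * drop the previous fill, now absorbed into obs j
    by ptdrugid: drop if chng[_n+1]==1
    * the obs that moved into position j may also overlap
    by ptdrugid: replace chng = (dispdt-refilldt[_n-1]) <= 0 & _n==`j'
    sum chng, meanonly
    if `r(sum)' > 0 {
        local x 1
        * keep chaining into position j until nothing overlaps
        while `x' > 0 {
            by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
                (dispdt-refilldt[_n-1])*(abs(dispdt - refilldt[_n-1]) > ///
                0.2*mdaysup[_n-1]) if chng
            by ptdrugid: replace dispdt = dispdt[_n-1] if chng
            replace refilldt = dispdt + mdaysup - 1
            by ptdrugid: drop if chng[_n+1]==1
            by ptdrugid: replace chng = (dispdt -refilldt[_n-1]) <= 0 & _n==`j'
            sum chng, meanonly
            local x  = `r(sum)'
        }
    }
}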
exit
** end code **

To my way of thinking, this is horribly inefficient. Among the issues
that are immediately apparent to me: (1) once `nmax' has been set, it
isn't updated, even though the number of observations ends up being
far smaller as fills are chained (so the loop makes too many passes),
and (2) I keep looping over observations after they have been
maximally condensed.

Does anyone have any suggestions for making this code better?

Thanks,
Rebecca
```
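Both inefficiencies listed in the post come from rescanning the whole dataset for every position `j`. A minimal sketch of a single-pass alternative is shown here, assuming the data are sorted as in the posted code (by patient-drug, dispensing date, then days supply). It carries the current chain's start date and accumulated days supply forward in a Mata loop. It copies the posted code's overlap measure (previous refill date minus current dispensing date) and its 20% test against the accumulated days supply. The variable names `cstart`, `cdays` and `cend` are hypothetical, and the input rows are a subset of the posted sample.

```stata
* Sketch: one-pass chaining in Mata. Hypothetical names cstart,
* cdays, cend; overlap rule copied from the posted code.
clear
input ptdrugid _dispdt daysuppl
14 18000 30
14 18031 30
15 17861 30
15 17898 30
15 17934 30
15 17934 10
15 17952 30
15 17971 30
end

* same order as the posted code: dispensing date, then days supply
sort ptdrugid _dispdt daysuppl
gen double cstart = .
gen double cdays  = .

mata:
    id   = st_data(., "ptdrugid")
    disp = st_data(., "_dispdt")
    days = st_data(., "daysuppl")
    n    = rows(id)
    cs   = J(n, 1, .)
    cd   = J(n, 1, .)
    for (i = 1; i <= n; i++) {
        newchain = 1
        if (i > 1) {
            // continue a chain only within the same patient-drug
            if (id[i] == id[i-1]) {
                // previous chain's refill date minus this dispensing date
                ov = (cs[i-1] + cd[i-1] - 1) - disp[i]
                if (ov >= 0) {
                    cs[i] = cs[i-1]
                    // subtract the overlap only if > 20% of chained supply
                    cd[i] = cd[i-1] + days[i] - ov * (ov > 0.2 * cd[i-1])
                    newchain = 0
                }
            }
        }
        if (newchain) {
            cs[i] = disp[i]
            cd[i] = days[i]
        }
    }
    st_store(., "cstart", cs)
    st_store(., "cdays", cd)
end

* the last record of each chain holds the chained totals
by ptdrugid: keep if cstart != cstart[_n+1]
gen double cend = cstart + cdays - 1
format _dispdt cstart cend %td
list ptdrugid cstart cdays cend, sepby(ptdrugid)
```

With these rows, the 10-day fill on 17934 starts a new chain. The 30-day fill dispensed the same day overlaps it by 9 days by the posted code's measure, which exceeds 2 (20% of 10), so the chain becomes 10 + 30 - 9 = 31 days. The fills on 17952 and 17971 extend it to 49 and then 68 days. Tracing the posted -forvalues-/-while- loop by hand on the same rows gives the same totals. Because each observation is visited once, the work should grow with _N rather than with _N times the number of chaining passes.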