From
Rebecca Pope <rebecca.a.pope@gmail.com>

To
statalist@hsphsun2.harvard.edu

Subject
st: sequential subscript processing

Date
Wed, 27 Mar 2013 09:04:16 -0500

This is a question about efficiency. The code I've written produces the output I need; it just seems to me that it could be improved. Rather than comparing obs[3] to obs[2], I need to compare obs[3] to obs[2] _after_ obs[2] has been conditionally changed by the values in obs[1].

For context, the goal is to "chain" prescription refills together to calculate 180-day medication possession ratios. Everyone in the data has at least one refill. For any of you who work with MPRs, don't panic: this isn't the extent of the calculation or the rules. I'm using "refill" loosely; it includes titrations. The goal with this example was to capture the essential issue with the dates.

Definitions:
"dispensing date" - date the pharmacy provides the medication to the patient
"fill" - a distinct dispensing date+medication combination
"refill date" - when the medication is projected to be filled again
"days supply" - the number of days for which the prescription provides medication (usually 30, 60, or 90)

The rules are:
1. If a patient's refill overlaps the previous fill by more than 20% of the previous fill's days supply, replace the current observation's dispensing date with the previous fill's dispensing date, and adjust the days supplied for the current observation to (days supplied(t-1) + days supplied(t)) less the number of days of overlap. I.e., truncate the previous fill's days supplied & assume use of the refill starts on the day it is dispensed.
2. If a patient's refill overlaps the previous fill by <= 20% of the previous fill's days supply, replace the current observation's dispensing date with the previous fill's dispensing date, and adjust the days supplied for the current observation to (days supplied(t-1) + days supplied(t)). I.e., shift the dispensing date of the refill to the end of the previous fill.

I think I've got a good start on this with -forvalues- and -while-. I've put a sample of the data below. As a note, this data has been de-identified before posting.
The dates have been jittered from the real dates, but I've replicated all of the major features. The variable "ptdrugid" was created from -egen ptdrugid = group(ptid shortnm)-.

** begin code **
clear
input ptdrugid _dispdt daysuppl
14 18000 30
14 18031 30
14 18128 30
15 16877 30
15 16903 30
15 16952 30
15 16987 30
15 17010 30
15 17047 30
15 17073 30
15 17093 30
15 17132 30
15 17165 30
15 17194 30
15 17224 30
15 17249 30
15 17286 30
15 17327 30
15 17357 30
15 17385 30
15 17413 30
15 17445 30
15 17474 30
15 17500 30
15 17534 30
15 17568 30
15 17597 30
15 17620 30
15 17645 30
15 17669 30
15 17702 30
15 17728 30
15 17758 30
15 17796 30
15 17818 30
15 17861 30
15 17898 30
15 17934 30
15 17934 10
15 17952 30
15 17971 30
15 18002 30
15 18032 30
15 18075 30
15 18096 30
15 18107 90
15 18190 90
end

* projected refill date; keep untouched copies of the original dates
gen _refilldt = _dispdt+daysuppl-1
format _dispdt _refilldt %td
clonevar dispdt = _dispdt
clonevar refilldt = _refilldt

* largest number of fills in any patient-drug group
bys ptdrugid (_dispdt _refilldt): gen _seq = _n
sum _seq, meanonly
local nmax = `r(max)'

gen chng = 0
clonevar mdaysup = daysuppl

forvalues j = 2/`nmax' {
    * flag obs j in each group if it starts on or before the previous refill date
    by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
    * the overlap (a non-positive number) is added only when it exceeds 20%
    by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
        (dispdt-refilldt[_n-1])*(abs(dispdt-refilldt[_n-1]) > 0.2*mdaysup[_n-1]) if chng
    by ptdrugid: replace dispdt = dispdt[_n-1] if chng
    replace refilldt = dispdt + mdaysup - 1
    * the previous fill is now absorbed into the chained one
    by ptdrugid: drop if chng[_n+1]==1
    by ptdrugid: replace chng = (dispdt-refilldt[_n-1]) <= 0 & _n==`j'
    sum chng, meanonly
    if `r(sum)' > 0 {
        * keep chaining at position j until no group has an overlap there
        local x 1
        while `x' > 0 {
            by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
                (dispdt-refilldt[_n-1])*(abs(dispdt - refilldt[_n-1]) > 0.2*mdaysup[_n-1]) ///
                if chng
            by ptdrugid: replace dispdt = dispdt[_n-1] if chng
            replace refilldt = dispdt + mdaysup - 1
            by ptdrugid: drop if chng[_n+1]==1
            by ptdrugid: replace chng = (dispdt -refilldt[_n-1]) <= 0 & _n==`j'
            sum chng, meanonly
            local x = `r(sum)'
        }
    }
}
exit
** end code **

To my way of thinking, this is horribly
inefficient. Among the issues that are immediately apparent to me: (1) once `nmax' has been set, it isn't altered, despite the fact that the number of observations winds up being far smaller as fills are chained (too many attempts at the loop), and (2) I continue making loops over observations once they've been maximally condensed.

Does anyone have any suggestions for making this code better?

Thanks,
Rebecca
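
For illustration only, one possible direction on both points is a single pass over the sorted data: each observation is compared once with the already-updated observation before it, and absorbed fills are flagged and dropped after the loop instead of inside it. This is a rough sketch, not a tested replacement. It assumes dispdt, refilldt, and mdaysup have been created exactly as in the code above, and the names todrop, prev, gap, and adj are hypothetical. It has not been checked against the full set of MPR rules, and an explicit loop over observations may itself be slow on a very large file.

** begin code **
* assumes the input, clonevar, and gen steps from the posted code have been run
sort ptdrugid _dispdt _refilldt
gen byte todrop = 0
local N = _N
forvalues i = 2/`N' {
    local prev = `i' - 1
    * only compare fills within the same patient-drug group
    if ptdrugid[`i'] == ptdrugid[`prev'] {
        * obs `prev' already holds any chained values from earlier steps
        local gap = dispdt[`i'] - refilldt[`prev']
        if `gap' <= 0 {
            * subtract the overlap only when it exceeds 20% of the previous supply
            local adj = cond(abs(`gap') > 0.2*mdaysup[`prev'], `gap', 0)
            quietly replace mdaysup = mdaysup[`prev'] + mdaysup + (`adj') in `i'
            quietly replace dispdt = dispdt[`prev'] in `i'
            quietly replace refilldt = dispdt + mdaysup - 1 in `i'
            * the previous fill has been absorbed; remove it after the loop
            quietly replace todrop = 1 in `prev'
        }
    }
}
drop if todrop
drop todrop
** end code **

Because nothing is dropped until the end, the loop bound never goes stale, and each observation is visited exactly once. Whether the results match the -forvalues-/-while- version on the real data would need to be confirmed, for example by running both and comparing the resulting datasets with -cf-.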

