

From |
Nick Cox <njcoxstata@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: sequential subscript processing |

Date |
Thu, 28 Mar 2013 01:02:18 +0000 |

I've looked through this code. My only strategic suggestion is that it
might be simplified if you had a single loop over observations and
(naturally) had an inbuilt check that each observation referred to the
same id as the previous. But then a single loop over observations can
be notoriously slow and you are trying to avoid that.

That said, your problem seemed similar in some ways to those discussed in

SJ-7-3  pr0033  . . . . . . . . . . . . . .  Stata tip 51: Events in intervals
        . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
        Q3/07   SJ 7(3):440--443                                 (no commands)
        tip for counting or summarizing irregularly spaced events
        in intervals

On Wed, Mar 27, 2013 at 2:04 PM, Rebecca Pope <rebecca.a.pope@gmail.com> wrote:
> This is a question about efficiency. The code I've written produces
> the output I need; it just seems to me that it could be improved.
>
> Rather than comparing obs[3] to obs[2], I need to compare obs[3] to
> obs[2] _after_ obs[2] has been conditionally changed by the values in
> obs[1]. For context, the goal is to "chain" prescription refills
> together to calculate 180-day medication possession ratios. Everyone
> in the data has at least one refill. For any of you who work with
> MPRs, don't panic: this isn't the extent of the calculation or the
> rules. I'm using "refill" loosely; it includes titrations. The goal
> with this example was to capture the essential issue with the dates.
>
> Definitions:
> "dispensing date" - date the pharmacy provides the medication to the patient
> "fill" - a distinct dispensing date+medication combination
> "refill date" - when the medication is projected to be filled again
> "days supply" - the number of days for which the prescription provides
> medication (usually 30, 60, or 90)
>
> The rules are:
> 1. If a patient's refill overlaps the previous fill by more than 20%
> of the previous fill's days supply, replace the current observation's
> dispensing date with the previous fill's dispensing date, adjust the
> days supplied for the current observation to (days supplied(t-1) +
> days supplied(t)) less the number of days of overlap. I.e. truncate
> the previous fill's days supplied & assume use of the refill starts on
> the day it is dispensed.
>
> 2. If a patient's refill overlaps the previous fill by <= 20% of the
> previous fill's days supply, replace the current observation's
> dispensing date with the previous fill's dispensing date, adjust the
> days supplied for the current observation to (days supplied(t-1) +
> days supplied(t)). I.e. shift dispensing date of refill to the end of
> the previous fill.
>
> I think I've got a good start on this with -forvalues- and -while-.
> I've put a sample of the data below. As a note, this data has been
> de-identified before posting. The dates have been jittered from the
> real dates, but I've replicated all of the major features. The
> variable "ptdrugid" was created from -egen ptdrugid = group(ptid
> shortnm)-.
>
> ** begin code **
> clear
> input ptdrugid _dispdt daysuppl
> 14 18000 30
> 14 18031 30
> 14 18128 30
> 15 16877 30
> 15 16903 30
> 15 16952 30
> 15 16987 30
> 15 17010 30
> 15 17047 30
> 15 17073 30
> 15 17093 30
> 15 17132 30
> 15 17165 30
> 15 17194 30
> 15 17224 30
> 15 17249 30
> 15 17286 30
> 15 17327 30
> 15 17357 30
> 15 17385 30
> 15 17413 30
> 15 17445 30
> 15 17474 30
> 15 17500 30
> 15 17534 30
> 15 17568 30
> 15 17597 30
> 15 17620 30
> 15 17645 30
> 15 17669 30
> 15 17702 30
> 15 17728 30
> 15 17758 30
> 15 17796 30
> 15 17818 30
> 15 17861 30
> 15 17898 30
> 15 17934 30
> 15 17934 10
> 15 17952 30
> 15 17971 30
> 15 18002 30
> 15 18032 30
> 15 18075 30
> 15 18096 30
> 15 18107 90
> 15 18190 90
> end
> gen _refilldt = _dispdt+daysuppl-1
> format _dispdt _refilldt %td
> clonevar dispdt = _dispdt
> clonevar refilldt = _refilldt
> bys ptdrugid (_dispdt _refilldt): gen _seq = _n
> sum _seq, meanonly
> local nmax = `r(max)'
> gen chng = 0
> clonevar mdaysup = daysuppl
> forvalues j = 2/`nmax' {
>   by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
>   by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>     (dispdt-refilldt[_n-1])*(abs(dispdt-refilldt[_n-1]) > 0.2*mdaysup[_n-1]) if chng
>   by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>   replace refilldt = dispdt + mdaysup - 1
>   by ptdrugid: drop if chng[_n+1]==1
>   by ptdrugid: replace chng = (dispdt-refilldt[_n-1]) <= 0 & _n==`j'
>   sum chng, meanonly
>   if `r(sum)' > 0 {
>     local x 1
>     while `x' > 0 {
>       by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>         (dispdt-refilldt[_n-1])*(abs(dispdt - refilldt[_n-1]) > 0.2*mdaysup[_n-1]) ///
>         if chng
>       by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>       replace refilldt = dispdt + mdaysup - 1
>       by ptdrugid: drop if chng[_n+1]==1
>       by ptdrugid: replace chng = (dispdt -refilldt[_n-1]) <= 0 & _n==`j'
>       sum chng, meanonly
>       local x = `r(sum)'
>     }
>   }
> }
> exit
> ** end code **
>
> To my way of thinking, this is horribly inefficient. Among the issues
> that are immediately apparent to me: (1) once `nmax' has been set, it
> isn't altered despite the fact that the number of observations winds
> up being far smaller as fills are chained (too many attempts at the
> loop) and (2) I continue making loops over observations once they've
> been maximally condensed.
>
> Does anyone have any suggestions for making this code better?
>
> Thanks,
> Rebecca
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
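
For concreteness, the single loop over observations suggested at the top of this reply might look something like the sketch below. It is only an illustration, not code from either poster, and it has not been run against the full data. It assumes the variables ptdrugid, _dispdt and daysuppl exist as in the posted example; the names epstart, epdays and absorbed are hypothetical and introduced here only for the sketch. It is intended to mirror the two rules as the posted code applies them: chain only when the gap to the previous refill date is <= 0, and subtract the overlap only when it exceeds 20% of the previous (already chained) days supply.

```stata
* Sketch only: one pass over observations, carrying the chained episode
* forward in the hypothetical variables epstart and epdays.
sort ptdrugid _dispdt daysuppl
gen long epstart  = _dispdt      // start date of the chained episode
gen long epdays   = daysuppl     // days supply of the chained episode
gen byte absorbed = 0            // 1 if the fill was chained into a later row

forvalues i = 2/`=_N' {
    local prev = `i' - 1
    * inbuilt check: only chain within the same id
    if ptdrugid[`i'] == ptdrugid[`prev'] {
        * gap between this dispensing date and the previous refill date
        local gap = _dispdt[`i'] - (epstart[`prev'] + epdays[`prev'] - 1)
        if `gap' <= 0 {
            local days = epdays[`prev'] + daysuppl[`i']
            * rule 1: overlap of more than 20% is subtracted
            if abs(`gap') > 0.2 * epdays[`prev'] {
                local days = `days' + `gap'
            }
            quietly replace epstart  = epstart[`prev'] in `i'
            quietly replace epdays   = `days' in `i'
            quietly replace absorbed = 1 in `prev'
        }
    }
}
drop if absorbed
gen long eprefill = epstart + epdays - 1
format epstart eprefill %td
```

Working the first two fills for ptdrugid 15 by hand: the first fill (16877, 30 days) has refill date 16906, the second is dispensed on 16903, so the gap is -3. Since 3 is not more than 0.2*30 = 6, rule 2 applies and the chained episode starts on 16877 with 60 days supply. As noted above, a loop of this kind can be notoriously slow on large datasets, so it trades speed for simplicity.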

