Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org**, is already up and running.


From: Rebecca Pope <rebecca.a.pope@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: sequential subscript processing
Date: Fri, 29 Mar 2013 09:54:22 -0500

Nick,

Thanks for the additional information about single loops. I read the tip and rewrote the code using that approach. You are right about it running slower. I will note one advantage though: just looking at the two, I think the logic of the single-loop code is easier to follow.

Regards,
Rebecca

On Wed, Mar 27, 2013 at 8:02 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> I've looked through this code. My only strategic suggestion is that it
> might be simplified if you had a single loop over observations and
> (naturally) had an inbuilt check that each observation referred to the
> same id as the previous. But then a single loop over observations can
> be notoriously slow and you are trying to avoid that.
>
> That is, your problem seemed similar in some ways to those discussed in
>
> SJ-7-3  pr0033  . . . . . . . . . .  Stata tip 51: Events in intervals
>         . . . . . . . . . . . . . . . . . . . . . . . . . .  N. J. Cox
>         Q3/07   SJ 7(3):440--443                         (no commands)
>         tip for counting or summarizing irregularly spaced
>         events in intervals
>
> On Wed, Mar 27, 2013 at 2:04 PM, Rebecca Pope <rebecca.a.pope@gmail.com> wrote:
>> This is a question about efficiency. The code I've written produces
>> the output I need; it just seems to me that it could be improved.
>>
>> Rather than comparing obs[3] to obs[2], I need to compare obs[3] to
>> obs[2] _after_ obs[2] has been conditionally changed by the values in
>> obs[1]. For context, the goal is to "chain" prescription refills
>> together to calculate 180-day medication possession ratios. Everyone
>> in the data has at least one refill. For any of you who work with
>> MPRs, don't panic: this isn't the extent of the calculation or the
>> rules. I'm using "refill" loosely; it includes titrations. The goal
>> with this example was to capture the essential issue with the dates.
>>
>> Definitions:
>> "dispensing date" - date the pharmacy provides the medication to the patient
>> "fill" - a distinct dispensing date + medication combination
>> "refill date" - when the medication is projected to be filled again
>> "days supply" - the number of days for which the prescription provides
>> medication (usually 30, 60, or 90)
>>
>> The rules are:
>> 1. If a patient's refill overlaps the previous fill by more than 20%
>> of the previous fill's days supply, replace the current observation's
>> dispensing date with the previous fill's dispensing date, and adjust the
>> days supplied for the current observation to (days supplied(t-1) +
>> days supplied(t)) less the number of days of overlap. I.e. truncate
>> the previous fill's days supplied & assume use of the refill starts on
>> the day it is dispensed.
>>
>> 2. If a patient's refill overlaps the previous fill by <= 20% of the
>> previous fill's days supply, replace the current observation's
>> dispensing date with the previous fill's dispensing date, and adjust the
>> days supplied for the current observation to (days supplied(t-1) +
>> days supplied(t)). I.e. shift the dispensing date of the refill to the end
>> of the previous fill.
>>
>> I think I've got a good start on this with -forvalues- and -while-.
>> I've put a sample of the data below. As a note, this data has been
>> de-identified before posting. The dates have been jittered from the
>> real dates, but I've replicated all of the major features. The
>> variable "ptdrugid" was created from -egen ptdrugid = group(ptid
>> shortnm)-.
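The two rules can be checked on a small worked case: a patient has two 30-day fills, and the second is dispensed 10 days before the first runs out. The overlap (10 days) exceeds 20% of the previous days supply (6 days), so rule 1 applies and the chained record carries 30 + 30 - 10 = 50 days supplied. A minimal sketch of that arithmetic (the variable names here are illustrative only, not taken from the posted code):

```stata
* Worked illustration of rules 1 and 2 (illustrative names, two fills only)
clear
input dispdt daysuppl
18000 30
18020 30
end
gen refilldt = dispdt + daysuppl - 1
* days of overlap with the previous fill (missing for the first fill)
gen overlap = refilldt[_n-1] - dispdt + 1 if _n > 1
* rule 1 (overlap > 20% of previous days supply): truncate the previous fill
* rule 2 (overlap <= 20%): shift the refill to the end of the previous fill
gen mdaysup = cond(overlap > 0.2*daysuppl[_n-1],          ///
    daysuppl[_n-1] + daysuppl - overlap,                  ///
    daysuppl[_n-1] + daysuppl) if overlap > 0
list, noobs
* here overlap = 10 > 6 = 0.2*30, so rule 1 gives mdaysup = 50
```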
>>
>> ** begin code **
>> clear
>> input ptdrugid _dispdt daysuppl
>> 14 18000 30
>> 14 18031 30
>> 14 18128 30
>> 15 16877 30
>> 15 16903 30
>> 15 16952 30
>> 15 16987 30
>> 15 17010 30
>> 15 17047 30
>> 15 17073 30
>> 15 17093 30
>> 15 17132 30
>> 15 17165 30
>> 15 17194 30
>> 15 17224 30
>> 15 17249 30
>> 15 17286 30
>> 15 17327 30
>> 15 17357 30
>> 15 17385 30
>> 15 17413 30
>> 15 17445 30
>> 15 17474 30
>> 15 17500 30
>> 15 17534 30
>> 15 17568 30
>> 15 17597 30
>> 15 17620 30
>> 15 17645 30
>> 15 17669 30
>> 15 17702 30
>> 15 17728 30
>> 15 17758 30
>> 15 17796 30
>> 15 17818 30
>> 15 17861 30
>> 15 17898 30
>> 15 17934 30
>> 15 17934 10
>> 15 17952 30
>> 15 17971 30
>> 15 18002 30
>> 15 18032 30
>> 15 18075 30
>> 15 18096 30
>> 15 18107 90
>> 15 18190 90
>> end
>> gen _refilldt = _dispdt + daysuppl - 1
>> format _dispdt _refilldt %td
>> clonevar dispdt = _dispdt
>> clonevar refilldt = _refilldt
>> bys ptdrugid (_dispdt _refilldt): gen _seq = _n
>> sum _seq, meanonly
>> local nmax = `r(max)'
>> gen chng = 0
>> clonevar mdaysup = daysuppl
>> forvalues j = 2/`nmax' {
>>     by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
>>     by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>>         (dispdt - refilldt[_n-1])*(abs(dispdt - refilldt[_n-1]) > ///
>>         0.2*mdaysup[_n-1]) if chng
>>     by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>>     replace refilldt = dispdt + mdaysup - 1
>>     by ptdrugid: drop if chng[_n+1]==1
>>     by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
>>     sum chng, meanonly
>>     if `r(sum)' > 0 {
>>         local x 1
>>         while `x' > 0 {
>>             by ptdrugid: replace mdaysup = mdaysup[_n-1] + mdaysup + ///
>>                 (dispdt - refilldt[_n-1])*(abs(dispdt - refilldt[_n-1]) > ///
>>                 0.2*mdaysup[_n-1]) if chng
>>             by ptdrugid: replace dispdt = dispdt[_n-1] if chng
>>             replace refilldt = dispdt + mdaysup - 1
>>             by ptdrugid: drop if chng[_n+1]==1
>>             by ptdrugid: replace chng = (dispdt - refilldt[_n-1]) <= 0 & _n==`j'
>>             sum chng, meanonly
>>             local x = `r(sum)'
>>         }
>>     }
>> }
>> exit
>> ** end code **
>>
>> To my way of thinking, this is horribly inefficient. Among the issues
>> that are immediately apparent to me: (1) once `nmax' has been set, it
>> isn't altered despite the fact that the number of observations winds
>> up being far smaller as fills are chained (too many attempts at the
>> loop), and (2) I continue making loops over observations once they've
>> been maximally condensed.
>>
>> Does anyone have any suggestions for making this code better?
>>
>> Thanks,
>> Rebecca
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
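Rebecca says above that she rewrote the code with a single loop over observations, as Nick suggested, but that rewrite is not shown in the thread. A hedged sketch of what such a single-observation-loop version could look like, with the inbuilt same-id check Nick describes (variable names follow the posted code, but this is not her actual rewrite):

```stata
* Sketch: one pass over observations, chaining a fill into its predecessor
* only when the row shares the previous row's ptdrugid and overlaps it.
* Assumes ptdrugid, dispdt, refilldt, mdaysup exist as in the posted code.
sort ptdrugid dispdt
forvalues i = 2/`=_N' {
    if ptdrugid[`i'] == ptdrugid[`i'-1] & dispdt[`i'] <= refilldt[`i'-1] {
        local over = refilldt[`i'-1] - dispdt[`i'] + 1
        if `over' > 0.2*mdaysup[`i'-1] {
            * rule 1: truncate the previous fill's days supplied
            quietly replace mdaysup = mdaysup[`i'-1] + mdaysup - `over' in `i'
        }
        else {
            * rule 2: shift the refill to the end of the previous fill
            quietly replace mdaysup = mdaysup[`i'-1] + mdaysup in `i'
        }
        quietly replace dispdt = dispdt[`i'-1] in `i'
        quietly replace refilldt = dispdt + mdaysup - 1 in `i'
    }
}
* the superseded predecessor rows would still need to be dropped afterwards,
* as the posted code does with -drop if chng[_n+1]==1-
```

The trade-off is the one Nick names: observation-by-observation loops are notoriously slow in Stata, but each chaining decision is made exactly once, in order, so no outer -forvalues-/-while- bookkeeping is needed.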

**References**:
- **st: sequential subscript processing** *From:* Rebecca Pope <rebecca.a.pope@gmail.com>
- **Re: st: sequential subscript processing** *From:* Nick Cox <njcoxstata@gmail.com>
