Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
"Sarah Edgington" <sedging@ucla.edu> |

To |
<statalist@hsphsun2.harvard.edu> |

Subject |
RE: st: Handling pharmacy data with multiple entries per subject |

Date |
Fri, 10 Jun 2011 16:59:24 -0700 |

Sarah, Do you have a variable that serves as an indicator for each hospitalization? If you do, you could create a variable that is the number of days of the drug for each record and then sum them across lines. Say you have (or create) hosp_id which distinguishes hospitalizations within ID. Then create abdays by subtracting the start date from the stop date (assuming you've already put your dates into date format). Then: bysort ID hosp_id drug_name: egen totab=total(abdays) This ignores your 30 day restriction and just totals the number of drug days for each hospitalization. To incorporate the 30 day restriction you'll have to do some comparisons between the outcome date and the start and stop dates for the drugs when creating the abdays variable. Something like this might work: gen start_window=outcome-30 gen abdays=stop_date - start_date if start_date>=start_window replace abdays=stop_date - start_window if start_date<start_window & stop_date>=start_window If the outcome ever occurs before the stop_date you'd want to take that into account too when creating the antibiotic day measure. But the basic idea is that as long as you can figure out exactly how many days you want to include for each record, you can then sum across records to get the total. The one potentially trickier bit I see here is if you want to count the start and stop dates inclusively. If you want to have 5/19/2001 to 5/20/2001 count as 2 days of taking the drug (rather than, say, treating drug stop dates as the first day the patient doesn't take a drug) you have to add one after subtracting start_date from stop_date. In this case you'd want to adjust for overlapping spells. That is, you don't want to count 5/20 in both line 2 and in line 3. If you sort by start_date and stop_date you can just compare within a hospitalization to see if start_date=stop_date[_n-1] and subtract 1 day if it does. I'm thinking something like this once you've created the abdays variable: sort ID hosp_id drug_name start_date stop_date bysort ID hosp_id drug_name replace abdays=abdays-1 if start_date=stop_date[_n-1] & abdays[_n-1] ~= . The second part of the if statement is just to make sure that any of the days in the previous line were counted. If they weren't because the previous stop_date was out of range, the current line's start date will be out of range too so you won't be counting the day of overlap anyway. I'm assuming here that you want to treat all your possible drugs separately. If not you can just do your calculations sorting on ID and hosp_id. That should get you the numbers you want without having to rearrange your data at all. -Sarah -----Original Message----- From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Doernberg, Sarah Sent: Friday, June 10, 2011 3:05 PM To: statalist@hsphsun2.harvard.edu Subject: RE: st: Handling pharmacy data with multiple entries per subject Hi all: Thank you for your responses. I'm attempting to be more specific about my question below. My data is currently formatted as follows (with an example entry): ID drug_name start_date stop_date outcome 1 ceftriaxone 5/15/2001 5/17/2001 5/31/2001 1 ceftriaxone 5/19/2001 5/20/2001 5/31/2001 1 ceftriaxone 5/20/2001 5/24/2001 5/31/2001 1 ceftriaxone 7/24/2001 7/27/2001 . This one person had 3 prescriptions for ceftriaxone during a hospitalization in May, 2001, including one day where the person was not given this drug (5/15-5/17 and 5/19-5/24). There was another hospitalization in July, 2001, where another prescription was given. The patient did not experience the outcome during the second hospitalization. The dataset only contains information about two different drugs. My ultimate goal is to figure out the number of days each person received each drug during a 30-day period from the first day of receipt or before the date of the outcome (if <30 days from the start of antibiotic) to allow for a logistic regression with exposure = antibiotic days. In addition, I may also do a survival analysis using start of antibiotic as the start date, development of the outcome as failure, and censoring at 30 days in those without the outcome. I hope that helps to clarify things. Thank you for your advice. Best, Sarah ________________________________________ From: owner-statalist@hsphsun2.harvard.edu [owner-statalist@hsphsun2.harvard.edu] On Behalf Of Phil Schumm [pschumm@uchicago.edu] Sent: Friday, June 10, 2011 2:00 PM To: statalist@hsphsun2.harvard.edu Subject: Re: st: Handling pharmacy data with multiple entries per subject On Jun 10, 2011, at 2:59 PM, Doernberg, Sarah wrote: > I have a dataset from our pharmacy with prescriptions for antibiotics > in hospitalized patients. Each time a patient was transferred (from > the emergency department to the ward or the ward to the ICU, for > instance), a new prescription (and thus, a new row) was generated. > This is compounded by the fact that some people received intermittent > dosing (each start date with it's own row). > > Because this is a very large set of data, I am trying to figure out > how to have Stata combine the rows. Ideally, I would like to have one > entry per person with consecutive courses of antibiotics represented > by start and stop days (for example, someone who received an > antibiotic from 6/1-6/3 and 6/7-6/9 would have > start_date_1 = 6/1, stop_date_1=6/3 and start_date_2=6/7 and > stop_date_2=6/9). > > I have tried doing this with the collapse command but the best I can > do is to get total days on antibiotic in a given month. Converting > from long to wide also is not ideal because consecutive courses are > not combined due to the multiple prescriptions based on location. You'll have to be a bit more specific here to get the help you're asking for. For example, why do you want start_date_1 stop_date_1 start_date_2 stop_date_2 ------------ ----------- ------------ ----------- 6/1 6/3 6/7 6/9 I'm guessing (but I could be wrong) that your next step after this will be to do some further calculations, which can probably be done more easily with the data in the original, long format. Also, if you want help with the code to translate between what you have now and the layout above, then you need to show the actual layout of the current dataset. Otherwise, people will just guess, and the whole exercise becomes quite inefficient. Don't be put off by this -- I do calculations like this all the time in Stata, and it is very easy to do once you know how. So chances are, this is definitely worth persisting with. -- Phil * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

**References**:**st: Handling pharmacy data with multiple entries per subject***From:*"Doernberg, Sarah" <Sarah.Doernberg@ucsf.edu>

**Re: st: Handling pharmacy data with multiple entries per subject***From:*Phil Schumm <pschumm@uchicago.edu>

**RE: st: Handling pharmacy data with multiple entries per subject***From:*"Doernberg, Sarah" <Sarah.Doernberg@ucsf.edu>

- Prev by Date:
**RE: st: Handling pharmacy data with multiple entries per subject** - Next by Date:
**st: Decompose the technical efficiency** - Previous by thread:
**RE: st: Handling pharmacy data with multiple entries per subject** - Next by thread:
**Re: st: Handling pharmacy data with multiple entries per subject** - Index(es):