Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: st: Handling pharmacy data with multiple entries per subject

From   "Sarah Edgington" <>
To   <>
Subject   RE: st: Handling pharmacy data with multiple entries per subject
Date   Fri, 10 Jun 2011 16:59:24 -0700

Do you have a variable that serves as an indicator for each hospitalization?
If you do, you could create a variable that is the number of days of the
drug for each record and then sum them across lines. Say you have (or
create) hosp_id which distinguishes hospitalizations within ID.  Then create
abdays by subtracting the start date from the stop date (assuming you've
already put your dates into date format).
bysort ID hosp_id drug_name: egen totab=total(abdays)

This ignores your 30 day restriction and just totals the number of drug days
for each hospitalization.  To incorporate the 30 day restriction you'll have
to do some comparisons between the outcome date and the start and stop dates
for the drugs when creating the abdays variable.  Something like this might

gen start_window=outcome-30
gen abdays=stop_date - start_date if start_date>=start_window
replace abdays=stop_date - start_window if start_date<start_window &

If the outcome ever occurs before the stop_date you'd want to take that into
account too when creating the antibiotic day measure.  But the basic idea is
that as long as you can figure out exactly how many days you want to include
for each record, you can then sum across records to get the total.  

The one potentially trickier bit I see here is if you want to count the
start and stop dates inclusively.  If you want to have 5/19/2001 to
5/20/2001 count as 2 days of taking the drug (rather than, say, treating
drug stop dates as the first day the patient doesn't take a drug) you have
to add one after subtracting start_date from stop_date.  In this case you'd
want to adjust for overlapping spells.  That is, you don't want to count
5/20 in both line 2 and in line 3.  If you sort by start_date and stop_date
you can just compare within a hospitalization to see if
start_date=stop_date[_n-1] and subtract 1 day if it does.  I'm thinking
something like this once you've created the abdays variable:

sort  ID hosp_id drug_name start_date stop_date
bysort ID hosp_id drug_name replace abdays=abdays-1 if
start_date=stop_date[_n-1] & abdays[_n-1] ~= .

The second part of the if statement is just to make sure that any of the
days in the previous line were counted.  If they weren't because the
previous stop_date was out of range, the current line's start date will be
out of range too so you won't be counting the day of overlap anyway.

I'm assuming here that you want to treat all your possible drugs separately.
If not you can just do your calculations sorting on ID and hosp_id.

That should get you the numbers you want without having to rearrange your
data at all.


-----Original Message-----
[] On Behalf Of Doernberg, Sarah
Sent: Friday, June 10, 2011 3:05 PM
Subject: RE: st: Handling pharmacy data with multiple entries per subject

Hi all:

Thank you for your responses. I'm attempting to be more specific about my
question below. My data is currently formatted as follows (with an example

ID     drug_name     start_date      stop_date        outcome
1       ceftriaxone      5/15/2001     5/17/2001       5/31/2001
1       ceftriaxone      5/19/2001     5/20/2001       5/31/2001
1       ceftriaxone      5/20/2001    5/24/2001        5/31/2001
1       ceftriaxone      7/24/2001     7/27/2001       .
This one person had 3 prescriptions for ceftriaxone during a hospitalization
in May, 2001, including one day where the person was not given this drug
(5/15-5/17 and 5/19-5/24). There was another hospitalization in July, 2001,
where another prescription was given. The patient did not experience the
outcome during the second hospitalization. The dataset only contains
information about two different drugs.

My ultimate goal is to figure out the number of days each person received
each drug during a 30-day period from the first day of receipt or before the
date of the outcome (if <30 days from the start of antibiotic) to allow for
a logistic regression with exposure = antibiotic days. In addition, I may
also do a survival analysis using start of antibiotic as the start date,
development of the outcome as failure, and censoring at 30 days in those
without the outcome. 

I hope that helps to clarify things. Thank you for your advice.


[] On Behalf Of Phil Schumm
Sent: Friday, June 10, 2011 2:00 PM
Subject: Re: st: Handling pharmacy data with multiple entries per subject

On Jun 10, 2011, at 2:59 PM, Doernberg, Sarah wrote:
> I have a dataset from our pharmacy with prescriptions for antibiotics 
> in hospitalized patients. Each time a patient was transferred (from 
> the emergency department to the ward or the ward to the ICU, for 
> instance), a new prescription (and thus, a new row) was generated. 
> This is compounded by the fact that some people received intermittent 
> dosing (each start date with it's own row).
> Because this is a very large set of data, I am trying to figure out 
> how to have Stata combine the rows. Ideally, I would like to have one 
> entry per person with consecutive courses of antibiotics represented 
> by start and stop days (for example, someone who received an 
> antibiotic from 6/1-6/3 and 6/7-6/9 would have
> start_date_1 = 6/1, stop_date_1=6/3 and start_date_2=6/7 and 
> stop_date_2=6/9).
> I have tried doing this with the collapse command but the best I can 
> do is to get total days on antibiotic in a given month. Converting 
> from long to wide also is not ideal because consecutive courses are 
> not combined due to the multiple prescriptions based on location.

You'll have to be a bit more specific here to get the help you're asking
for.  For example, why do you want

     start_date_1   stop_date_1   start_date_2   stop_date_2
     ------------   -----------   ------------   -----------
         6/1            6/3           6/7            6/9

I'm guessing (but I could be wrong) that your next step after this will be
to do some further calculations, which can probably be done more easily with
the data in the original, long format.  Also, if you want help with the code
to translate between what you have now and the layout above, then you need
to show the actual layout of the current dataset.  Otherwise, people will
just guess, and the whole exercise becomes quite inefficient.

Don't be put off by this -- I do calculations like this all the time in
Stata, and it is very easy to do once you know how.  So chances are, this is
definitely worth persisting with.

-- Phil

*   For searches and help try:
*   For searches and help try:

*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index