Re: st: Handling pharmacy data with multiple entries per subject

From	Phil Schumm <[email protected]>
To	[email protected]
Subject	Re: st: Handling pharmacy data with multiple entries per subject
Date	Mon, 13 Jun 2011 11:15:25 -0500

On Jun 10, 2011, at 5:05 PM, Doernberg, Sarah wrote:

Thank you for your responses. I'm attempting to be more specific about my question below. My data is currently formatted as follows (with an example entry):

ID   drug_name     start_date   stop_date   outcome
1    ceftriaxone   5/15/2001    5/17/2001   5/31/2001
1    ceftriaxone   5/19/2001    5/20/2001   5/31/2001
1    ceftriaxone   5/20/2001    5/24/2001   5/31/2001
1    ceftriaxone   7/24/2001    7/27/2001   .

This one person had 3 prescriptions for ceftriaxone during a hospitalization in May, 2001, including one day where the person was not given this drug (5/15-5/17 and 5/19-5/24). There was another hospitalization in July, 2001, where another prescription was given. The patient did not experience the outcome during the second hospitalization. The dataset only contains information about two different drugs.

My ultimate goal is to figure out the number of days each person received each drug during a 30-day period from the first day of receipt or before the date of the outcome (if <30 days from the start of antibiotic) to allow for a logistic regression with exposure = antibiotic days. In addition, I may also do a survival analysis using start of antibiotic as the start date, development of the outcome as failure, and censoring at 30 days in those without the outcome.

The first thing to do with a dataset like this is to run some checks to get a sense of what you're dealing with. For example (assuming that start_date, stop_date and outcome are all Stata-format dates):



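    * Every period should start on or before the day it stops.
    assert start_date <= stop_date
    * Each id/drug/start date should identify exactly one record.
    isid id drug_name start_date
    * Within id and drug, no period should start before the previous one stops.
    bys id drug_name (start_date): assert stop_date[_n-1] <= start_date if _n > 1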

Taken together, these will verify that your start/stop dates are properly ordered, and that the periods of administration for a given drug do not overlap. There may be other things that you'll want to check too. Whether you "fix" any problems that show up (or ask for an updated data file) first or attempt to code around them will depend on the situation.
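When one of these assertions fails, it can help to look at the offending records before deciding what to do. The following is only a sketch on made-up toy data (the second record is deliberately made to overlap the first); the flag variables bad_order and overlaps_prev and the string variables s_start and s_stop are hypothetical names, not part of the original dataset.

```stata
* Toy data (illustrative only); the second record overlaps the first.
clear
input id str12 drug_name str10 s_start str10 s_stop
1 "ceftriaxone" "5/15/2001" "5/19/2001"
1 "ceftriaxone" "5/17/2001" "5/20/2001"
1 "ceftriaxone" "5/22/2001" "5/24/2001"
end
gen start_date = date(s_start, "MDY")
gen stop_date  = date(s_stop,  "MDY")
format start_date stop_date %td

* Flag, rather than assert, records that would fail the checks.
gen byte bad_order = start_date > stop_date
bysort id drug_name (start_date): gen byte overlaps_prev = _n > 1 & stop_date[_n-1] > start_date

* Show only the problem records.
list id drug_name start_date stop_date if bad_order | overlaps_prev
```

Note that, like the assertion it mirrors, overlaps_prev does not flag a period that starts on the same day the previous one stops.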

Often with problems like this, generating summaries is easier (or at least more readable) if you transform the data so that you have one record for each day. You may not be able to do this if you have a really big dataset, but don't be afraid to use your RAM if you've got it. For example, the following code will expand out your periods so that you have one record for each day of drug administration (separately by drug):



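    * One copy of each record per day in the period (both end dates included).
    expand stop_date - start_date + 1
    * The first copy gets the start date; each later copy is one day after the previous.
    bys id drug_name start_date: gen date = start_date if _n==1
    bys id drug_name start_date (date): replace date = date[_n-1] + 1 if _n > 1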
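As a small self-contained illustration of the expansion, here is a sketch on two of the example records. It uses a slightly different (but, as far as I can tell, equivalent) way of filling in the dates in one step, by adding the within-record copy number to the start date; s_start and s_stop are hypothetical helper variables used only to read the dates in.

```stata
* Two of the example records, entered as strings and converted to Stata dates.
clear
input id str12 drug_name str10 s_start str10 s_stop
1 "ceftriaxone" "5/15/2001" "5/17/2001"
1 "ceftriaxone" "5/19/2001" "5/20/2001"
end
gen start_date = date(s_start, "MDY")
gen stop_date  = date(s_stop,  "MDY")

* One copy of each record per day of administration (both end dates included).
expand stop_date - start_date + 1

* Within each original record, the k-th copy gets start_date + (k - 1).
bysort id drug_name start_date: gen date = start_date + _n - 1
format start_date stop_date date %td

list id drug_name date, sepby(start_date)
```

The first record covers three days and the second two, so the expanded data should contain five records.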

Now, you probably want a record corresponding to the date of the outcome (if you don't already have one), which you can generate (for a single drug, using the data in your example) with:



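    * Flag days that coincide with the outcome date.
    gen fail = (outcome == date)
    * n is 2 for one record per id/outcome with no matching day, else 1.
    bys id outcome (fail): gen byte n = (_n==1 & !fail[_N] & !mi(outcome)) + 1
    * Duplicate that record and turn the copy into the outcome record.
    expand n, gen(added)
    replace date = outcome if added
    replace drug_name = "" if added
    replace fail = 1 if added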

Finally, you may need to clean up any duplicates that were generated above:



    keep id drug_name date fail
    duplicates drop
    * Verify that there is now exactly one record per subject per day.
    isid id date

From the resulting dataset, you can easily get what you need for a survival analysis (using cumulative drug exposure as a time-varying covariate), or generate the 30-day periods you describe. You'll probably need information on the dates of hospital admission and discharge to do this.
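As one possible sketch of the 30-day count, suppose the data have already been reduced to one record per subject per day as above. The version below makes several assumptions that may not hold for the real data: there is a single window per subject, starting at the first recorded drug day (so separate hospitalizations are ignored); the window is 30 days including that first day; and only drug days strictly before the outcome date are counted. The toy data and the names s_date, first_rx, fail_date, counted, abx_days and event are all hypothetical.

```stata
* Toy one-record-per-day data (illustrative only).
clear
input id str12 drug_name str10 s_date fail
1 "ceftriaxone" "5/15/2001" 0
1 "ceftriaxone" "5/16/2001" 0
1 "ceftriaxone" "5/17/2001" 0
1 "ceftriaxone" "5/19/2001" 0
1 "ceftriaxone" "5/31/2001" 1
1 "ceftriaxone" "7/24/2001" 0
2 "ceftriaxone" "6/01/2001" 0
2 "ceftriaxone" "6/02/2001" 0
end
* The added outcome record carries no drug, as in the code above.
replace drug_name = "" if fail == 1
gen date = date(s_date, "MDY")
format date %td

* Assumed window start: first day any drug was recorded for the subject.
egen first_rx = min(cond(drug_name != "", date, .)), by(id)
* Outcome date, if any (missing for subjects without the outcome).
egen fail_date = min(cond(fail == 1, date, .)), by(id)

* Count drug days inside the 30-day window and strictly before the outcome.
* A missing fail_date sorts above every date, so it imposes no cutoff.
gen byte counted = drug_name != "" & date <= first_rx + 29 & date < fail_date
egen abx_days = total(counted), by(id)

* Reduce to one row per subject, with an indicator for outcome within the window.
bysort id: keep if _n == 1
gen byte event = fail_date <= first_rx + 29
keep id abx_days event
list
```

On data shaped like this, a model along the lines of logit event abx_days could then be fit, though the window definition should be revisited once admission and discharge dates are available.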

Note that for simplicity, I have essentially ignored the fact that you have multiple drugs here (except for the obvious stratification by drug_name in the first two code blocks). Ultimately, you'll probably want to have one record per date on which *any* drug was administered (plus additional records, if necessary, for dates of hospital admission and discharge, and date of outcome), with a separate 0/1 column for each drug indicating whether that drug was administered on that day. One way to achieve this would be to do what I have shown above separately for each of your two drugs, and then merge the results together (by id and date). The outcome data (as you have shown them) are in an odd format, and how you handle them ultimately will depend on what they look like for the other drug (i.e., are the same outcome dates repeated?). It may well be easiest just to peel those off and deal with them separately (like the two drugs), and then merge the outcome dates back on at the end.
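The merge step might look roughly like the sketch below, which starts from per-day records for two drugs and ends with one record per id and date and a 0/1 column per drug. The second drug is not named in the question, so "vancomycin" is purely a placeholder, and the toy data and the names s_date, ceft, vanc and vanc_days are hypothetical.

```stata
* Toy per-day records for two drugs (illustrative only).
clear
input id str12 drug_name str10 s_date
1 "ceftriaxone" "5/15/2001"
1 "ceftriaxone" "5/16/2001"
1 "vancomycin"  "5/16/2001"
1 "vancomycin"  "5/17/2001"
end
gen date = date(s_date, "MDY")
format date %td
drop s_date

* Set aside the days for the second drug in a temporary file.
preserve
keep if drug_name == "vancomycin"
keep id date
duplicates drop
gen byte vanc = 1
tempfile vanc_days
save `vanc_days'
restore

* Do the same for the first drug, keeping it in memory.
keep if drug_name == "ceftriaxone"
keep id date
duplicates drop
gen byte ceft = 1

* Combine by id and date; days with only one drug get 0 for the other.
merge 1:1 id date using `vanc_days', nogenerate
replace ceft = 0 if missing(ceft)
replace vanc = 0 if missing(vanc)

sort id date
isid id date
list
```

Outcome dates (and admission or discharge dates) could be merged on in the same way once they have been reduced to one record per id and date.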


Hopefully this'll give you some ideas/strategies that you can use.


-- Phil
