Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: Merging data sets based on a range of dates


From   Phil Schumm <[email protected]>
To   <[email protected]>
Subject   Re: st: Merging data sets based on a range of dates
Date   Tue, 6 Aug 2013 15:17:24 -0500

On Aug 6, 2013, at 1:14 PM, Joe Canner <[email protected]> wrote:
> From time to time I have a situation like the following:
> 
> Surgery.dta: PatientID SurgeryDate
> 
> Hospital.dta: PatientID AdmissionDate DischargeDate
> 
> There are usually multiple records per PatientID in both data sets and I would like to match any records in Surgery.dta with corresponding records in Hospital.dta where SurgeryDate is between AdmissionDate and DischargeDate.
> 
> Typically, my solutions are inelegant and inefficient, usually involving -reshape- (which is very slow when the number of records per PatientID is large).  Does anyone know of a good method for doing this efficiently in Stata, or at least a user-written program that can easily be invoked to do this?


Sarah's suggestion is definitely the standard way of approaching this type of problem, and typically leads to the most readable code.  Thus, it is to be preferred in most cases.  However, if you're in a situation where the number of observations created by -joinby- is unwieldy (you indicated that the number of records per patient might be large) or if you want to preserve records corresponding to visits without a surgery, you could use something like

    use Hospital
    append using Surgery
    gen date = cond(!mi(AdmissionDate),AdmissionDate,SurgeryDate)
    bys PatientID (date AdmissionDate): replace SurgeryDate = SurgeryDate[_n+1] ///
        if inrange(SurgeryDate[_n+1],AdmissionDate,DischargeDate)
    drop if mi(AdmissionDate)

where I am assuming the following preconditions on the Hospital dataset

    ass !mi(AdmissionDate,DischargeDate) & AdmissionDate<=DischargeDate
    bys PatientID (AdmissionDate): ass AdmissionDate!=AdmissionDate[_n-1]
    bys PatientID (AdmissionDate): ass AdmissionDate>=DischargeDate[_n-1] ///
        if _n>1

To check for missing visit records or multiple surgeries during the same visit, you could use

    bys PatientID (date AdmissionDate): ass !mi(AdmissionDate[_n-1],SurgeryDate[_n-1]) ///
        if mi(AdmissionDate)

before dropping the surgery records.


-- Phil


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index