[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: merge by nearest date

From   Jeph Herrin <[email protected]>
To   [email protected]
Subject   Re: st: merge by nearest date
Date   Tue, 08 Sep 2009 16:19:22 -0400

I did something like this recently by calculating for each
date the week it fell in, and matching on the week. This
isn't exactly what you want, because obviously two dates
can be in different weeks yet fewer than seven days apart,
but worth a thought. I also don't recall if Stata 8.2 has the
-wofd()- function, but if not you can readily calculate a
faux week with

    gen week = int(date/7)

If you truly want to match dates that are <=7 days apart,
you can expand each record by 7, generate a new date equal
to the original plus or minus a few days, match the new dates,
keep the matches, then keep one match per set. Create a unique
identifier first:

    * master *
    gen masterid = _n
    expand 7
    bys masterid : gen matchdate=date1+_n-4
    sort matchdate
    save master

    * using *
    gen usingid = _n
    bys usingid : gen matchdate=date2+_n-4
    sort matchdate
    save using

    u master, clear
    merge matchdate using using
    gen diff=abs(diff2-diff1)
    bys masterid (diff) : keep if _n==1  * keep closest ones *

This probably needs fiddling for Stata 8.2 (or even Stata 11) but
it should produce for each original master record a single
match within 1 week, or _merge!=3 otherwise.


Joseph Wagner wrote:
I am using Stata ver 8.2 and wish to merge two files each with an idcode
and a date.  (I know that more recent versions of Stata can handle this
more capably but funds are short at this time so I make do with what I
have.)  None of the dates in using matches exactly with master.  I get
the following:

. tab _merge

     _merge |      Freq.     Percent        Cum.
          1 |        585        1.82        1.82
          2 |     31,554       98.18      100.00
      Total |     32,139      100.00

This is what I expected but I would like to see how close I can get a
pair (master and using) that are within one week of one another.  If I
calculate a simple difference between master and the next using above
will miss any closer using if it is below the master.  I hope this makes
sense.  Here is a snippet of the data:

idcode      visit    _merge
FEL00057    26 Sep 33    2
FEL00057    30 Sep 33    1
FEL00057    22 Nov 33    2
FEL00057    24 Jan 34    2
FEL00057    28 May 34    2
FEL00057    03 Aug 34    2
FEL00057    23 Sep 34    1
FEL00057    24 Sep 34    2
FEL00057    27 Nov 34    2
FEL00057    28 Jan 35    2
FEL00057    27 Mar 35    1
FEL00057    30 Mar 35    2

If my data is arranged in this fashion, how do I best get the closest
pair of using and master?

*   For searches and help try:

*   For searches and help try:

© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index