[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: merge by nearest date

From   "Michael I. Lichter" <[email protected]>
To   [email protected]
Subject   Re: st: merge by nearest date
Date   Tue, 08 Sep 2009 15:36:49 -0400


As far as I know there are no facilities in Stata for doing fuzzy matching. One possibility is to rename the visit variable in file1 (master) to visit1 and in file2 (using) to visit2, -joinby idcode-, calculate diff = abs(visit1 - visit2) and drop all records where diff > 30 or some other threshold. Do -duplicates report idcode visit2- afterwards to see if you've got the same file2 record attached to more than one file1 record (assuming that there are no duplicate visit dates in file2) and inspect these to see what you can do to fix them (e.g., by changing the date threshold or by adding a visit3 = visit2 that you alter in order to encourage/discourage specific matches).


Joseph Wagner wrote:
I am using Stata ver 8.2 and wish to merge two files each with an idcode
and a date.  (I know that more recent versions of Stata can handle this
more capably but funds are short at this time so I make do with what I
have.)  None of the dates in using matches exactly with master.  I get
the following:

. tab _merge

     _merge |      Freq.     Percent        Cum.
          1 |        585        1.82        1.82
          2 |     31,554       98.18      100.00
      Total |     32,139      100.00

This is what I expected but I would like to see how close I can get a
pair (master and using) that are within one week of one another.  If I
calculate a simple difference between master and the next using above
will miss any closer using if it is below the master.  I hope this makes
sense.  Here is a snippet of the data:

idcode      visit    _merge
FEL00057    26 Sep 33    2
FEL00057    30 Sep 33    1
FEL00057    22 Nov 33    2
FEL00057    24 Jan 34    2
FEL00057    28 May 34    2
FEL00057    03 Aug 34    2
FEL00057    23 Sep 34    1
FEL00057    24 Sep 34    2
FEL00057    27 Nov 34    2
FEL00057    28 Jan 35    2
FEL00057    27 Mar 35    1
FEL00057    30 Mar 35    2

If my data is arranged in this fashion, how do I best get the closest
pair of using and master?

*   For searches and help try:

Michael I. Lichter, Ph.D. <[email protected]>
Research Assistant Professor & NRSA Fellow
UB Department of Family Medicine / Primary Care Research Institute
UB Clinical Center, 462 Grider Street, Buffalo, NY 14215
Office: CC 126 / Phone: 716-898-4751 / FAX: 716-898-3536

*   For searches and help try:

© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index