I did something like this recently by calculating for each
date the week it fell in, and matching on the week. This
isn't exactly what you want, because obviously two dates
can be in different weeks yet fewer than seven days apart,
but worth a thought. I also don't recall if Stata 8.2 has the
-wofd()- function, but if not you can readily calculate a
faux week with
gen week = int(date/7)
If you truly want to match dates that are <=7 days apart,
you can expand each record by 7, generate a new date equal
to the original plus or minus a few days, match the new dates,
keep the matches, then keep one match per set. Create a unique
identifier first:
* master *
gen masterid = _n
expand 7
bys masterid : gen matchdate=date1+_n-4
sort matchdate
save master
* using *
gen usingid = _n
bys usingid : gen matchdate=date2+_n-4
sort matchdate
save using
u master, clear
merge matchdate using using
gen diff=abs(diff2-diff1)
bys masterid (diff) : keep if _n==1 * keep closest ones *
This probably needs fiddling for Stata 8.2 (or even Stata 11) but
it should produce for each original master record a single
match within 1 week, or _merge!=3 otherwise.
hth,
Jeph
Joseph Wagner wrote:
I am using Stata ver 8.2 and wish to merge two files each with an idcode
and a date. (I know that more recent versions of Stata can handle this
more capably but funds are short at this time so I make do with what I
have.) None of the dates in using matches exactly with master. I get
the following:
. tab _merge
_merge | Freq. Percent Cum.
------------+-----------------------------------
1 | 585 1.82 1.82
2 | 31,554 98.18 100.00
------------+-----------------------------------
Total | 32,139 100.00
This is what I expected but I would like to see how close I can get a
pair (master and using) that are within one week of one another. If I
calculate a simple difference between master and the next using above
will miss any closer using if it is below the master. I hope this makes
sense. Here is a snippet of the data:
idcode visit _merge
FEL00057 26 Sep 33 2
FEL00057 30 Sep 33 1
FEL00057 22 Nov 33 2
FEL00057 24 Jan 34 2
FEL00057 28 May 34 2
FEL00057 03 Aug 34 2
FEL00057 23 Sep 34 1
FEL00057 24 Sep 34 2
FEL00057 27 Nov 34 2
FEL00057 28 Jan 35 2
FEL00057 27 Mar 35 1
FEL00057 30 Mar 35 2
If my data is arranged in this fashion, how do I best get the closest
pair of using and master?
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/