Statalist The Stata Listserver


st: merge datasets using "closest" match


From   "Radu Ban" <[email protected]>
To   [email protected]
Subject   st: merge datasets using "closest" match
Date   Thu, 29 Jun 2006 17:21:58 -0400

dear listers,

i have two datasets and i want to match them on a key variable. the
problem is that the key variable differs slightly between the two
datasets. i'll explain what this means.

in dataset 1 the key may look like this
1
2
3
4A
4B
5A
5B
5C
6
...

in dataset 2 the key may look like this
1A
1B
2
3
4A
4B
5A
5B
5C
6A
6B
...

the reason for these discrepancies is that the unit of observation
is a plot (of land) and some plots have split between the two periods
of time (for example, 1 has split into 1A and 1B, 6 has split into 6A
and 6B, etc). i want to merge the two datasets keeping in mind these
potential splits, so that 1A and 1B are both matched to 1.

i figured out a long way to do this: generate a "de-lettered"
identifier in dataset 2, then do two successive merges. something like:

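* first pass: match on the full (lettered) key
* (dataset1 must already be sorted by key)
sort key
merge key using dataset1
drop if _m == 2
drop _m

* second pass: match on the de-lettered key; with update, only
* values still missing after the first pass are filled in
rename key letteredkey
rename deletteredkey key
sort key
merge key using dataset1, update
drop if _m == 2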
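
for the "de-lettered" identifier itself, here is a minimal sketch of one way it might be generated. it assumes key is a string variable that always begins with digits; the toy data below just reuses a few values from the dataset 2 example, and deletteredkey is the name used in the merge code above.

```stata
* toy version of the key in dataset 2 (illustrative values only)
clear
input str3 key
"1A"
"1B"
"2"
"3"
"4A"
"6B"
end

* keep only the leading digits: "1A" -> "1", "2" -> "2"
generate str3 deletteredkey = regexs(1) if regexm(key, "^([0-9]+)")

list key deletteredkey
```

a key that does not start with a digit would get an empty deletteredkey under this sketch, so such cases would need separate handling.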

is there a shorter, perhaps more clever way to do this? i found a
user-written ado -nearmrg-, which does exactly what i want but only
for numeric keys.

thanks a lot for this,
radu ban