Statalist The Stata Listserver



st: RE: merge datasets using "closest" match


From   "Scott Merryman" <[email protected]>
To   <[email protected]>
Subject   st: RE: merge datasets using "closest" match
Date   Thu, 29 Jun 2006 20:25:57 -0500

I believe the example below, which merges on only the first character of the id
variable, works without two successive merges.

Scott


clear
tempfile  tmp1 

input str2 id
1A 
1B 
2 
3
4A
4B
5A
5C
5B
6A
6B
7
end
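* id_num = first character of id; sort on id within id_num so that
* 4A/4B and 5A/5B/5C line up in a reproducible order for the merge
gen id_num = substr(id, 1, 1)
sort id_num id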
save `tmp1'

clear
input str2 id2 
1 
2 
3 
4B 
4A 
5C
5A
5B
6
7A
7B
end
gen id_num = substr(id2, 1, 1)
sort id_num id2

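* match on id_num; where a group has several observations, merge pairs
* them in sort order and reuses the last one of the shorter group
merge id_num using `tmp1'
drop _merge id_num
sort id2 id

list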
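Taking only the first character would group plot 10A with plot 1 as soon as plot numbers reach two digits. Below is a sketch of the same single-merge idea that instead strips trailing letters with Stata's regexr(), assuming every id is a plot number followed by zero or more letters (the ids 10, 10A and 10B are made up for illustration):

```stata
* Assumption: ids are digits followed by optional letters (e.g. "10B")
clear
tempfile tmp1
input str3 id
1A
1B
2
10A
10B
end
* base key: drop trailing letters, so "10A" -> "10" rather than "1"
gen id_num = regexr(id, "[A-Za-z]+$", "")
sort id_num id
save `tmp1'

clear
input str3 id2
1
2
10
end
gen id_num = regexr(id2, "[A-Za-z]+$", "")
sort id_num id2
* old-style match merge on the base key, as in the example above
merge id_num using `tmp1'
drop _merge id_num
sort id2 id
list
```

When a group has unequal counts in the two files (1 versus 1A and 1B), merge reuses the last observation of the shorter group, which is what attaches both 1A and 1B to 1. When both files have several lettered plots for the same number, the pairing is purely by sort order, so it is only right when the letters correspond.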

> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of Radu Ban
> Sent: Thursday, June 29, 2006 4:22 PM
> To: [email protected]
> Subject: st: merge datasets using "closest" match
> 
> Dear listers,
> 
> I have two datasets and I want to match them on a key variable. The
> problem is that the key variable differs slightly between the two
> datasets. I'll explain what this means.
> 
> In dataset 1 the key may look like this:
> 1
> 2
> 3
> 4A
> 4B
> 5A
> 5B
> 5C
> 6
> ...
> 
> In dataset 2 the key may look like this:
> 1A
> 1B
> 2
> 3
> 4A
> 4B
> 5A
> 5B
> 5C
> 6A
> 6B
> ...
> 
> The reason for these discrepancies is that the unit of observation
> is a plot (of land), and some plots have split between the two
> periods of time (for example, 1 has split into 1A and 1B, 6 has split
> into 6A and 6B, etc.). I want to merge the two datasets keeping in mind
> these potential splits, so that 1A and 1B are both matched to 1.
> 
> I figured out a long way to do this: generate a "de-lettered" identifier
> in dataset 2, then do two successive merges, something like:
> 
> sort key
> merge key using dataset1
> drop if _m == 2
> drop _m
> 
> rename key letteredkey
> rename deletteredkey key
> sort key
> merge key using dataset1, update
> drop if _m == 2
> 
> Is there a shorter, perhaps more clever, way to do this? I found a
> user-written ado, -nearmrg-, which does exactly what I want, but only
> for numeric keys.
> 
> Thanks a lot for this,
> Radu Ban
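For comparison, the two-step approach described above can be made self-contained using the keys from the message. This is a minimal sketch, not Radu's actual files: area1 and area2 are hypothetical variables standing in for each period's data, basekey is a hypothetical name for the de-lettered key, and the two datasets are assumed to share no variable names other than key (otherwise -update- would also fill missing period-2 values from period 1).

```stata
* Period-1 plots (dataset 1); area1 is a hypothetical variable
clear
tempfile d1
input str2 key area1
1  10
2  20
3  30
4A 41
4B 42
5A 51
5B 52
5C 53
6  60
end
sort key
save `d1'

* Period-2 plots (dataset 2); area2 is likewise hypothetical
clear
input str2 key area2
1A 5
1B 5
2  20
3  30
4A 41
4B 42
5A 51
5B 52
5C 53
6A 30
6B 30
end

* Step 1: exact match on the full key
sort key
merge key using `d1'
drop if _merge == 2    // period-1 plots with no exact period-2 match
drop _merge

* Step 2: de-lettered key ("6A" -> "6", "10B" -> "10"); -update- fills
* area1 only where it is still missing after step 1
gen basekey = regexr(key, "[A-Za-z]+$", "")
rename key letteredkey
rename basekey key
sort key
merge key using `d1', update
drop if _merge == 2
rename key basekey
rename letteredkey key
sort key
list key basekey area1 area2 _merge
```

Plots matched exactly in the first merge (4A, 5C) keep their period-1 values, while 1A/1B and 6A/6B pick up the values of plots 1 and 6 in the second merge.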





