st: encode results in false match - merge/joinby


From   joe j <joe.stata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   st: encode results in false match - merge/joinby
Date   Thu, 10 Feb 2011 22:07:09 +0100

I want to highlight something I ran into while merging two data sets
on an encoded merge variable. In reality the two tables do not match
at all, and that is also the result when I use the matching variable
'code' in its string format. But if I encode it to generate a variable
'code1' and use that for merging, every observation matches. (I no
longer remember why I encoded this variable; there must have been a
reason, but it was definitely not for merging.)

Below is an example with two files joined on either the string variable
'code' or the encoded variable 'code1'; the latter produces a false
perfect match. I wonder whether this strange behavior of encoded
variables is limited to 'joinby', or whether it could also be an issue
in other contexts. Thanks for any pointers.

clear
input id str5 code
1 "123J5"
2 "68741"
3 "297J5"
4 "14856"
5 "AB234"
6 "25K45"
7 "12535"
end
encode code, gen(code1)
sort code1
save file1.dta, replace

clear
input id str5 code
1 "243J5"
2 "68348"
3 "479H5"
4 "467G5"
5 "23TUB"
6 "TU501"
7 "32LK8"
end
encode code, gen(code1)

joinby code1 using file1.dta, unmatched(both) /* perfect match */
*joinby code using file1.dta, unmatched(both) /* perfect non-match */

ta _m
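
A possible explanation, offered as a sketch rather than a diagnosis:
encode numbers the distinct strings 1, 2, 3, ... in alphabetical order
within whichever data set is in memory, and keeps the strings only as
value labels. Assuming that is what happens here, each file ends up
with code1 running from 1 to 7, so joining on code1 compares those
integers and not the underlying strings. The lines below use a
three-row subset of the second file (shortened only for illustration)
to show the stored integers behind the labels.

```stata
* Illustrative subset of the second file; the rows are taken from the example above.
clear
input id str5 code
1 "243J5"
2 "68348"
3 "479H5"
end
encode code, gen(code1)

* Show the integer-to-string mapping that encode stored in value label code1.
label list code1

* List the strings next to the stored integers, with labels suppressed.
list code code1, nolabel
```

Running the same two commands on file1.dta should show the same
integers attached to different strings. Joining on the string variable
'code' sidesteps this, since no renumbering is involved.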