Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Nick Cox <njcoxstata@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Tolerance for -merge- variable
Date: Wed, 28 Mar 2012 19:43:22 +0100

My understanding is that there is _no_ tolerance. Equal matches,
unequal doesn't. What implies otherwise?

More specifically,

1. Like you, I wouldn't by preference use a non-integer numeric
variable as an identifier, largely because of worries that things like
this might happen.

2. This is expectable if one variable is -float- and the other
-double- as then x.1 (or whatever) will be stored as different binary
approximations. See documentation on precision, passim.

3. If the variables are the same type, please show us (a) minimal
datasets and (b) -merge- syntax which shows your problem. But you
should first use hexadecimal formats to see if the identifiers really
are identical. If not, -merge- is behaving as expected.

4. Otherwise, my best advice is that conversion to string must use an
explicit format argument to maximise your chances, e.g. -string(myvar,
"%18.1f")-.

Nick

On Wed, Mar 28, 2012 at 7:26 PM, Rob Ploutz-Snyder
<robploutzsnyder@gmail.com> wrote:
> I notice that when I have an ID variable stored with 1 decimal place
> (ex. id=id+0.1) in two separate data files, the merge command
> sometimes fails to equate ID values that are equal within rounding
> error. This is particularly problematic if Stata generated one of
> these id variables (ex. gen idnew=id+0.1) and Excel or some other
> software generated the id variable in the other dataset (including
> hand data entry).
>
> Is there a way to adjust the tolerance that -merge- uses on the ID var
> that is in both data sets so that it links properly out to (for
> example) 1 or 2 or 3 digits past the decimal??
>
> My only solution so far is to generate a string variable from the
> numeric ID variables in each dataset and then use the string variable
> for the -merge- but it seems like there should be a simpler way to
> tweak the tolerance within -merge-. My other solution is to try to
> avoid circumstances when the unique ID is a non-integer, but that's
> not always an option for me.
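
Points 2 and 4 of the reply above can be illustrated outside Stata. The
sketch below is in Python, not Stata, and is not taken from the thread.
It assumes only that a -float- is an IEEE 32-bit value and a -double- an
IEEE 64-bit value. The helper as_float32 and the value 1.1 are
hypothetical choices made for this illustration. It shows that the same
decimal ID stored at the two precisions is not exactly equal, that a
hexadecimal display exposes the difference, and that formatting both
with one explicit format yields identical string keys.

```python
import struct


def as_float32(x: float) -> float:
    """Round a 64-bit value to the nearest 32-bit float and
    return it widened back to 64 bits (hypothetical helper)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


# The same decimal ID held at two storage precisions.
id_double = 1.1              # 64-bit approximation of 1.1
id_float = as_float32(1.1)   # 32-bit approximation of 1.1

# Exact comparison, which is what an exact-match merge relies on.
exact_match = id_double == id_float

# Hexadecimal display shows the underlying bits differ.
hex_double = id_double.hex()
hex_float = id_float.hex()

# Converting both with one explicit format gives comparable string keys.
key_double = ("%18.1f" % id_double).strip()
key_float = ("%18.1f" % id_float).strip()
keys_match = key_double == key_float

print("exact match:", exact_match)
print("hex (double):", hex_double)
print("hex (float): ", hex_float)
print("string keys match:", keys_match)
```

The string keys agree only because both values round to the same digit
at one decimal place. A format with too few or too many decimals for the
data could still merge distinct IDs or split equal ones, so the format
should be chosen to fit the IDs actually present.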