
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at


Re: st: Tolerance for -merge- variable

From   Rob Ploutz-Snyder <[email protected]>
To   [email protected]
Subject   Re: st: Tolerance for -merge- variable
Date   Thu, 29 Mar 2012 10:37:22 -0500

Thank you, Nick, for your prompt reply to my post. You have described
my problem exactly, and more clearly than I did.

The precision problem becomes even more troublesome when different
software packages are involved. In my case, I received one dataset with IDs
that were generated in Excel. I have no idea what the precision is
there, but I know that its values don't align with those in the other dataset,
whose (decimal) IDs were generated in Stata.
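
A minimal sketch of the check Nick suggests below: store the same decimal ID once as -float- (Stata's default storage type for -generate-) and once as -double- (assumed here to stand in for an ID brought in from other software), then display both with a hexadecimal format. The variable names id_f and id_d are hypothetical.

```stata
* Hypothetical illustration: the same decimal ID stored as float and as double
clear
set obs 1
generate float  id_f = 1.1    // float: the type -generate- uses by default
generate double id_d = 1.1    // double: e.g. an ID brought in from other software

* The hexadecimal format shows that the stored binary approximations differ
format id_f id_d %21x
list id_f id_d

* Exact comparison fails (displays 0), so -merge- would treat these as different keys
display id_f == id_d

* Rounding the double to float precision makes the two agree (displays 1)
display float(id_d) == id_f
```

If the mismatch is only float versus double, converting the double ID with float() in that file is one option; it does not help when the other software stored a genuinely different value.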

Alas, I guess I am stuck with converting IDs to strings for the merge.
The good news is that I had been avoiding this solution mainly because
I had assumed that I couldn't then use a string ID variable as an
identifier in Stata's -xtmixed- or other xt routines, and so would have to
convert back to a numeric ID.

It seems I can use a string variable there too, so I suppose
Stata's -merge- behavior is all right in the end.
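
A minimal sketch of the string-key approach, following Nick's advice to give string() an explicit format. The file names main_file.dta and other_file.dta, the variable names, and the choice of -merge 1:1- (which assumes each ID occurs once per file) are all hypothetical.

```stata
* Hypothetical files main_file.dta and other_file.dta, each with a decimal ID named id
use other_file.dta, clear
generate str_id = string(id, "%18.1f")   // explicit format, as Nick advises
drop id                                  // keep only the string key from this file
save other_str.dta, replace

use main_file.dta, clear
generate str_id = string(id, "%18.1f")
merge 1:1 str_id using other_str.dta

* If a routine needs a numeric identifier afterwards, -encode- maps each
* distinct string to an integer code, storing the original strings as value labels
encode str_id, generate(id_num)
```

The format must match the number of decimal places the IDs are meant to have; %18.2f would be needed for IDs with two decimals.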

...I stubbornly admit that I still wish -merge- had a tolerance option
that we could set so that, on our instruction, it would treat IDs that agree
to a specified number of decimal places as equal.
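
There is no such option, but a similar effect can be sketched by merging on an integer key rather than a string. This is a workaround assumed for illustration, not a -merge- feature: it supposes the IDs are meant to have at most one decimal place, and the variable names are hypothetical.

```stata
* Hypothetical sketch: emulate a 1-decimal "tolerance" with an integer key.
* Multiply by 100 or 1000 instead for 2 or 3 decimal places.
clear
set obs 1
generate float  id_f = 1.1      // e.g. the ID as stored in one file
generate double id_d = 1.1      // e.g. the same ID from other software

generate long key_f = round(id_f * 10)
generate long key_d = round(id_d * 10)

* Both keys equal 11, so a -merge- on the integer key would match them (displays 1)
display key_f == key_d
```

In practice the same key variable would be generated in each file and used as the -merge- key. A -long- holds integers up to about 2.1 billion; for larger scaled IDs a -double- key stores whole numbers exactly up to 2^53.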

Again--thank you!

On Wed, Mar 28, 2012 at 1:43 PM, Nick Cox <[email protected]> wrote:
> My understanding is that there is _no_ tolerance. Equal matches,
> unequal doesn't. What implies otherwise?
> More specifically,
> 1. Like you, I wouldn't by preference use a non-integer numeric
> variable as an identifier, largely because of worries that things like
> this might happen.
> 2. This is to be expected if one variable is -float- and the other
> -double-, as then x.1 (or whatever) will be stored as different binary
> approximations. See the documentation on precision, passim.
> 3. If the variables are the same type, please show us (a) minimal
> datasets and (b) the -merge- syntax that reproduce your problem. But you
> should first use hexadecimal formats to check whether the identifiers really
> are identical. If they are not, -merge- is behaving as expected.
> 4. Otherwise, my best advice is that conversion to string must use an
> explicit format argument to maximise your chances, e.g. -string(myvar,
> "%18.1f")-.
> Nick
> On Wed, Mar 28, 2012 at 7:26 PM, Rob Ploutz-Snyder
> <[email protected]> wrote:
>> I notice that when I have an ID variable stored with 1 decimal place
>> (e.g. id=id+0.1) in two separate data files, the merge command
>> sometimes fails to match ID values that are equal within rounding
>> error. This is particularly problematic if Stata generated one of
>> these ID variables (e.g. gen idnew=id+0.1) and Excel or some other
>> software (including hand data entry) generated the ID variable in the
>> other dataset.
>> Is there a way to adjust the tolerance that -merge- uses on the ID variable
>> common to both datasets, so that it matches properly to (for
>> example) 1, 2, or 3 digits past the decimal point?
>> My only solution so far is to generate a string variable from the
>> numeric ID variable in each dataset and then use the string variable
>> for the -merge-, but it seems like there should be a simpler way to
>> tweak the tolerance within -merge-. My other solution is to try to
>> avoid circumstances where the unique ID is a non-integer, but that's
>> not always an option for me.

*   For searches and help try:

© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index