Re: st: Tolerance for -merge- variable


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Tolerance for -merge- variable
Date   Wed, 28 Mar 2012 19:43:22 +0100

My understanding is that there is _no_ tolerance. Equal matches,
unequal doesn't. What implies otherwise?

More specifically,

1. Like you, I wouldn't by preference use a non-integer numeric
variable as an identifier, largely because of worries that things like
this might happen.

2. This is to be expected if one variable is -float- and the other
-double-, as then x.1 (or whatever) will be stored as different binary
approximations. See the documentation on precision, passim.

3. If the variables are the same type, please show us (a) minimal
datasets and (b) the -merge- syntax that shows your problem. But you
should first use hexadecimal formats to see if the identifiers really
are identical. If not, -merge- is behaving as expected.

4. Otherwise, my best advice is that conversion to string must use an
explicit format argument to maximise your chances, e.g. -string(myvar,
"%18.1f")-.
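
Points 2 to 4 can be illustrated outside Stata. What follows is only a
rough sketch in Python, not Stata code: it assumes that a 32-bit value
stands in for a -float- and a 64-bit value for a -double-, and the
names to_float32 and string_key are made up for the illustration.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit float to the nearest 32-bit float, then widen it back."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def string_key(x: float, fmt: str = "%.1f") -> str:
    """Build a string identifier with an explicit one-decimal format."""
    return fmt % x


# The same nominal identifier, 1.1, held at two precisions.
id_double = 1.1              # 64-bit, analogous to a Stata -double-
id_float = to_float32(1.1)   # 32-bit, analogous to a Stata -float-

# Exact comparison fails: the two binary approximations differ.
print(id_double == id_float)

# Hexadecimal output shows the stored bits, so the difference is visible.
print(id_double.hex())
print(id_float.hex())

# With an explicit format, both values map to the same string key.
print(string_key(id_double), string_key(id_float))
print(string_key(id_double) == string_key(id_float))
```

The exact comparison is the one that matters for matching identifiers;
the string key agrees only because the format rounds both values to
one decimal place, which is the idea behind the -string()- call in
point 4.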

Nick

On Wed, Mar 28, 2012 at 7:26 PM, Rob Ploutz-Snyder
<robploutzsnyder@gmail.com> wrote:

> I notice that when I have an ID variable stored with 1 decimal place
> (ex. id=id+0.1) in two separate data files, the merge command
> sometimes fails to equate ID values that are equal within rounding
> error.  This is particularly problematic if Stata generated one of
> these id variables (ex. gen idnew=id+0.1) and Excel or some other
> software generated the id variable in the other dataset (including
> hand data entry).
>
> Is there a way to adjust the tolerance that -merge- uses on the ID var
> that is in both data sets so that it links properly out to (for
> example) 1 or 2 or 3 digits past the decimal??
>
> My only solution so far is to generate a string variable from the
> numeric ID variables in each dataset and then use the string variable
> for the -merge- but it seems like there should be a simpler way to
> tweak the tolerance within -merge-.  My other solution is to try to
> avoid circumstances when the unique ID is a non-integer, but that's
> not always an option for me.
>

