Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Rob Ploutz-Snyder <robploutzsnyder@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Tolerance for -merge- variable |
Date | Thu, 29 Mar 2012 10:37:22 -0500 |
Thank you, Nick, for your prompt reply to my post. You have clarified
my problem exactly, and more clearly than I did. The precision problem
becomes even more troublesome when different software packages are
involved. In my case, I received one dataset with IDs that were
generated in Excel. I have no idea what the precision is there, but I
know that it doesn't align nicely with the other dataset, whose
(decimal) IDs were generated in Stata.

Alas, I guess I am stuck with converting the IDs to string for the
merge. I had been especially avoiding this solution because I had
assumed that I couldn't then use a string ID variable as an identifier
in Stata's -xtmixed- or other xt routines, so I would have had to
back-convert to a numeric ID. The good news is that I seem to be able
to use a string variable for that too, so I suppose Stata's -merge-
behavior is all right in the end.

...I stubbornly admit that I still wish it had a tolerance option that
we could tweak so that, on our instruction, it would treat IDs within
?? decimals as equal.

Again, thank you!

Rob

On Wed, Mar 28, 2012 at 1:43 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> My understanding is that there is _no_ tolerance. Equal matches,
> unequal doesn't. What implies otherwise?
>
> More specifically,
>
> 1. Like you, I wouldn't by preference use a non-integer numeric
> variable as an identifier, largely because of worries that things like
> this might happen.
>
> 2. This is expectable if one variable is -float- and the other
> -double- as then x.1 (or whatever) will be stored as different binary
> approximations. See documentation on precision, passim.
>
> 3. If the variables are the same type, please show us (a) minimal
> datasets and (b) -merge- syntax which shows your problem. But you
> should first use hexadecimal formats to see if the identifiers really
> are identical. If not, -merge- is behaving as expected.
>
> 4. Otherwise, my best advice is that conversion to string must use an
> explicit format argument to maximise your chances, e.g.
> -string(myvar, "%18.1f")-.
>
> Nick
>
> On Wed, Mar 28, 2012 at 7:26 PM, Rob Ploutz-Snyder
> <robploutzsnyder@gmail.com> wrote:
>
>> I notice that when I have an ID variable stored with 1 decimal place
>> (ex. id=id+0.1) in two separate data files, the merge command
>> sometimes fails to equate ID values that are equal within rounding
>> error. This is particularly problematic if Stata generated one of
>> these id variables (ex. gen idnew=id+0.1) and Excel or some other
>> software generated the id variable in the other dataset (including
>> hand data entry).
>>
>> Is there a way to adjust the tolerance that -merge- uses on the ID var
>> that is in both data sets so that it links properly out to (for
>> example) 1 or 2 or 3 digits past the decimal??
>>
>> My only solution so far is to generate a string variable from the
>> numeric ID variables in each dataset and then use the string variable
>> for the -merge- but it seems like there should be a simpler way to
>> tweak the tolerance within -merge-. My other solution is to try to
>> avoid circumstances when the unique ID is a non-integer, but that's
>> not always an option for me.
>>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
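
The float/double mismatch in point 2, the hexadecimal check in point 3, and the string-key workaround in point 4 can be sketched in Stata roughly as follows. This is a minimal illustration and was not posted in the thread: the toy data, the variable names (id_f, id_d, id, key, ikey, x, y), and the integer-key alternative via round(id*10) are all assumptions made for the example. The one-decimal format and the factor of 10 only suit IDs with a single decimal place, as described above.

```stata
* Illustrative sketch only: toy data and variable names are assumptions.

* --- 1. Same decimal ID stored as float and as double -----------------
clear
set obs 3
generate float  id_f = _n + 0.1
generate double id_d = _n + 0.1

* Under a decimal display format the two columns look alike ...
format id_f id_d %9.1f
list id_f id_d, noobs

* ... but a hexadecimal format shows the binary values actually stored
format id_f id_d %21x
list id_f id_d, noobs

* -merge- matches only exactly equal key values, as this comparison does
count if id_f == id_d

* --- 2. Build an exact key in each dataset, then merge on it ----------
* Dataset A: double IDs (standing in for the Stata-generated file)
clear
set obs 3
generate double id = _n + 0.1
generate x = _n
generate key = string(id, "%18.1f")   // explicit format, as Nick advises
drop id
tempfile a
save `a'

* Dataset B: float IDs (standing in for the file from other software)
clear
set obs 3
generate float id = _n + 0.1
generate y = 10 * _n
generate key = string(id, "%18.1f")

merge 1:1 key using `a'
tabulate _merge

* Assumed alternative: an integer key, if a numeric ID is preferred later
generate long ikey = round(id * 10)
```

Merging on the formatted string (or on an integer built by scaling and rounding) means the comparison is made on values that are exactly equal by construction, which is in effect the "tolerance" asked for, applied before -merge- rather than inside it.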