Re: st: The accuracy of the float data type

Fri, 24 Jan 2014 15:13:23 -0500

sales was double %12.0g, maxsale was float %9.0g. My apology.

On Fri, Jan 24, 2014 at 1:09 PM, Nick Cox <[email protected]> wrote:
> I wondered that too, but Rochelle said that both variables were
> -float-. But if that is not so, then it's likely to be the
> explanation.
>
> Note by the way that Stata does not use terminology such as "storage
> format". Display format and variable type are, as Nick Winter implies,
> quite different notions.
>
> Nick
> [email protected]
>
>
> On 24 January 2014 17:32, Nick Winter <[email protected]> wrote:
>> Perhaps the problem comes because the *storage* format of sales and maxsale
>> are different. (This is not the same as the *display* format).
>>
>> Consider:
>>
>> clear
>> set seed 1234567
>> set obs 10
>> gen double sales = round(uniform()*100,.001)
>> gen year = _n
>> egen float maxsale = max(sales), by(year)
>> gen equal = sales == maxsale
>>
>> egen double maxsale2 = max(sales), by(year)
>> gen equal2 = sales == maxsale2
>>
>> gen equal3 = float(sales) == maxsale
>>
>> list
>>
>>
>>      +--------------------------------------------------------------+
>>      |  sales   year   maxsale   equal   maxsale2   equal2   equal3 |
>>      |--------------------------------------------------------------|
>>   1. |   2.65      1      2.65       0       2.65        1        1 |
>>   2. | 17.274      2    17.274       0     17.274        1        1 |
>>   3. |  2.923      3     2.923       0      2.923        1        1 |
>>   4. | 75.377      4    75.377       0     75.377        1        1 |
>>   5. | 65.559      5    65.559       0     65.559        1        1 |
>>      |--------------------------------------------------------------|
>>   6. | 81.163      6    81.163       0     81.163        1        1 |
>>   7. | 17.459      7    17.459       0     17.459        1        1 |
>>   8. | 24.531      8    24.531       0     24.531        1        1 |
>>   9. | 11.195      9    11.195       0     11.195        1        1 |
>>  10. | 75.953     10    75.953       0     75.953        1        1 |
>>      +--------------------------------------------------------------+
>>
>>
>> If that's the case, then you need to ensure that your sales and maxsale
>> variables are in the same storage precision (float, double); OR you need to
>> explicitly round the one that is double-precision to float precision when
>> you make the comparison, using the float() function.
>>
>> See -help precision- for more on what's going on here.
>>
>>
>>
>> On 1/24/2014 11:55 AM, R Zhang wrote:
>>>
>>> Thanks to you both, Sergiy and Nick.
>>>
>>> Nick,
>>>
>>> 1. Are you saying that I should follow Sergiy's advice to change
>>> format? If so, given the large number of observations I have, how do
>>> I automate the process?
>>>
>>> 2. If I do not change the format, I listed some observations below to
>>> show you that sales and maxsale look the same; however, when I use
>>> "l if sales == maxsale" it does not list all of the observations that
>>> appear equal.
>>>
>>>
>>> *****************
>>>      +--------------------+
>>>      |   sales   maxsale1 |
>>>      |--------------------|
>>>   1. |  25.395     25.395 |
>>>   2. |  32.007     32.007 |
>>>   3. |  53.798     53.798 |
>>>   4. |  12.748     12.748 |
>>>   5. |  13.793     13.793 |
>>> ..... omitted to save space
>>>
>>>  31. | 166.181    166.181 |
>>>  32. |  21.927    166.181 |
>>>  33. |  26.328    189.897 |
>>>  34. |  31.787    189.897 |
>>>  35. | 189.897    189.897 |
>>>      |--------------------|
>>>  36. | 264.582    264.582 |
>>>  37. |   33.61    264.582 |
>>>  38. | 312.227    312.227 |
>>>  39. |  35.413    312.227 |
>>>  40. |  406.36     406.36 |
>>>      |--------------------|
>>>  41. | 444.875    444.875 |
>>>
>>>
>>> egen maxsale=max(sales), by (gvkey year)
>>>
>>> l if sales == maxsale,
>>>
>>> the first observation that is listed is 444.875 444.875,
>>>
>>> why is that?
>>>
>>> thanks!
>>>
>>> On Fri, Jan 24, 2014 at 11:34 AM, Nick Cox <[email protected]> wrote:
>>>>
>>>> This is very good advice in general, but in this case the maxima are
>>>> selected from the original values, so that equality is to be expected
>>>> for some observations.
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 24 January 2014 16:31, Sergiy Radyakin <[email protected]> wrote:
>>>>>
>>>>> Zhang, avoid comparing floating point numbers for equality. Instead
>>>>> there is a system variable c(epsfloat), which you can refer to when
>>>>> you need to deal with precision:
>>>>>
>>>>> clear
>>>>> input float sales
>>>>> 25.395
>>>>> 32.007
>>>>> end
>>>>>
>>>>> list
>>>>>
>>>>> display c(epsfloat)
>>>>>
>>>>> list if sales==25.395
>>>>> list if abs(sales-25.395)<=10*c(epsfloat)
>>>>>
>>>>> list if sales==32.007
>>>>> list if abs(sales-32.007)<=10*c(epsfloat)
>>>>>
>>>>>
>>>>> Best, Sergiy Radyakin
>>>>>
>>>>> On Fri, Jan 24, 2014 at 11:23 AM, Maarten Buis <[email protected]>
>>>>> wrote:
>>>>>>
>>>>>> I would do this differently:
>>>>>>
>>>>>> *------------------ begin example ------------------
>>>>>> // get some example data
>>>>>> sysuse auto
>>>>>>
>>>>>> // create a variable denoting missing values
>>>>>> gen byte miss = missing(rep78, price)
>>>>>>
>>>>>> // create our indicator variable
>>>>>> bys rep78 miss (price) : gen max = _n == _N if !miss
>>>>>>
>>>>>> // admire the result
>>>>>> list rep78 miss price max in 1/12, sepby(rep78)
>>>>>> *------------------- end example -------------------
>>>>>> * (For more on examples I sent to the Statalist see:
>>>>>> * http://www.maartenbuis.nl/example_faq )
>>>>>>
>>>>>> Hope this helps,
>>>>>> Maarten
>>>>>>
>>>>>>
>>>>>> On Fri, Jan 24, 2014 at 4:53 PM, R Zhang <[email protected]> wrote:
>>>>>>>
>>>>>>> Dear Statalist,
>>>>>>>
>>>>>>> my data structure is as follows
>>>>>>>
>>>>>>> firmID   segmentID   sales    year
>>>>>>> 1001     1           25.395   1990
>>>>>>> 1001     1           32.007   1991
>>>>>>>
>>>>>>> ............
>>>>>>>
>>>>>>> a firm can operate in multiple segments as identified by segmentID.
>>>>>>> I wanted to identify the largest segment by sales, so I used
>>>>>>>
>>>>>>> bysort firmID year : egen maxsale=max(sales)
>>>>>>>
>>>>>>> then I did
>>>>>>> gen PriSIC=0
>>>>>>> replace PriSIC=1 if sales=maxsale
>>>>>>>
>>>>>>> I got
>>>>>>> firmID   segmentID   sales    year   maxsale   prisic
>>>>>>> 1001     1           25.395   1990   25.395    0
>>>>>>> 1001     1           32.007   1991   32.007    0
>>>>>>>
>>>>>>> I could not figure out why prisic is 0, so I computed the difference
>>>>>>> (sales-maxsale); it shows a very small negative number, and the data
>>>>>>> dictionary shows sales format float %12.0g, and maxsale format float
>>>>>>> %9.0g
>>>>>>>
>>>>>>> what should I do to correct this?
>>>>>>>
>>>>>>> thanks!!!
>>>>>>>
>>>>>>> Rochelle
>>>>>>
>>>>>> Maarten Buis
>>>>>> WZB
>>>>>> Reichpietschufer 50
>>>>>> 10785 Berlin
>>>>>> Germany
>>>>>>
>>>>>> http://www.maartenbuis.nl
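
Given the correction at the top of the thread (sales is double, maxsale is
float), the two fixes Nick Winter describes could be sketched roughly as
follows, using the variable names quoted in the thread (gvkey, year, sales,
PriSIC). The three data rows and the names maxsale_f and PriSIC_f are made up
for illustration; this is one possible way to do it under those assumptions,
not necessarily what was finally used.

*------------------ begin example ------------------
// illustrative data only: the values are taken from the listings above,
// but their grouping into gvkey/year is an assumption
clear
input long gvkey int year double sales
1001 1990 25.395
1001 1990 12.748
1001 1991 32.007
end

// Option 1: store the group maximum as a double, matching sales
egen double maxsale = max(sales), by(gvkey year)
gen byte PriSIC = sales == maxsale

// Option 2: keep a float maximum, but round sales to float
// precision before making the comparison
egen float maxsale_f = max(sales), by(gvkey year)
gen byte PriSIC_f = float(sales) == maxsale_f

list, sepby(gvkey year)
*------------------- end example -------------------

Option 1 is probably the simpler habit, since no rounding enters the
comparison at all. With Option 2, two slightly different double values that
round to the same float would both be flagged as the maximum, so it seems
worth checking for ties within each group.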

