Re: st: The accuracy of the float data type

From   Nick Winter <[email protected]>
To   [email protected]
Subject   Re: st: The accuracy of the float data type
Date   Fri, 24 Jan 2014 12:32:58 -0500

Perhaps the problem arises because the *storage* types of sales and maxsale are different. (The storage type is not the same as the *display* format.)


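set seed 1234567
set obs 10
gen double sales = round(uniform()*100,.001)
gen year = _n

* maxsale is stored as float, sales as double: the exact comparison fails
egen float maxsale = max(sales), by(year)
gen equal = sales == maxsale

* maxsale2 is stored as double, like sales: the comparison succeeds
egen double maxsale2 = max(sales), by(year)
gen equal2 = sales == maxsale2

* rounding sales to float precision before comparing also succeeds
gen equal3 = float(sales) == maxsale

list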


     |  sales   year   maxsale   equal   maxsale2   equal2   equal3 |
  1. |   2.65      1      2.65       0       2.65        1        1 |
  2. | 17.274      2    17.274       0     17.274        1        1 |
  3. |  2.923      3     2.923       0      2.923        1        1 |
  4. | 75.377      4    75.377       0     75.377        1        1 |
  5. | 65.559      5    65.559       0     65.559        1        1 |
  6. | 81.163      6    81.163       0     81.163        1        1 |
  7. | 17.459      7    17.459       0     17.459        1        1 |
  8. | 24.531      8    24.531       0     24.531        1        1 |
  9. | 11.195      9    11.195       0     11.195        1        1 |
 10. | 75.953     10    75.953       0     75.953        1        1 |

If that's the case, then you need to ensure that your sales and maxsale variables have the same storage type (both float or both double), OR you need to explicitly round the double-precision one to float precision when you make the comparison, using the float() function.

See -help precision- for more on what's going on here.
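
As a minimal sketch of that mismatch, assuming nothing beyond built-in Stata (the variable names xf, xd, same_raw and same_rounded are made up for illustration): 0.1 stored as a float does not equal 0.1 stored as a double until the double is rounded with float().

```stata
clear
set obs 1

* the same decimal value, stored at two different precisions
gen float  xf = 0.1
gen double xd = 0.1

* exact comparison of the stored values as they are
gen byte same_raw = xf == xd

* round the double to float precision first, then compare
gen byte same_rounded = xf == float(xd)

list
```

Here same_raw should be 0 and same_rounded should be 1, which is the same pattern as equal and equal3 in the listing above.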

On 1/24/2014 11:55 AM, R Zhang wrote:
Thanks to you both, Sergiy and Nick.


1. Are you saying that I should follow Sergiy's advice to change the
format? If so, given the large number of observations I have, how do
I automate the process?

2. If I do not change the format: I listed some observations below to
show you that sales and maxsale look the same. However, when I use
"l if sales == maxsale", it does not list all of the observations that
appear equal.

      |   sales   maxsale1 |
   1. |  25.395     25.395 |
   2. |  32.007     32.007 |
   3. |  53.798     53.798 |
   4. |  12.748     12.748 |
   5. |  13.793     13.793 |
  ..... omitted to save space

  31. | 166.181    166.181 |
  32. |  21.927    166.181 |
  33. |  26.328    189.897 |
  34. |  31.787    189.897 |
  35. | 189.897    189.897 |
  36. | 264.582    264.582 |
  37. |   33.61    264.582 |
  38. | 312.227    312.227 |
  39. |  35.413    312.227 |
  40. |  406.36     406.36 |
  41. | 444.875    444.875 |

  egen maxsale=max(sales), by(gvkey year)

  l if sales == maxsale

The first observation listed is 444.875    444.875.

Why is that?


On Fri, Jan 24, 2014 at 11:34 AM, Nick Cox <[email protected]> wrote:
This is very good advice in general, but in this case the maxima are
selected from the original values, so that equality is to be expected
for some observations.

On 24 January 2014 16:31, Sergiy Radyakin <[email protected]> wrote:
Zhang, avoid comparing floating-point numbers for equality. Instead,
use the system value c(epsfloat), which you can refer to when
you need to deal with precision:

input float sales
25.395
32.007
end

display c(epsfloat)

list if sales==25.395
list if abs(sales-25.395)<=10*c(epsfloat)

list if sales==32.007
list if abs(sales-32.007)<=10*c(epsfloat)
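
A fixed tolerance such as 10*c(epsfloat) is absolute, so it gets tighter relative to the data as the values grow. One possible variation, offered only as a sketch, is a relative tolerance via the built-in reldif() function, which returns |x-y|/(|y|+1). The two data values are the ones from the original question, and the choice of c(epsfloat) as the cutoff is an assumption, not a rule:

```stata
clear
input float sales
25.395
32.007
end

* relative difference between the stored float and the literal,
* compared against the float-precision epsilon (assumed cutoff)
list if reldif(sales, 25.395) < c(epsfloat)
list if reldif(sales, 32.007) < c(epsfloat)

* count the matches for the first value
count if reldif(sales, 25.395) < c(epsfloat)
```

Each list command should show exactly one observation, because rounding a value to float precision changes it by less than c(epsfloat) in relative terms.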

Best, Sergiy Radyakin

On Fri, Jan 24, 2014 at 11:23 AM, Maarten Buis <[email protected]> wrote:
I would do this differently:

*------------------ begin example ------------------
// get some example data
sysuse auto

// create a variable denoting missing values
gen byte miss = missing(rep78, price)

// create our indicator variable
bys rep78 miss (price) : gen max = _n == _N if !miss

// admire the result
list rep78 miss price max in 1/12, sepby(rep78)
*------------------- end example -------------------
* (For more on examples I sent to the Statalist see:
* )
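
Adapted to the variables in the original question, the same idea might look like the sketch below. The variable names firmID, segmentID, sales, year and PriSIC come from the question; the rows for segment 2 are made-up illustration data, and only one segment is flagged if two tie for the maximum.

```stata
clear
input firmID segmentID double sales year
1001 1 25.395 1990
1001 2 12.748 1990
1001 1 32.007 1991
1001 2 13.793 1991
end

* mark observations with missing sales so they are never flagged
gen byte miss = missing(sales)

* within each firm-year, sort by sales and flag the last (largest) one
bysort firmID year miss (sales): gen byte PriSIC = _n == _N if !miss

list, sepby(firmID year)
```

Because the largest segment is found by sorting rather than by testing sales == maxsale, the storage types of the variables never enter into it.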

Hope this helps,

On Fri, Jan 24, 2014 at 4:53 PM, R Zhang <[email protected]> wrote:
Dear Statalist,

my data structure is as follows

firmID    segmentID   sales     year
1001      1           25.395    1990
1001      1           32.007    1991


A firm can operate in multiple segments, as identified by segmentID.
I wanted to identify the largest segment by sales, so I used

bysort firmID year : egen maxsale=max(sales)

then I did
gen PriSIC=0
replace PriSIC=1 if sales==maxsale

I got
firmID    segmentID   sales     year    maxsale   prisic
1001      1           25.395    1990    25.395    0
1001      1           32.007    1991    32.007    0

I could not figure out why prisic is 0, so I computed the difference
(sales-maxsale). It shows a very small negative number, and the data
dictionary shows sales format float %12.0g, and maxsale format float

What should I do to correct this?


*   For searches and help try:

Maarten L. Buis
Reichpietschufer 50
10785 Berlin
