Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: The accuracy of the float data type
From
Nick Cox <[email protected]>
To
"[email protected]" <[email protected]>
Subject
Re: st: The accuracy of the float data type
Date
Fri, 24 Jan 2014 18:09:24 +0000
I wondered that too, but Rochelle said that both variables were
-float-. But if that is not so, then it's likely to be the
explanation.
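
One way to check the variable types directly is sketched below. The variable
names follow the thread, but the toy data are made up for illustration and
stand in for the real dataset, so treat this as a sketch rather than a
diagnosis of Rochelle's data.

```stata
* Toy data standing in for the real dataset (values are illustrative only)
clear
input float sales int year
25.395 1990
32.007 1991
end

* egen creates a float by default unless a type is given
egen maxsale = max(sales), by(year)

* Report the variable type and display format of each variable
describe sales maxsale

* confirm exits with an error if a variable is not stored as float
confirm float variable sales maxsale

* Count the observations where sales equals the group maximum
count if sales == maxsale
```

If -confirm- fails on the real data, the two variables do not have the same
type, and that would be the first thing to fix.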
Note by the way that Stata does not use terminology such as "storage
format". Display format and variable type are, as Nick Winter implies,
quite different notions.
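
A minimal sketch of that distinction, using a single made-up observation:
-format- changes only how a value is shown, while the variable type decides
what is stored. The value 25.395 is borrowed from the thread; everything else
here is an assumption for illustration.

```stata
clear
set obs 1
generate double sales = 25.395

* Storing the same value in a float rounds it to float precision
generate float maxsale = sales

* Under the default display formats the two look identical
list sales maxsale

* Changing the display format does not change what is stored
format sales maxsale %21.18f
list sales maxsale

* recast changes the variable type, but cannot restore digits already lost
recast double maxsale
describe sales maxsale

* The comparison depends on the stored values, not on the display format
display sales[1] == maxsale[1]
display float(sales[1]) == maxsale[1]
```

The first -display- should show 0 and the second 1: the values differ as
stored, but agree once the double is rounded to float precision.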
Nick
[email protected]
On 24 January 2014 17:32, Nick Winter <[email protected]> wrote:
> Perhaps the problem comes because the *storage* format of sales and maxsale
> are different. (This is not the same as the *display* format).
>
> Consider:
>
> clear
> set seed 1234567
> set obs 10
> gen double sales = round(uniform()*100,.001)
> gen year = _n
> egen float maxsale = max(sales), by(year)
> gen equal = sales == maxsale
>
> egen double maxsale2 = max(sales), by(year)
> gen equal2 = sales == maxsale2
>
> gen equal3 = float(sales) == maxsale
>
> list
>
>
>      +--------------------------------------------------------------+
>      |  sales   year   maxsale   equal   maxsale2   equal2   equal3 |
>      |--------------------------------------------------------------|
>   1. |   2.65      1      2.65       0       2.65        1        1 |
>   2. | 17.274      2    17.274       0     17.274        1        1 |
>   3. |  2.923      3     2.923       0      2.923        1        1 |
>   4. | 75.377      4    75.377       0     75.377        1        1 |
>   5. | 65.559      5    65.559       0     65.559        1        1 |
>      |--------------------------------------------------------------|
>   6. | 81.163      6    81.163       0     81.163        1        1 |
>   7. | 17.459      7    17.459       0     17.459        1        1 |
>   8. | 24.531      8    24.531       0     24.531        1        1 |
>   9. | 11.195      9    11.195       0     11.195        1        1 |
>  10. | 75.953     10    75.953       0     75.953        1        1 |
>      +--------------------------------------------------------------+
>
>
> If that's the case, then you need to ensure that your sales and maxsale
> variables are in the same storage precision (float, double); OR you need to
> explicitly round the one that is double-precision to float precision when
> you make the comparison, using the float() function.
>
> See -help precision- for more on what's going on here.
>
>
>
> On 1/24/2014 11:55 AM, R Zhang wrote:
>>
>> Thanks to you both, Sergiy and Nick.
>>
>> Nick,
>>
>> 1. Are you saying that I should follow Sergiy's advice to change the
>> format? If so, given the large number of observations I have, how do
>> I automate the process?
>>
>> 2. If I do not change the format: I listed some observations below to
>> show you that sales and maxsale look the same. However, when I use
>> "l if sales == maxsale", it does not list all of the observations that
>> appear equal.
>>
>>
>> *****************
>>      +--------------------+
>>      |   sales   maxsale1 |
>>      |--------------------|
>>   1. |  25.395     25.395 |
>>   2. |  32.007     32.007 |
>>   3. |  53.798     53.798 |
>>   4. |  12.748     12.748 |
>>   5. |  13.793     13.793 |
>> ..... omitted to save space
>>
>>  31. | 166.181    166.181 |
>>  32. |  21.927    166.181 |
>>  33. |  26.328    189.897 |
>>  34. |  31.787    189.897 |
>>  35. | 189.897    189.897 |
>>      |--------------------|
>>  36. | 264.582    264.582 |
>>  37. |   33.61    264.582 |
>>  38. | 312.227    312.227 |
>>  39. |  35.413    312.227 |
>>  40. |  406.36     406.36 |
>>      |--------------------|
>>  41. | 444.875    444.875 |
>>
>>
>> egen maxsale=max(sales), by (gvkey year)
>>
>> l if sales == maxsale,
>>
>> the first observation that is listed is 444.875 444.875 ,
>>
>> why is that?
>>
>> thanks!
>>
>> On Fri, Jan 24, 2014 at 11:34 AM, Nick Cox <[email protected]> wrote:
>>>
>>> This is very good advice in general, but in this case the maxima are
>>> selected from the original values, so that equality is to be expected
>>> for some observations.
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 24 January 2014 16:31, Sergiy Radyakin <[email protected]> wrote:
>>>>
>>>> Zhang, avoid comparing floating point numbers for equality. Instead,
>>>> there is a c-class value, c(epsfloat), which you can refer to when
>>>> you need to deal with precision:
>>>>
>>>> clear
>>>> input float sales
>>>> 25.395
>>>> 32.007
>>>> end
>>>>
>>>> list
>>>>
>>>> display c(epsfloat)
>>>>
>>>> list if sales==25.395
>>>> list if abs(sales-25.395)<=10*c(epsfloat)
>>>>
>>>> list if sales==32.007
>>>> list if abs(sales-32.007)<=10*c(epsfloat)
>>>>
>>>>
>>>> Best, Sergiy Radyakin
>>>>
>>>> On Fri, Jan 24, 2014 at 11:23 AM, Maarten Buis <[email protected]>
>>>> wrote:
>>>>>
>>>>> I would do this differently:
>>>>>
>>>>> *------------------ begin example ------------------
>>>>> // get some example data
>>>>> sysuse auto
>>>>>
>>>>> // create a variable denoting missing values
>>>>> gen byte miss = missing(rep78, price)
>>>>>
>>>>> // create our indicator variable
>>>>> bys rep78 miss (price) : gen max = _n == _N if !miss
>>>>>
>>>>> // admire the result
>>>>> list rep78 miss price max in 1/12, sepby(rep78)
>>>>> *------------------- end example -------------------
>>>>> * (For more on examples I sent to the Statalist see:
>>>>> * http://www.maartenbuis.nl/example_faq )
>>>>>
>>>>> Hope this helps,
>>>>> Maarten
>>>>>
>>>>>
>>>>> On Fri, Jan 24, 2014 at 4:53 PM, R Zhang <[email protected]> wrote:
>>>>>>
>>>>>> Dear Statalist,
>>>>>>
>>>>>> my data structure is as follows
>>>>>>
>>>>>> firmID  segmentID  sales   year
>>>>>> 1001    1          25.395  1990
>>>>>> 1001    1          32.007  1991
>>>>>>
>>>>>> ............
>>>>>>
>>>>>> A firm can operate in multiple segments, as identified by segmentID.
>>>>>> I wanted to identify the largest segment by sales, so I used
>>>>>>
>>>>>> bysort firmID year : egen maxsale=max(sales)
>>>>>>
>>>>>> then I did
>>>>>> gen PriSIC=0
>>>>>> replace PriSIC=1 if sales==maxsale
>>>>>>
>>>>>> I got
>>>>>> firmID  segmentID  sales   year  maxsale  prisic
>>>>>> 1001    1          25.395  1990  25.395   0
>>>>>> 1001    1          32.007  1991  32.007   0
>>>>>>
>>>>>> I could not figure out why prisic is 0, so I computed the difference
>>>>>> (sales-maxsale). It shows a very small negative number, and the data
>>>>>> dictionary shows sales as float with format %12.0g and maxsale as
>>>>>> float with format %9.0g.
>>>>>>
>>>>>> what should I do to correct this?
>>>>>>
>>>>>> thanks!!!
>>>>>>
>>>>>> Rochelle
>>>>>> *
>>>>>> * For searches and help try:
>>>>>> * http://www.stata.com/help.cgi?search
>>>>>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>> * http://www.ats.ucla.edu/stat/stata/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ---------------------------------
>>>>> Maarten L. Buis
>>>>> WZB
>>>>> Reichpietschufer 50
>>>>> 10785 Berlin
>>>>> Germany
>>>>>
>>>>> http://www.maartenbuis.nl
>>>>> ---------------------------------