Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: The accuracy of the float data type

From	Nick Cox <[email protected]>
To	"[email protected]" <[email protected]>
Subject	Re: st: The accuracy of the float data type
Date	Fri, 24 Jan 2014 18:09:24 +0000

I wondered that too, but Rochelle said that both variables were
-float-. But if that is not so, then it's likely to be the
explanation.

Note by the way that Stata does not use terminology such as "storage
format". Display format and variable type are, as Nick Winter implies,
quite different notions.
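
As a minimal sketch of that distinction (the variable names x and y are made up for illustration, and this is not code from the thread): changing a display format alters only how values are shown, whereas the variable type fixes the precision at which they are held. It is the type that decides whether == succeeds.

```stata
clear
set obs 1
gen double x = 0.1              // held at double precision
gen float  y = 0.1              // held at float precision
format x y %9.0g                // identical display formats
list                            // both are shown as .1

describe x y                    // storage type and display format are separate columns
local tx : type x               // extended macro function: storage type of x
local ty : type y
display "x is `tx', y is `ty'"

display x[1] == y[1]            // 0: a double is compared with a float
display float(x[1]) == y[1]     // 1: x is rounded to float precision first

format y %12.0g                 // changes the display only
display float(x[1]) == y[1]     // still 1

recast double y                 // changes the type, but cannot restore lost digits
display x[1] == y[1]            // still 0
```

In Rochelle's setting the analogous check would be -describe sales maxsale-, which reports the variable type and the display format side by side.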

Nick
[email protected]


On 24 January 2014 17:32, Nick Winter <[email protected]> wrote:
> Perhaps the problem comes because the *storage* format of sales and maxsale
> are different.  (This is not the same as the *display* format).
>
> Consider:
>
> clear
> set seed 1234567
> set obs 10
> gen double sales = round(uniform()*100,.001)
> gen year = _n
> egen float maxsale = max(sales), by(year)
> gen equal = sales == maxsale
>
> egen double maxsale2 = max(sales), by(year)
> gen equal2 = sales == maxsale2
>
> gen equal3 = float(sales) == maxsale
>
> list
>
>
>      +--------------------------------------------------------------+
>      |  sales   year   maxsale   equal   maxsale2   equal2   equal3 |
>      |--------------------------------------------------------------|
>   1. |   2.65      1      2.65       0       2.65        1        1 |
>   2. | 17.274      2    17.274       0     17.274        1        1 |
>   3. |  2.923      3     2.923       0      2.923        1        1 |
>   4. | 75.377      4    75.377       0     75.377        1        1 |
>   5. | 65.559      5    65.559       0     65.559        1        1 |
>      |--------------------------------------------------------------|
>   6. | 81.163      6    81.163       0     81.163        1        1 |
>   7. | 17.459      7    17.459       0     17.459        1        1 |
>   8. | 24.531      8    24.531       0     24.531        1        1 |
>   9. | 11.195      9    11.195       0     11.195        1        1 |
>  10. | 75.953     10    75.953       0     75.953        1        1 |
>      +--------------------------------------------------------------+
>
>
> If that's the case, then you need to ensure that your sales and maxsale
> variables are stored at the same precision (float or double); OR you need to
> explicitly round the one that is double-precision to float precision when
> you make the comparison, using the float() function.
>
> See -help precision- for more on what's going on here.
>
>
>
> On 1/24/2014 11:55 AM, R Zhang wrote:
>>
>> Thanks to you both, Sergiy and Nick.
>>
>> Nick,
>>
>> 1. Are you saying that I should follow Sergiy's advice to change the
>> format? If so, given the large number of observations I have, how do
>> I automate the process?
>>
>> 2. If I do not change the format: I list some observations below to
>> show you that sales and maxsale look the same. However, when I use
>> "l if sales == maxsale", it does not list all of the observations that
>> appear equal.
>>
>>
>> *****************
>>       +--------------------+
>>       |   sales   maxsale1 |
>>       |--------------------|
>>    1. |  25.395     25.395 |
>>    2. |  32.007     32.007 |
>>    3. |  53.798     53.798 |
>>    4. |  12.748     12.748 |
>>    5. |  13.793     13.793 |
>>   ..... omitted to save space
>>
>>   31. | 166.181    166.181 |
>>   32. |  21.927    166.181 |
>>   33. |  26.328    189.897 |
>>   34. |  31.787    189.897 |
>>   35. | 189.897    189.897 |
>>       |--------------------|
>>   36. | 264.582    264.582 |
>>   37. |   33.61    264.582 |
>>   38. | 312.227    312.227 |
>>   39. |  35.413    312.227 |
>>   40. |  406.36     406.36 |
>>       |--------------------|
>>   41. | 444.875    444.875 |
>>
>>
>>   egen maxsale = max(sales), by(gvkey year)
>>
>>   l if sales == maxsale
>>
>> The first observation listed is 444.875    444.875.
>>
>> Why is that?
>>
>> thanks!
>>
>> On Fri, Jan 24, 2014 at 11:34 AM, Nick Cox <[email protected]> wrote:
>>>
>>> This is very good advice in general, but in this case the maxima are
>>> selected from the original values, so that equality is to be expected
>>> for some observations.
>>> Nick
>>> [email protected]
>>>
>>>
>>> On 24 January 2014 16:31, Sergiy Radyakin <[email protected]> wrote:
>>>>
>>>> Zhang, avoid comparing floating-point numbers for exact equality.
>>>> Instead, you can refer to the system value c(epsfloat) when
>>>> you need to deal with precision:
>>>>
>>>> clear
>>>> input float sales
>>>> 25.395
>>>> 32.007
>>>> end
>>>>
>>>> list
>>>>
>>>> display c(epsfloat)
>>>>
>>>> list if sales==25.395
>>>> list if abs(sales-25.395)<=10*c(epsfloat)
>>>>
>>>> list if sales==32.007
>>>> list if abs(sales-32.007)<=10*c(epsfloat)
>>>>
>>>>
>>>> Best, Sergiy Radyakin
>>>>
>>>> On Fri, Jan 24, 2014 at 11:23 AM, Maarten Buis <[email protected]>
>>>> wrote:
>>>>>
>>>>> I would do this differently:
>>>>>
>>>>> *------------------ begin example ------------------
>>>>> // get some example data
>>>>> sysuse auto
>>>>>
>>>>> // create a variable denoting missing values
>>>>> gen byte miss = missing(rep78, price)
>>>>>
>>>>> // create our indicator variable
>>>>> bys rep78 miss (price) : gen max = _n == _N if !miss
>>>>>
>>>>> // admire the result
>>>>> list rep78 miss price max in 1/12, sepby(rep78)
>>>>> *------------------- end example -------------------
>>>>> * (For more on examples I sent to the Statalist see:
>>>>> * http://www.maartenbuis.nl/example_faq )
>>>>>
>>>>> Hope this helps,
>>>>> Maarten
>>>>>
>>>>>
>>>>> On Fri, Jan 24, 2014 at 4:53 PM, R Zhang <[email protected]> wrote:
>>>>>>
>>>>>> Dear Statalist,
>>>>>>
>>>>>> my data structure is as follows
>>>>>>
>>>>>> firmID    segmentID    sales     year
>>>>>> 1001      1            25.395    1990
>>>>>> 1001      1            32.007    1991
>>>>>>
>>>>>> ............
>>>>>>
>>>>>> A firm can operate in multiple segments, as identified by segmentID.
>>>>>> I wanted to identify the largest segment by sales, so I used
>>>>>>
>>>>>> bysort firmID year : egen maxsale=max(sales)
>>>>>>
>>>>>> then I did
>>>>>> gen PriSIC=0
>>>>>> replace PriSIC=1 if sales==maxsale
>>>>>>
>>>>>> I got
>>>>>> firmID    segmentID    sales     year    maxsale    prisic
>>>>>> 1001      1            25.395    1990    25.395     0
>>>>>> 1001      1            32.007    1991    32.007     0
>>>>>>
>>>>>> I could not figure out why prisic is 0, so I computed the difference
>>>>>> (sales-maxsale). It is a very small negative number, and the data
>>>>>> dictionary shows sales as float with format %12.0g, and maxsale as
>>>>>> float with format %9.0g.
>>>>>>
>>>>>> What should I do to correct this?
>>>>>>
>>>>>> thanks!!!
>>>>>>
>>>>>> Rochelle
>>>>>> *
>>>>>> *   For searches and help try:
>>>>>> *   http://www.stata.com/help.cgi?search
>>>>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>>>>> *   http://www.ats.ucla.edu/stat/stata/
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ---------------------------------
>>>>> Maarten L. Buis
>>>>> WZB
>>>>> Reichpietschufer 50
>>>>> 10785 Berlin
>>>>> Germany
>>>>>
>>>>> http://www.maartenbuis.nl
>>>>> ---------------------------------
