Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: The accuracy of the float data type
From: Nick Winter <[email protected]>
To: [email protected]
Subject: Re: st: The accuracy of the float data type
Date: Fri, 24 Jan 2014 12:32:58 -0500
Perhaps the problem arises because sales and maxsale have different
*storage* types. (This is not the same as the *display* format.)
Consider:
clear
set seed 1234567
set obs 10
gen double sales = round(uniform()*100,.001)
gen year = _n
egen float maxsale = max(sales), by(year)
gen equal = sales == maxsale
egen double maxsale2 = max(sales), by(year)
gen equal2 = sales == maxsale2
gen equal3 = float(sales) == maxsale
list
     +--------------------------------------------------------------+
     |  sales   year   maxsale   equal   maxsale2   equal2   equal3 |
     |--------------------------------------------------------------|
  1. |   2.65      1      2.65       0       2.65        1        1 |
  2. | 17.274      2    17.274       0     17.274        1        1 |
  3. |  2.923      3     2.923       0      2.923        1        1 |
  4. | 75.377      4    75.377       0     75.377        1        1 |
  5. | 65.559      5    65.559       0     65.559        1        1 |
     |--------------------------------------------------------------|
  6. | 81.163      6    81.163       0     81.163        1        1 |
  7. | 17.459      7    17.459       0     17.459        1        1 |
  8. | 24.531      8    24.531       0     24.531        1        1 |
  9. | 11.195      9    11.195       0     11.195        1        1 |
 10. | 75.953     10    75.953       0     75.953        1        1 |
     +--------------------------------------------------------------+
If that's the case, then you need to ensure that your sales and maxsale
variables have the same storage type (float or double); OR you need
to explicitly round the one that is double-precision to float precision
when you make the comparison, using the float() function.
See -help precision- for more on what's going on here.
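A minimal sketch of why two values that print identically can still compare as unequal. It only uses display with a wide format and the float() function; the literals 2.65 and 444.875 are taken from the listings in this thread. The last two lines are expected to show 0 and 1: 2.65 has no exact binary representation, so its float and double approximations differ, whereas 444.875 (444 + 7/8) is exactly representable in both. That would be consistent with 444.875 being the first observation that passes an exact comparison in the quoted listing below.

```stata
* Show more digits than the default display format reveals
display %21.18f 2.65
display %21.18f float(2.65)

* Exact comparisons: double literal versus its float-rounded version
display 2.65 == float(2.65)
display 444.875 == float(444.875)
```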
On 1/24/2014 11:55 AM, R Zhang wrote:
Thanks to you both, Sergiy and Nick.
Nick,
1. Are you saying that I should follow Sergiy's advice to change the
format? If so, given the large number of observations I have, how do
I automate the process?
2. If I do not change the format: I listed some observations below to
show you that sales and maxsale look the same. However, when I use
"l if sales == maxsale", it does not list all of the observations that
appear equal.
*****************
     +--------------------+
     |   sales   maxsale1 |
     |--------------------|
  1. |  25.395     25.395 |
  2. |  32.007     32.007 |
  3. |  53.798     53.798 |
  4. |  12.748     12.748 |
  5. |  13.793     13.793 |
..... omitted to save space
 31. | 166.181    166.181 |
 32. |  21.927    166.181 |
 33. |  26.328    189.897 |
 34. |  31.787    189.897 |
 35. | 189.897    189.897 |
     |--------------------|
 36. | 264.582    264.582 |
 37. |   33.61    264.582 |
 38. | 312.227    312.227 |
 39. |  35.413    312.227 |
 40. |  406.36     406.36 |
     |--------------------|
 41. | 444.875    444.875 |
egen maxsale=max(sales), by (gvkey year)
l if sales == maxsale,
The first observation that is listed is 444.875 444.875.
Why is that?
Thanks!
On Fri, Jan 24, 2014 at 11:34 AM, Nick Cox <[email protected]> wrote:
This is very good advice in general, but in this case the maxima are
selected from the original values, so that equality is to be expected
for some observations.
Nick
[email protected]
On 24 January 2014 16:31, Sergiy Radyakin <[email protected]> wrote:
Zhang, avoid comparing floating-point numbers for equality. Instead,
there is a system value, c(epsfloat), which you can refer to when
you need to deal with precision:
clear
input float sales
25.395
32.007
end
list
display c(epsfloat)
list if sales==25.395
list if abs(sales-25.395)<=10*c(epsfloat)
list if sales==32.007
list if abs(sales-32.007)<=10*c(epsfloat)
Best, Sergiy Radyakin
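The tolerance in Sergiy's example is absolute, so how strict it is depends on the magnitude of the data. As a hedged alternative, the sketch below uses Stata's reldif() function for a relative comparison, and also shows float() applied to the literal. The threshold 1e-6 is an arbitrary value chosen for illustration, not a recommendation.

```stata
clear
input float sales
25.395
32.007
end

* Exact comparison of a float variable against a double literal
count if sales == 25.395

* Relative comparison: reldif(x,y) = |x - y| / (|y| + 1)
* (1e-6 is an assumed, illustrative threshold)
count if reldif(sales, 25.395) < 1e-6

* Rounding the literal to float makes the exact comparison meaningful
count if sales == float(25.395)
```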
On Fri, Jan 24, 2014 at 11:23 AM, Maarten Buis <[email protected]> wrote:
I would do this differently:
*------------------ begin example ------------------
// get some example data
sysuse auto
// create a variable denoting missing values
gen byte miss = missing(rep78, price)
// create our indicator variable
bys rep78 miss (price) : gen max = _n == _N if !miss
// admire the result
list rep78 miss price max in 1/12, sepby(rep78)
*------------------- end example -------------------
* (For more on examples I sent to the Statalist see:
* http://www.maartenbuis.nl/example_faq )
Hope this helps,
Maarten
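A sketch of how Maarten's approach might carry over to the firm/segment layout in the original question. The variable names follow the original post; the rows for segment 2 are made-up toy values added only so that each firm-year has more than one segment. Because the indicator comes from the sort order rather than from an equality test, storage types do not enter into it. Note that with ties only one observation per group is flagged.

```stata
clear
* toy data: segment 2 rows are invented for illustration
input long firmID byte segmentID float sales int year
1001 1 25.395 1990
1001 2 12.748 1990
1001 1 32.007 1991
1001 2 53.798 1991
end

* flag missing sales so they cannot be picked as the maximum
gen byte miss = missing(sales)

* within each firm-year, sort by sales and flag the last (largest) row
bysort firmID year miss (sales): gen byte PriSIC = (_n == _N) if !miss

list, sepby(firmID year)
```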
On Fri, Jan 24, 2014 at 4:53 PM, R Zhang <[email protected]> wrote:
Dear Statalist,
My data structure is as follows:
firmID   segmentID   sales    year
1001     1           25.395   1990
1001     1           32.007   1991
............
A firm can operate in multiple segments, as identified by segmentID.
I wanted to identify the largest segment by sales, so I used
bysort firmID year : egen maxsale=max(sales)
then I did
gen PriSIC=0
replace PriSIC=1 if sales==maxsale
I got
firmID   segmentID   sales    year   maxsale   prisic
1001     1           25.395   1990   25.395    0
1001     1           32.007   1991   32.007    0
I could not figure out why prisic is 0, so I computed the difference
(sales-maxsale). It shows a very small negative number, and the data
dictionary shows sales as float with format %12.0g and maxsale as float
with format %9.0g.
What should I do to correct this?
Thanks!!!
Rochelle
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/
--
---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany
http://www.maartenbuis.nl
---------------------------------