

From: wgould@stata.com (William Gould, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Algebra problem
Date: Wed, 21 Apr 2010 11:17:28 -0500

Tunga Kantarci <tungakantarci@hotmail.com> asks,

> Why does Stata 11 give me different counts for the following ?
>
> . count if DU1_65_R3C3_5_==1.1*DU_default_income
> 51
>
> . count if DU1_65_R3C3_5_/DU_default_income==1.1
> 308

Stas Kolenikov <skolenik@gmail.com> replied,

> Wrap everything in float() and see if the numbers change; [...]

Good, if brief, answer.  To be explicit, Stas wants Tunga to type

    . count if float(DU1_65_R3C3_5_) == float(float(1.1)*float(DU_default_income))

and

    . count if float(float(DU1_65_R3C3_5_)/float(DU_default_income)) == float(1.1)

although if the two variables DU1_65_R3C3_5_ and DU_default_income are already stored as -float-, typing float(DU1_65_R3C3_5_) and float(DU_default_income) is not necessary because they already are float, so the above reduces to

    . count if DU1_65_R3C3_5_ == float(float(1.1)*DU_default_income)

    . count if float(DU1_65_R3C3_5_/DU_default_income) == float(1.1)

There are lots of issues here, and Stas went after one or two of them.  The issues are:

    1.  Tunga thinks in base 10; Stata thinks (and calculates) in binary.  Decimal number 1.1 has no exact representation in binary; it is 1.000110011001100110011001...

    2.  Even in binary, Stata does not do infinite-precision arithmetic.  It rounds binary 1.000110011001100110011001... to 1.000110011001100110011001100110011001100110011001100.  Finite-precision rounding also applies to the values stored in the variables themselves.

    3.  Even if Stata did all calculations in decimal, and even if Stata used infinite precision, Tunga never wanted to count DU1_65_R3C3_5_==1.1*DU_default_income or to count whether their ratio was 1.1, because Tunga already knew the answer.  If the two variables are reals, the chance of their ratio being any single exact real is 0.

"Wait!," Tunga says concerning (3).  "These are incomes and are recorded in dollars and cents.  Cute point you're making about reals, but my numbers have only two digits after the decimal point.
Moreover, the distribution of the two numbers is not rectangular; it's humped, and that increases the chances even more."

I reply:  Okay; I just did one simulation.  I just drew 100,000 incomes from N(25000, 2000).  Then I drew another 100,000 incomes from the same distribution.  The incomes are uncorrelated.  I rounded incomes to two digits to the right of the decimal point.  I worked in infinite precision.  I worked in base 10.  I counted the number of cases in which the ratio is exactly 1.1.  The count was 0.  I'll do it again if you would like.

Before Tunga says "But my incomes are correlated!" which, I admit, increases the chances that the exact ratio is 1.1, I will warn that the counts will still be roughly 0.  It is true that, if we pick the right parameters for the distribution (make the ratio of means 1.1, for instance), we can drive up the chances of observing one or more ratios of 1.1 in 100,000 trials, but working in full, infinite precision, you will still be surprised how rare a ratio of exactly 1.1 is.

The fact is that Tunga never wanted exactly 1.1.  Tunga wanted around 1.1.  Stas's solution to Tunga's problem was, "Perhaps, by around, you mean float precision."  My response is that float precision is a narrow interval indeed and that Tunga should think about what he means by around 1.1.

Let's assume that Tunga wants 1.1 +/- .0001.  Then Tunga should type

    . gen ratio = DU1_65_R3C3_5_/DU_default_income

    . count if ratio>=1.1-.0001 & ratio<=1.1+.0001

I chose to type that in two lines just to make the typing easier, and to make what I'm doing more obvious to the reader.  Tunga can type it however he wants.  I expressed the value as a ratio, but Tunga can use whatever mathematically equivalent way of expressing it he desires.

But don't you still have to be careful, you might ask.  Aren't there still decimal/binary issues?

Yes, there are binary issues, and yes, there are rounding issues, and yes, there are even float vs. double issues.
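
The simulation and the interval count described above can be sketched outside Stata. The following is a minimal Python illustration, not the code Bill ran. It assumes that the 2000 in N(25000, 2000) is a standard deviation, and the function name, seed, and use of integer cents are all choices made for this sketch. Working in integer cents keeps every comparison exact, which stands in for "infinite precision, base 10".

```python
import random

def simulate(n=100_000, mean=25_000.0, sd=2_000.0, seed=12345):
    """Count exact and near matches of income ratios, using integer cents.

    Assumption for this sketch: sd is a standard deviation.
    """
    rng = random.Random(seed)
    exact = near = 0
    for _ in range(n):
        # Round each draw to the nearest cent and store it as an integer.
        a = round(rng.gauss(mean, sd) * 100)
        b = round(rng.gauss(mean, sd) * 100)
        if b <= 0:
            continue  # guard only; practically impossible with these parameters
        # a/b == 1.1 exactly  <=>  10*a == 11*b
        if 10 * a == 11 * b:
            exact += 1
        # 1.1-.0001 <= a/b <= 1.1+.0001  <=>  10999*b <= 10000*a <= 11001*b
        if 10999 * b <= 10000 * a <= 11001 * b:
            near += 1
    return exact, near

exact, near = simulate()
print("ratio exactly 1.1:        ", exact)
print("ratio within 1.1 +/- .0001:", near)
```

Because every exact match also falls inside the interval, the second count can never be smaller than the first. How much larger it is depends on the tolerance chosen, which is the substantive decision Tunga has to make.
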
Put them all together, and the problem becomes very complicated.  But put them all together and we are still talking round-off error.  We have enough precision so that the round-off error will not matter.  More than enough precision.

We have had accuracy discussions on Statalist before.  This one, however, is different because this time the finite-precision issues served merely to uncover what was in fact a substantive issue.  Once Tunga defines appropriately what he means by the income ratios being roughly 1.1, he will find that the finite-precision issues will shrink to unimportance.  The precision that we do have is more than adequate once the problem is properly defined.

Tunga's problem, count ratio==1.1, is very different from the usual precision and binary/base-10 issues we have discussed on the list, such as counting income == 24239.12.  For that problem, the right solution, and the required solution when the variable is stored as float, is income == float(24239.12).

In this case, I am asking Tunga not to type ratio == <some_number>.  I made the obvious point that ratio == some_number is a zero-probability event in the reals, and very near zero in most real cases, assuming we make the calculations in the infinite-precision mathematical way.  Rather than type ratio == some_number, I want Tunga to think in terms of some_number_1 <= ratio <= some_number_2, where the numbers are chosen in some reasonable, population-meaningful way.  Once he does that, the precision issues shrink back into the background where Tunga can ignore them.

-- Bill
wgould@stata.com
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
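
As a footnote to the message above, the float vs. double points can be checked numerically. This is a hedged Python sketch, not Stata code. The helper `to_float` is a hypothetical name introduced here; it rounds a double to IEEE single precision, which is assumed to mimic what Stata's float() does to a value.

```python
import struct
from decimal import Decimal

def to_float(x):
    """Round a Python float (a double) to single precision.

    Hypothetical helper for this sketch, assumed to mimic Stata's float().
    """
    return struct.unpack("f", struct.pack("f", x))[0]

# Exact decimal expansions of the double and the single nearest to 1.1.
# Neither one is exactly 1.1.
print(Decimal(1.1))
print(Decimal(to_float(1.1)))

# The round-off separating the two is tiny next to a tolerance of .0001.
gap = abs(to_float(1.1) - 1.1)
print(gap, gap < 0.0001)

# A value stored as float does not equal the double literal ...
stored = to_float(24239.12)
print(stored == 24239.12)            # False
# ... but it does equal the literal rounded the same way.
print(stored == to_float(24239.12))  # True
```

The last two lines correspond to income == 24239.12 versus income == float(24239.12) for a variable stored as float. The gap line shows why an interval such as 1.1 +/- .0001 is insensitive to these round-off effects.
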
