# st: Precision issues with double storage type (??)

From: Hiroshi Maeda
To: Statalist
Subject: st: Precision issues with double storage type (??)
Date: Mon, 06 Aug 2007 16:36:34 -0600

Dear Stata users,

I am using Stata 9.2 (born July 20, 2007) for Windows. My Windows XP Service Pack 2, too, is fully up to date.

I am puzzled by discrepancies between what I expect Stata to do and what it actually does, and I suspect that storage/precision issues are involved. Before you castigate me for beating this dead horse yet again, please note that I have read the relevant manual entry ([U] 13.10 Precision and problems therein) and Statalist postings, and that I encounter the problem even when numeric variables with fractional parts are stored as -double- (it is entirely possible that my feeble mind simply cannot comprehend the complexity of precision issues, though).

Let me describe my problem with a concrete example. I am appending the log file below.

I first create two observations, each of which has three variables of a financial nature: DEPOSIT_A (amount of money deposited to procure a service), PAYMENT_A (amount of money actually paid), and REFUND_A (amount of money refunded). The mathematical relationship among them is: DEPOSIT_A=PAYMENT_A+REFUND_A. Since I am in the U.S., the figures are in U.S. dollars.

The data look like this:

ID   DEPOSIT_A   PAYMENT_A   REFUND_A
 1       61.42       21.30      40.12
 2       69.00       68.49        .51

DEPOSIT_A, PAYMENT_A, and REFUND_A are stored in -float-.

When I type: count if DEPOSIT_A==61.42, Stata returns 0 as expected.
When I type: count if DEPOSIT_A==float(61.42), Stata returns 1 as expected.
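
A minimal sketch of why these two counts differ, assuming the usual behaviour that Stata evaluates expressions in double precision: the float variable holds 61.42 rounded to float precision, while the bare literal 61.42 is held as a double. The %21.0g format is only an arbitrary wide display format chosen for inspection.

```stata
* Illustrative sketch; written for the default line delimiter (no -#delimit ;-).
* The literal 61.42 is evaluated as a double; float() rounds it to float precision.
display %21.0g 61.42
display %21.0g float(61.42)

* The two values are not equal, so this comparison displays 0.
* That is why -count if DEPOSIT_A==61.42- finds no observations.
display 61.42==float(61.42)
```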

***Here is my first problem:

But when I type:

gen CHECK_2=1 if float(DEPOSIT_A)==float(PAYMENT_A)+float(REFUND_A),

Stata fails to recognize this relationship in observation 2 (please see the log file below). I don't understand why Stata does not see the relationship even though I use the -float- function in the expression.
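
One possible reading, offered as a sketch rather than a definitive diagnosis: float() applied to a variable that is already stored as float changes nothing, and the sum of the two operands is still carried out in double precision, so the sum need not be a number that a float can hold. Rounding the sum itself with float() is one way to compare at float precision. The variable name CHECK_2B is made up for this illustration, and the data are re-entered with -input- only so that the block runs on its own.

```stata
* Illustrative sketch; written for the default line delimiter (no -#delimit ;-).
clear
input float(DEPOSIT_A PAYMENT_A REFUND_A)
61.42 21.30 40.12
69.00 68.49 .51
end

* CHECK_2B is a hypothetical name: round the SUM to float, not the operands.
gen CHECK_2B = 1 if DEPOSIT_A == float(PAYMENT_A + REFUND_A)
list

* For observation 2: the double-precision sum, then the same sum rounded to float.
display %21.0g float(68.49)+float(.51)
display %21.0g float(float(68.49)+float(.51))
```

With these two observations, CHECK_2B would be expected to equal 1 in both rows, because the double-precision sum for observation 2 lies well within half a float step of 69.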

Then I proceed to create another set of financial variables conveying the same information in the -double- storage type.

Here is what I have done:

gen DEPOSIT_S=string(DEPOSIT_A, "%9.2f");
gen double DEPOSIT_B=real(DEPOSIT_S);

I have applied the same procedure to PAYMENT_A and REFUND_A to produce PAYMENT_B and REFUND_B.

I then used the Data Editor to confirm that the values of the new variables are exactly what I want. The Data Editor shows the value of DEPOSIT_A for observation 1 as 61.419998 and the value of DEPOSIT_B as 61.42.
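
An alternative to the string round trip, sketched under the assumption that every amount is a whole number of cents: store each amount as an integer count of cents, for which addition and comparison with == are exact. The *_CENTS variable names and CHECK_CENTS are hypothetical, and the data are re-entered with -input- only so that the block runs on its own.

```stata
* Illustrative sketch; written for the default line delimiter (no -#delimit ;-).
clear
input float(DEPOSIT_A PAYMENT_A REFUND_A)
61.42 21.30 40.12
69.00 68.49 .51
end

* Hypothetical *_CENTS variables: round() with one argument rounds to the
* nearest integer, so 61.419998*100 becomes 6142.
gen long DEPOSIT_CENTS = round(DEPOSIT_A*100)
gen long PAYMENT_CENTS = round(PAYMENT_A*100)
gen long REFUND_CENTS  = round(REFUND_A*100)

* Integer arithmetic carries no rounding error, so == is safe here.
gen byte CHECK_CENTS = DEPOSIT_CENTS == PAYMENT_CENTS + REFUND_CENTS
list DEPOSIT_CENTS PAYMENT_CENTS REFUND_CENTS CHECK_CENTS
```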

When I type: gen CHECK_3=1 if DEPOSIT_B==PAYMENT_B+REFUND_B,

Stata recognizes this relationship in both observations (Please see the log file below).

Then I computationally create REFUND_C by typing:

gen double REFUND_C=DEPOSIT_B-PAYMENT_B;

****Here is my second problem:

Then I type: gen FLAG_REFUND=1 if REFUND_C!=REFUND_B,

expecting Stata to produce two missing values. But Stata apparently thinks that for both observations the values of REFUND_B and REFUND_C are different (Please see the log file below).

I don't understand why this is happening because, after all, all the variables involved in this operation are stored in -double- and computations are conducted with -double- precision...
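
A sketch of how one might look into this, assuming the cause is that 61.42, 21.30, 40.12, 68.49, and .51 have no exact binary representation even in double, so that DEPOSIT_B-PAYMENT_B can differ from REFUND_B in the last few bits. The names DIFF and FLAG_REFUND_CENTS are hypothetical, and the double values are entered with -input- rather than through the string round trip, on the assumption that both yield the nearest double.

```stata
* Illustrative sketch; written for the default line delimiter (no -#delimit ;-).
clear
input double(DEPOSIT_B PAYMENT_B REFUND_B)
61.42 21.30 40.12
69.00 68.49 .51
end
gen double REFUND_C = DEPOSIT_B - PAYMENT_B

* A wide format exposes the trailing digits; DIFF shows the size of any gap.
gen double DIFF = REFUND_C - REFUND_B
format REFUND_B REFUND_C %21.0g
list REFUND_B REFUND_C DIFF

* Hypothetical alternative flag: compare whole cents instead of raw doubles.
gen FLAG_REFUND_CENTS = 1 if round(REFUND_C*100) != round(REFUND_B*100)
list REFUND_B REFUND_C FLAG_REFUND_CENTS
```

Comparing rounded cents (or testing that the absolute difference is below some small tolerance) treats values that agree to the cent as equal, whereas == on doubles requires every bit to match.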

If anyone on this list could advise me on this matter, I would appreciate it. Thank you.

Hiroshi Maeda

My demonstration begins here =======================================

. clear;

. set obs 2;
obs was 0, now 2

. gen ID=_n;

. gen float DEPOSIT_A=.;
(2 missing values generated)

. gen float PAYMENT_A=.;
(2 missing values generated)

. gen float REFUND_A=.;
(2 missing values generated)

. replace DEPOSIT_A=61.42 if ID==1;

. replace PAYMENT_A=21.30 if ID==1;

. replace REFUND_A =40.12 if ID==1;

. replace DEPOSIT_A=69.00 if ID==2;

. replace PAYMENT_A=68.49 if ID==2;

. replace REFUND_A = .51 if ID==2;

. format DEPOSIT_A %9.2f;

. format PAYMENT_A %9.2f;

. format REFUND_A %9.2f;

. desc;

Contains data
  obs:             2
 vars:             4
 size:            48 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ID              double %10.0g
DEPOSIT_A       float  %9.2f
PAYMENT_A       float  %9.2f
REFUND_A        float  %9.2f
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. list, nodisplay noobs compress sepby(ID);

+----------------------------+
| ID   DEP~A   PAY~A   REF~A |
|----------------------------|
|  1   61.42   21.30   40.12 |
|----------------------------|
|  2   69.00   68.49    0.51 |
+----------------------------+

. count if DEPOSIT_A==61.42;
0

. count if DEPOSIT_A==float(61.42)
> /*=> This shows that I have read [U] 13.10 Precision and Problems therein*/;
1

. count if PAYMENT_A==21.30;
0

. count if PAYMENT_A==float(21.30);
1

. count if REFUND_A==40.12;
0

. count if REFUND_A==float(40.12);
1

. count if DEPOSIT_A==69.00;
1

. count if DEPOSIT_A==float(69.00);
1

. count if PAYMENT_A==68.49;
0

. count if PAYMENT_A==float(68.49);
1

. count if REFUND_A==.51;
0

. count if REFUND_A==float(.51);
1

. gen CHECK_1=1 if DEPOSIT_A==PAYMENT_A+REFUND_A;
(1 missing value generated)

. gen CHECK_2=1 if float(DEPOSIT_A)==float(PAYMENT_A)+float(REFUND_A);
(1 missing value generated)

. list, nodisplay noobs compress sepby(ID)
> /*Problem 1: I don't understand why CHECK_2 is missing for observation 2*/;

+--------------------------------------------+
| ID   DEP~A   PAY~A   REF~A   CHE~1   CHE~2 |
|--------------------------------------------|
|  1   61.42   21.30   40.12       1       1 |
|--------------------------------------------|
|  2   69.00   68.49    0.51       .       . |
+--------------------------------------------+

. gen DEPOSIT_S=string(DEPOSIT_A, "%9.2f");

. gen PAYMENT_S=string(PAYMENT_A, "%9.2f");

. gen REFUND_S =string(REFUND_A, "%9.2f");

. gen double DEPOSIT_B=real(DEPOSIT_S);

. gen double PAYMENT_B=real(PAYMENT_S);

. gen double REFUND_B =real(REFUND_S);

. format DEPOSIT_B %9.2f;

. format PAYMENT_B %9.2f;

. format REFUND_B %9.2f;

. drop *_S;

. desc;

Contains data
  obs:             2
 vars:             9
 size:           128 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
ID              double %10.0g
DEPOSIT_A       float  %9.2f
PAYMENT_A       float  %9.2f
REFUND_A        float  %9.2f
CHECK_1         double %10.0g
CHECK_2         double %10.0g
DEPOSIT_B       double %9.2f
PAYMENT_B       double %9.2f
REFUND_B        double %9.2f
-------------------------------------------------------------------------------
Sorted by:
     Note:  dataset has changed since last saved

. gen CHECK_3=1 if DEPOSIT_B==PAYMENT_B+REFUND_B;

. list DEPOSIT_A-REFUND_A DEPOSIT_B-REFUND_B CHECK_3, nodisplay noobs compress sepby(ID);

+-------------------------------------------------------+
| DEP~A   PAY~A   REF~A   DEP~B   PAY~B   REF~B   CHE~3 |
|-------------------------------------------------------|
| 61.42   21.30   40.12   61.42   21.30   40.12       1 |
|-------------------------------------------------------|
| 69.00   68.49    0.51   69.00   68.49    0.51       1 |
+-------------------------------------------------------+

. gen REFUND_C=DEPOSIT_B-PAYMENT_B;

. format REFUND_C %9.2f;

. gen FLAG_REFUND=1 if REFUND_C!=REFUND_B;

. list DEPOSIT_A-REFUND_A DEPOSIT_B-REFUND_B FLAG_REFUND, nodisplay noobs compress sepby(ID);

+-------------------------------------------------------+
| DEP~A   PAY~A   REF~A   DEP~B   PAY~B   REF~B   FLA~D |
|-------------------------------------------------------|
| 61.42   21.30   40.12   61.42   21.30   40.12       1 |
|-------------------------------------------------------|
| 69.00   68.49    0.51   69.00   68.49    0.51       1 |
+-------------------------------------------------------+

. /*Problem 2: I don't understand why FLAG_REFUND has flagged observations 1 & 2*/;

--
Hiroshi Maeda
University of Illinois at Chicago
hmaeda1@uic.edu
