Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: RE: st: behavior of -areg-


From   [email protected] (Richard Gates)
To   [email protected]
Subject   Re: RE: st: behavior of -areg-
Date   Tue, 28 Feb 2006 10:40:12 -0600

My response to the statalist requires a rebuttal on my part.  The promotion
from single to double precision is done in binary so my demonstration is
misleading in that I listed the promoted variables in base 10.  If you view the
variables in hexidecimal you can see that variable fvalue generated by 

. gen double fvalue = mvalue

is really padded in zeros, but the dvalue generated by

. gen double dvalue = round(mvalue,10^(digits+ceil(log10(abs(mvalue)))))

is not.

. format %21x mvalue fvalue dvalue

. list fvalue mvalue dvalue in 1/20

     +-----------------------------------------------------------------------+
     |                fvalue                  mvalue                  dvalue |
     |-----------------------------------------------------------------------|
  1. | +1.80d0000000000X+00b   +1.80d0000000000X+00b   +1.80d0000000001X+00b |
  2. | +1.235b340000000X+00c   +1.235b340000000X+00c   +1.235b333333334X+00c |
  3. | +1.50b19a0000000X+00c   +1.50b19a0000000X+00c   +1.50b199999999bX+00c |
  4. | +1.5d06660000000X+00b   +1.5d06660000000X+00b   +1.5d06666666668X+00b |
  5. | +1.0d93340000000X+00c   +1.0d93340000000X+00c   +1.0d93333333334X+00c |
     |-----------------------------------------------------------------------|
  6. | +1.223e660000000X+00c   +1.223e660000000X+00c   +1.223e666666667X+00c |
  7. | +1.1c73340000000X+00c   +1.1c73340000000X+00c   +1.1c73333333334X+00c |
  8. | +1.9583340000000X+00b   +1.9583340000000X+00b   +1.9583333333335X+00b |
  9. | +1.fab6660000000X+00b   +1.fab6660000000X+00b   +1.fab6666666668X+00b |
 10. | +1.11b4cc0000000X+00c   +1.11b4cc0000000X+00c   +1.11b4cccccccceX+00c |
     |-----------------------------------------------------------------------|
 11. | +1.2e8e660000000X+00c   +1.2e8e660000000X+00c   +1.2e8e666666667X+00c |
 12. | +1.324e660000000X+00c   +1.324e660000000X+00c   +1.324e666666667X+00c |
 13. | +1.b8d0000000000X+00b   +1.b8d0000000000X+00b   +1.b8d0000000002X+00b |
 14. | +1.96d6660000000X+00b   +1.96d6660000000X+00b   +1.96d6666666668X+00b |
 15. | +1.ce86660000000X+00b   +1.ce86660000000X+00b   +1.ce86666666668X+00b |
     |-----------------------------------------------------------------------|
 16. | +1.d573340000000X+00b   +1.d573340000000X+00b   +1.d573333333335X+00b |
 17. | +1.2e10000000000X+00c   +1.2e10000000000X+00c   +1.2e10000000001X+00c |
 18. | +1.33ce660000000X+00c   +1.33ce660000000X+00c   +1.33ce666666667X+00c |
 19. | +1.861b340000000X+00c   +1.861b340000000X+00c   +1.861b333333335X+00c |
 20. | +1.5d999a0000000X+00c   +1.5d999a0000000X+00c   +1.5d9999999999bX+00c |
     +-----------------------------------------------------------------------+

> I would like to add to this demonstration where I show the
> dangers of having single precision variables (float) and not 
> promoting them to double precision with caution.
> 
> Note that single precision will have only about 7 digits
> of precision while double has about 16 corresponding to 
> 24 and 53 bits for the mantissia:
> 
> . di c(epsfloat)
> 1.192e-07
> 
> . di c(epsdouble)
> 2.220e-16
> 
> . di 2^(1-24)
> 1.192e-07
> 
> . di 2^(1-53)
> 2.220e-16
> 
> machine epsilon is 2^(1-p) where p is the number of bits
> reserved for the mantissia of floating point numbers.
> 
> Now take a look at the storage type for each of the variables
> in the grunfeld dataset.
> 
> . webuse grunfeld,clear
> 
> . describe 
> 
> Contains data from http://www.stata-press.com/data/r9/grunfeld.dta
>   obs:           200                          
>  vars:             6                          3 Mar 2005 20:27
>  size:         5,600 (99.5% of memory free)
> -------------------------------------------------------------------------------
>               storage  display     value
> variable name   type   format      label      variable label
> -------------------------------------------------------------------------------
> company         float  %9.0g                  
> year            float  %9.0g                  
> invest          float  %9.0g                  
> mvalue          float  %9.0g                  
> kstock          float  %9.0g                  
> time            float  %9.0g                  
> 
> . list in 1/20
> 
>      +--------------------------------------------------+
>      | company   year   invest   mvalue   kstock   time |
>      |--------------------------------------------------|
>   1. |       1   1935    317.6   3078.5      2.8      1 |
>   2. |       1   1936    391.8   4661.7     52.6      2 |
>   3. |       1   1937    410.6   5387.1    156.9      3 |
>   4. |       1   1938    257.7   2792.2    209.2      4 |
>   5. |       1   1939    330.8   4313.2    203.4      5 |
>      |--------------------------------------------------|
>   6. |       1   1940    461.2   4643.9    207.2      6 |
>   7. |       1   1941      512   4551.2    255.2      7 |
>   8. |       1   1942      448   3244.1    303.7      8 |
>   9. |       1   1943    499.6   4053.7    264.1      9 |
>  10. |       1   1944    547.5   4379.3    201.6     10 |
>      |--------------------------------------------------|
>  11. |       1   1945    561.2   4840.9      265     11 |
>  12. |       1   1946    688.1   4900.9    402.2     12 |
>  13. |       1   1947    568.9   3526.5    761.5     13 |
>  14. |       1   1948    529.2   3254.7    922.4     14 |
>  15. |       1   1949    555.1   3700.2   1020.1     15 |
>      |--------------------------------------------------|
>  16. |       1   1950    642.9   3755.6     1099     16 |
>  17. |       1   1951    755.9     4833   1207.7     17 |
>  18. |       1   1952    891.2   4924.9   1430.5     18 |
>  19. |       1   1953   1304.4   6241.7   1777.3     19 |
>  20. |       1   1954   1486.7   5593.6   2226.3     20 |
>      +--------------------------------------------------+
> 
> We can extend the display format of mvalue and kstock and you will see
> what you are going to get beyond the 7-th digit when you promote these
> variables to double precision.
> 
> . format %20.15g kstock mvalue
> 
> . gen double fvalue = mvalue
> 
> . gen double fstock = kstock
> 
> . format %20.15g fvalue fstock
> 
> . list fvalue mvalue fstock kstock in 1/20
> 
>    +---------------------------------------------------------------------------+
>    |           fvalue             mvalue             fstock             kstock |   |---------------------------------------------------------------------------|
>  1.|           3078.5             3078.5   2.79999995231628   2.79999995231628 |
>  2.|  4661.7001953125    4661.7001953125   52.5999984741211   52.5999984741211 |
>  3.| 5387.10009765625   5387.10009765625   156.899993896484   156.899993896484 |
>  4.| 2792.19995117188   2792.19995117188   209.199996948242   209.199996948242 |
>  5.|  4313.2001953125    4313.2001953125   203.399993896484   203.399993896484 |
>    |---------------------------------------------------------------------------|
>  6.| 4643.89990234375   4643.89990234375   207.199996948242   207.199996948242 |
>  7.|  4551.2001953125    4551.2001953125   255.199996948242   255.199996948242 |
>  8.| 3244.10009765625   3244.10009765625   303.700012207031   303.700012207031 |
>  9.| 4053.69995117188   4053.69995117188   264.100006103516   264.100006103516 |
> 10.|  4379.2998046875    4379.2998046875   201.600006103516   201.600006103516 |
>    |---------------------------------------------------------------------------|
> 11 | 4840.89990234375   4840.89990234375                265                265 |
> 12.| 4900.89990234375   4900.89990234375   402.200012207031   402.200012207031 |
> 13.|           3526.5             3526.5              761.5              761.5 |
> 14.| 3254.69995117188   3254.69995117188   922.400024414063   922.400024414063 |
> 15.| 3700.19995117188   3700.19995117188   1020.09997558594   1020.09997558594 |
>    |---------------------------------------------------------------------------|
> 16.| 3755.60009765625   3755.60009765625               1099               1099 |
> 17.|             4833               4833   1207.69995117188   1207.69995117188 |
> 18.| 4924.89990234375   4924.89990234375             1430.5             1430.5 |
> 19.|  6241.7001953125    6241.7001953125   1777.30004882813   1777.30004882813 |
> 20.| 5593.60009765625   5593.60009765625   2226.30004882813   2226.30004882813 |
>    +---------------------------------------------------------------------------+
> 
> 
> . egen double t1 = total(mva), by(com)
> 
> . egen double t2 = total(kstock), by(com)
> 
> . format %20.15g t1 t2
> 
> . list t1 t2 in 1/20
> 
>      +------------------------------------+
>      |              t1                 t2 |
>      |------------------------------------|
>   1. | 86676.900390625   12968.7000625134 |
>   2. | 86676.900390625   12968.7000625134 |
>   3. | 86676.900390625   12968.7000625134 |
>   4. | 86676.900390625   12968.7000625134 |
>   5. | 86676.900390625   12968.7000625134 |
>      |------------------------------------|
>   6. | 86676.900390625   12968.7000625134 |
>   7. | 86676.900390625   12968.7000625134 |
>   8. | 86676.900390625   12968.7000625134 |
>   9. | 86676.900390625   12968.7000625134 |
>  10. | 86676.900390625   12968.7000625134 |
>      |------------------------------------|
>  11. | 86676.900390625   12968.7000625134 |
>  12. | 86676.900390625   12968.7000625134 |
>  13. | 86676.900390625   12968.7000625134 |
>  14. | 86676.900390625   12968.7000625134 |
>  15. | 86676.900390625   12968.7000625134 |
>      |------------------------------------|
>  16. | 86676.900390625   12968.7000625134 |
>  17. | 86676.900390625   12968.7000625134 |
>  18. | 86676.900390625   12968.7000625134 |
>  19. | 86676.900390625   12968.7000625134 |
>  20. | 86676.900390625   12968.7000625134 |
>      +------------------------------------+
> 
> You can see here that even though we have made the variables
> t1 and t2 double precision, we really only have done single 
> precision numerics (actually a little better than single
> precision) and the variables have only 9 digits that
> are any good.  We gained some precsion, but we did not
> do as well as we would have if we started with full double
> precision.  This garbage beyond the 9-th digit can be 
> enough for -_rmcoll- to not detect variable collinearity.
> 
> Now if we promote mvalue and kstock to double carefully
> we can get full precision.
> 
> . scalar digits = round(log10(abs(c(epsfloat))),1)+1
> 
> . gen double dvalue = round(mvalue,10^(digits+ceil(log10(abs(mvalue))))) 
> 
> . gen double dstock = round(kstock,10^(digits+ceil(log10(abs(kstock))))) 
> 
> . format %20.15g dvalue dstock 
> 
> . list dvalue mvalue dstock kstock in 1/20
> 
>     +-------------------------------------------------------+
>      | dvalue             mvalue   dstock             kstock |
>      |-------------------------------------------------------|
>   1. | 3078.5             3078.5      2.8   2.79999995231628 |
>   2. | 4661.7    4661.7001953125     52.6   52.5999984741211 |
>   3. | 5387.1   5387.10009765625    156.9   156.899993896484 |
>   4. | 2792.2   2792.19995117188    209.2   209.199996948242 |
>   5. | 4313.2    4313.2001953125    203.4   203.399993896484 |
>      |-------------------------------------------------------|
>   6. | 4643.9   4643.89990234375    207.2   207.199996948242 |
>   7. | 4551.2    4551.2001953125    255.2   255.199996948242 |
>   8. | 3244.1   3244.10009765625    303.7   303.700012207031 |
>   9. | 4053.7   4053.69995117188    264.1   264.100006103516 |
>  10. | 4379.3    4379.2998046875    201.6   201.600006103516 |
>      |-------------------------------------------------------|
>  11. | 4840.9   4840.89990234375      265                265 |
>  12. | 4900.9   4900.89990234375    402.2   402.200012207031 |
>  13. | 3526.5             3526.5    761.5              761.5 |
>  14. | 3254.7   3254.69995117188    922.4   922.400024414063 |
>  15. | 3700.2   3700.19995117188   1020.1   1020.09997558594 |
>      |-------------------------------------------------------|
>  16. | 3755.6   3755.60009765625     1099               1099 |
>  17. |   4833               4833   1207.7   1207.69995117188 |
>  18. | 4924.9   4924.89990234375   1430.5             1430.5 |
>  19. | 6241.7    6241.7001953125   1777.3   1777.30004882813 |
>  20. | 5593.6   5593.60009765625   2226.3   2226.30004882813 |
>      +-------------------------------------------------------+
> 
> . egen double d1 = total(dvalue), by(com)
> 
> . egen double d2 = total(dstock), by(com)
> 
> . format %20.15g d1 d2
> 
> . list d1 t1 d2 t2 in 1/20
> 
>      +--------------------------------------------------------+
>      |      d1                t1        d2                 t2 |
>      |--------------------------------------------------------|
>   1. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>   2. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>   3. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>   4. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>   5. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>      |--------------------------------------------------------|
>   6. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>   7. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>   8. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>   9. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>  10. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>      |--------------------------------------------------------|
>  11. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>  12. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>  13. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>  14. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>  15. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>      |--------------------------------------------------------|
>  16. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>  17. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>  18. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>  19. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>  20. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
>      +--------------------------------------------------------+
> 
> Now we do Scott's experiment in double precision:
> 
> . gen double dinvest = round(invest,10^(digits+ceil(log10(abs(invest)))))
> 
> . recast long company
> 
> . areg dinvest dstock d1, ab(com)
> 
> Linear regression, absorbing indicators                Number of obs =     200
>                                                        F(  1,   189) =  366.45
>                                                        Prob > F      =  0.0000
>                                                        R-squared     =  0.9184
>                                                        Adj R-squared =  0.9141
>                                                        Root MSE      =  63.566
> 
> ------------------------------------------------------------------------------
>      dinvest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       dstock |   .3707496   .0193676    19.14   0.000     .3325452    .4089541
>           d1 |  (dropped)
>        _cons |   43.62499   6.984315     6.25   0.000     29.84777    57.40222
> -------------+----------------------------------------------------------------
>      company |         F(9, 189) =     36.975   0.000          (10 categories)
> 
> . areg dinvest dvalue d2, ab(com)
> 
> Linear regression, absorbing indicators                Number of obs =     200
>                                                        F(  1,   189) =  111.35
>                                                        Prob > F      =  0.0000
>                                                        R-squared     =  0.8491
>                                                        Adj R-squared =  0.8411
>                                                        Root MSE      =  86.444
> 
> ------------------------------------------------------------------------------
>      dinvest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
> -------------+----------------------------------------------------------------
>       dvalue |   .1898776   .0179944    10.55   0.000     .1543819    .2253733
>           d2 |  (dropped)
>        _cons |  -59.42872   20.40144    -2.91   0.004     -99.6725   -19.18494
> -------------+----------------------------------------------------------------
>      company |         F(9, 189) =     15.973   0.000          (10 categories)
> 
> 
> 
> So my advice is to be very wary of single precision floating point variable.
> 
> 
> -Rich
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index