Statalist The Stata Listserver


[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Re: RE: st: behavior of -areg-


From   [email protected] (Richard Gates)
To   [email protected]
Subject   Re: RE: st: behavior of -areg-
Date   Tue, 28 Feb 2006 09:57:55 -0600

In response to Radu's <[email protected]> query on why -areg- was not dropping
collinear variables, Scott <[email protected]> made an interesting
demonstration on how machine precision can affect whether -_rmcoll- drops
variables:

> You replicate this with the grunfeld dataset and the mvalue variable.
> 
> sysuse grunfeld,clear
> egen double t1 = total(mva), by(com)
> egen double t2 = total(kstock), by(com)
> 
> //t1 not dropped
> areg invest kstock t1, ab(com)
> 
> //t2 dropped
> areg invest mvalu t2, ab(com)
> 
> replace t1 = t1/100
> //Now, t1 is dropped
> areg invest kstock t1, ab(com)
> 
> replace t2 = t2*10000
> //Now, t2 is not dropped
> areg invest mvalu t2, ab(com)

I would like to add to this demonstration where I show the
dangers of having single precision variables (float) and not 
promoting them to double precision with caution.

Note that single precision will have only about 7 digits
of precision while double has about 16 corresponding to 
24 and 53 bits for the mantissia:

. di c(epsfloat)
1.192e-07

. di c(epsdouble)
2.220e-16

. di 2^(1-24)
1.192e-07

. di 2^(1-53)
2.220e-16

machine epsilon is 2^(1-p) where p is the number of bits
reserved for the mantissia of floating point numbers.

Now take a look at the storage type for each of the variables
in the grunfeld dataset.

. webuse grunfeld,clear

. describe 

Contains data from http://www.stata-press.com/data/r9/grunfeld.dta
  obs:           200                          
 vars:             6                          3 Mar 2005 20:27
 size:         5,600 (99.5% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
company         float  %9.0g                  
year            float  %9.0g                  
invest          float  %9.0g                  
mvalue          float  %9.0g                  
kstock          float  %9.0g                  
time            float  %9.0g                  

. list in 1/20

     +--------------------------------------------------+
     | company   year   invest   mvalue   kstock   time |
     |--------------------------------------------------|
  1. |       1   1935    317.6   3078.5      2.8      1 |
  2. |       1   1936    391.8   4661.7     52.6      2 |
  3. |       1   1937    410.6   5387.1    156.9      3 |
  4. |       1   1938    257.7   2792.2    209.2      4 |
  5. |       1   1939    330.8   4313.2    203.4      5 |
     |--------------------------------------------------|
  6. |       1   1940    461.2   4643.9    207.2      6 |
  7. |       1   1941      512   4551.2    255.2      7 |
  8. |       1   1942      448   3244.1    303.7      8 |
  9. |       1   1943    499.6   4053.7    264.1      9 |
 10. |       1   1944    547.5   4379.3    201.6     10 |
     |--------------------------------------------------|
 11. |       1   1945    561.2   4840.9      265     11 |
 12. |       1   1946    688.1   4900.9    402.2     12 |
 13. |       1   1947    568.9   3526.5    761.5     13 |
 14. |       1   1948    529.2   3254.7    922.4     14 |
 15. |       1   1949    555.1   3700.2   1020.1     15 |
     |--------------------------------------------------|
 16. |       1   1950    642.9   3755.6     1099     16 |
 17. |       1   1951    755.9     4833   1207.7     17 |
 18. |       1   1952    891.2   4924.9   1430.5     18 |
 19. |       1   1953   1304.4   6241.7   1777.3     19 |
 20. |       1   1954   1486.7   5593.6   2226.3     20 |
     +--------------------------------------------------+

We can extend the display format of mvalue and kstock and you will see
what you are going to get beyond the 7-th digit when you promote these
variables to double precision.

. format %20.15g kstock mvalue

. gen double fvalue = mvalue

. gen double fstock = kstock

. format %20.15g fvalue fstock

. list fvalue mvalue fstock kstock in 1/20

   +---------------------------------------------------------------------------+
   |           fvalue             mvalue             fstock             kstock |   |---------------------------------------------------------------------------|
 1.|           3078.5             3078.5   2.79999995231628   2.79999995231628 |
 2.|  4661.7001953125    4661.7001953125   52.5999984741211   52.5999984741211 |
 3.| 5387.10009765625   5387.10009765625   156.899993896484   156.899993896484 |
 4.| 2792.19995117188   2792.19995117188   209.199996948242   209.199996948242 |
 5.|  4313.2001953125    4313.2001953125   203.399993896484   203.399993896484 |
   |---------------------------------------------------------------------------|
 6.| 4643.89990234375   4643.89990234375   207.199996948242   207.199996948242 |
 7.|  4551.2001953125    4551.2001953125   255.199996948242   255.199996948242 |
 8.| 3244.10009765625   3244.10009765625   303.700012207031   303.700012207031 |
 9.| 4053.69995117188   4053.69995117188   264.100006103516   264.100006103516 |
10.|  4379.2998046875    4379.2998046875   201.600006103516   201.600006103516 |
   |---------------------------------------------------------------------------|
11 | 4840.89990234375   4840.89990234375                265                265 |
12.| 4900.89990234375   4900.89990234375   402.200012207031   402.200012207031 |
13.|           3526.5             3526.5              761.5              761.5 |
14.| 3254.69995117188   3254.69995117188   922.400024414063   922.400024414063 |
15.| 3700.19995117188   3700.19995117188   1020.09997558594   1020.09997558594 |
   |---------------------------------------------------------------------------|
16.| 3755.60009765625   3755.60009765625               1099               1099 |
17.|             4833               4833   1207.69995117188   1207.69995117188 |
18.| 4924.89990234375   4924.89990234375             1430.5             1430.5 |
19.|  6241.7001953125    6241.7001953125   1777.30004882813   1777.30004882813 |
20.| 5593.60009765625   5593.60009765625   2226.30004882813   2226.30004882813 |
   +---------------------------------------------------------------------------+


. egen double t1 = total(mva), by(com)

. egen double t2 = total(kstock), by(com)

. format %20.15g t1 t2

. list t1 t2 in 1/20

     +------------------------------------+
     |              t1                 t2 |
     |------------------------------------|
  1. | 86676.900390625   12968.7000625134 |
  2. | 86676.900390625   12968.7000625134 |
  3. | 86676.900390625   12968.7000625134 |
  4. | 86676.900390625   12968.7000625134 |
  5. | 86676.900390625   12968.7000625134 |
     |------------------------------------|
  6. | 86676.900390625   12968.7000625134 |
  7. | 86676.900390625   12968.7000625134 |
  8. | 86676.900390625   12968.7000625134 |
  9. | 86676.900390625   12968.7000625134 |
 10. | 86676.900390625   12968.7000625134 |
     |------------------------------------|
 11. | 86676.900390625   12968.7000625134 |
 12. | 86676.900390625   12968.7000625134 |
 13. | 86676.900390625   12968.7000625134 |
 14. | 86676.900390625   12968.7000625134 |
 15. | 86676.900390625   12968.7000625134 |
     |------------------------------------|
 16. | 86676.900390625   12968.7000625134 |
 17. | 86676.900390625   12968.7000625134 |
 18. | 86676.900390625   12968.7000625134 |
 19. | 86676.900390625   12968.7000625134 |
 20. | 86676.900390625   12968.7000625134 |
     +------------------------------------+

You can see here that even though we have made the variables
t1 and t2 double precision, we really only have done single 
precision numerics (actually a little better than single
precision) and the variables have only 9 digits that
are any good.  We gained some precsion, but we did not
do as well as we would have if we started with full double
precision.  This garbage beyond the 9-th digit can be 
enough for -_rmcoll- to not detect variable collinearity.

Now if we promote mvalue and kstock to double carefully
we can get full precision.

. scalar digits = round(log10(abs(c(epsfloat))),1)+1

. gen double dvalue = round(mvalue,10^(digits+ceil(log10(abs(mvalue))))) 

. gen double dstock = round(kstock,10^(digits+ceil(log10(abs(kstock))))) 

. format %20.15g dvalue dstock 

. list dvalue mvalue dstock kstock in 1/20

    +-------------------------------------------------------+
     | dvalue             mvalue   dstock             kstock |
     |-------------------------------------------------------|
  1. | 3078.5             3078.5      2.8   2.79999995231628 |
  2. | 4661.7    4661.7001953125     52.6   52.5999984741211 |
  3. | 5387.1   5387.10009765625    156.9   156.899993896484 |
  4. | 2792.2   2792.19995117188    209.2   209.199996948242 |
  5. | 4313.2    4313.2001953125    203.4   203.399993896484 |
     |-------------------------------------------------------|
  6. | 4643.9   4643.89990234375    207.2   207.199996948242 |
  7. | 4551.2    4551.2001953125    255.2   255.199996948242 |
  8. | 3244.1   3244.10009765625    303.7   303.700012207031 |
  9. | 4053.7   4053.69995117188    264.1   264.100006103516 |
 10. | 4379.3    4379.2998046875    201.6   201.600006103516 |
     |-------------------------------------------------------|
 11. | 4840.9   4840.89990234375      265                265 |
 12. | 4900.9   4900.89990234375    402.2   402.200012207031 |
 13. | 3526.5             3526.5    761.5              761.5 |
 14. | 3254.7   3254.69995117188    922.4   922.400024414063 |
 15. | 3700.2   3700.19995117188   1020.1   1020.09997558594 |
     |-------------------------------------------------------|
 16. | 3755.6   3755.60009765625     1099               1099 |
 17. |   4833               4833   1207.7   1207.69995117188 |
 18. | 4924.9   4924.89990234375   1430.5             1430.5 |
 19. | 6241.7    6241.7001953125   1777.3   1777.30004882813 |
 20. | 5593.6   5593.60009765625   2226.3   2226.30004882813 |
     +-------------------------------------------------------+

. egen double d1 = total(dvalue), by(com)

. egen double d2 = total(dstock), by(com)

. format %20.15g d1 d2

. list d1 t1 d2 t2 in 1/20

     +--------------------------------------------------------+
     |      d1                t1        d2                 t2 |
     |--------------------------------------------------------|
  1. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
  2. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
  3. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
  4. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
  5. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
     |--------------------------------------------------------|
  6. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
  7. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
  8. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
  9. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
 10. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
     |--------------------------------------------------------|
 11. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
 12. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
 13. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
 14. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
 15. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
     |--------------------------------------------------------|
 16. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
 17. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
 18. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
 19. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
 20. | 86676.9   86676.900390625   12968.7   12968.7000625134 |
     +--------------------------------------------------------+

Now we do Scott's experiment in double precision:

. gen double dinvest = round(invest,10^(digits+ceil(log10(abs(invest)))))

. recast long company

. areg dinvest dstock d1, ab(com)

Linear regression, absorbing indicators                Number of obs =     200
                                                       F(  1,   189) =  366.45
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.9184
                                                       Adj R-squared =  0.9141
                                                       Root MSE      =  63.566

------------------------------------------------------------------------------
     dinvest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      dstock |   .3707496   .0193676    19.14   0.000     .3325452    .4089541
          d1 |  (dropped)
       _cons |   43.62499   6.984315     6.25   0.000     29.84777    57.40222
-------------+----------------------------------------------------------------
     company |         F(9, 189) =     36.975   0.000          (10 categories)

. areg dinvest dvalue d2, ab(com)

Linear regression, absorbing indicators                Number of obs =     200
                                                       F(  1,   189) =  111.35
                                                       Prob > F      =  0.0000
                                                       R-squared     =  0.8491
                                                       Adj R-squared =  0.8411
                                                       Root MSE      =  86.444

------------------------------------------------------------------------------
     dinvest |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      dvalue |   .1898776   .0179944    10.55   0.000     .1543819    .2253733
          d2 |  (dropped)
       _cons |  -59.42872   20.40144    -2.91   0.004     -99.6725   -19.18494
-------------+----------------------------------------------------------------
     company |         F(9, 189) =     15.973   0.000          (10 categories)



So my advice is to be very wary of single precision floating point variable.


-Rich
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   What's new   |   Site index