# Re: st: sum of residuals=zero?

From: [email protected] (William Gould)
To: [email protected]
Subject: Re: st: sum of residuals=zero?
Date: Sat, 09 Nov 2002 09:54:01 -0600

```
Giovanni Vecchi <[email protected]> observed,

>
>       . use auto.dta
>
>       . regress price mpg
>         (output omitted)
>
>       . predict resid, res
>
>       . egen sumres=sum(resid)
>
>       . sum sumres
>
>           Variable |     Obs        Mean   Std. Dev.       Min        Max
>       -------------+-----------------------------------------------------
>             sumres |      74   -.0004654          0  -.0004654  -.0004654
>
> The sum of residuals (which should be zero according to the theory) is
> -.0004654. This estimates looks "high" to me.  I run the code above in Gauss
> and obtained a much lower estimates (something like 9*10^-10).

As Scott Merryman <[email protected]> has already observed,

> The variable type for predict is float.  If you specify double you will
> get much higher precision.
>
>       . use "C:\Stata\auto.dta", clear
>       (1978 Automobile Data)
>
>       . qui reg price mpg
>
>       . predict double res, res
>
>       . sum res
>
>           Variable |     Obs        Mean   Std. Dev.       Min        Max
>       -------------+-----------------------------------------------------
>                res |      74   -2.58e-13   2605.621  -3184.174   9669.721

I now wish to go further than Scott and show that, given a set of estimates
recorded in double precision for the coefficients, the mean of -2.58e-13 is as
small as can be obtained.  I do this not because it is important but merely
because we are very proud of the accuracy of the Stata code.

The problem with looking at residuals is that they are the result of
subtraction and, numerically speaking, subtraction is invariably inaccurate.
An implication of the residuals summing to zero is that the mean of the
predicted values should equal the mean of the original values.  The wonderful
thing about the test stated in these terms is that it avoids subtraction
altogether.  So let's make that calculation:

. predict double hat
(option xb assumed; fitted values)

. sum price

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       price |      74    6165.257   2949.496       3291      15906

. scalar true = r(mean)

. sum hat

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
         hat |      74    6165.257   1382.124   1458.392   8386.329

. display r(mean)-true
0

And there it is:  the result is exactly 0.  There is not one detectable bit of
inaccuracy at double precision.  That is a very neat result.  I hasten to add
that the result is also not of great importance, numerically speaking, but we
are proud of it.

So how is it that if the means are exactly equal, the sum of the residuals
is not also exactly zero?  Writing y1 for the original values and y2 for the
predicted values, the former implies the latter:

      Sum(y1)/N - Sum(y2)/N = 0
=>    Sum(y1/N - y2/N)      = 0
=>    Sum((y1-y2)/N)        = 0
=>    Sum(y1-y2)            = 0

The answer has to do with the calculation of the (y1-y2) term.  Whenever
computers calculate a difference, they lose precision.  Giovanni, performing
a float-accuracy calculation, obtained a sum of -.0004654.  Scott, performing
a double-accuracy calculation, obtained a mean of -2.58e-13, which over 74
observations is a sum of about -1.9e-11.  Both results are nothing more than
byproducts of the inaccuracy of digital computers in making difference
calculations.  Both calculations amount to summing roundoff error.

-- Bill
[email protected]
```
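Bill's closing point, that both sums are just accumulated roundoff, can be reproduced outside Stata. The following is a minimal sketch in Python, not a description of Stata's internals. It assumes that a residual is computed in double precision and then rounded once when stored in a float variable. The data are simulated rather than taken from auto.dta, and the helper names `to_float32` and `ols_fit`, the seed, and the coefficients used to generate `y` are all made up for illustration.

```python
import math
import random
import struct


def to_float32(x):
    """Round a double to the nearest single-precision value (returned as a double)."""
    return struct.unpack("f", struct.pack("f", x))[0]


def ols_fit(x, y):
    """Simple regression of y on x with an intercept, computed in double precision."""
    n = len(x)
    xbar = math.fsum(x) / n
    ybar = math.fsum(y) / n
    sxy = math.fsum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = math.fsum((xi - xbar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    return intercept, slope


# Hypothetical data, loosely on the scale of price and mpg (NOT auto.dta).
random.seed(2002)
n = 74
x = [random.uniform(12, 41) for _ in range(n)]
y = [11000 - 240 * xi + random.gauss(0, 2600) for xi in x]

intercept, slope = ols_fit(x, y)
hat = [intercept + slope * xi for xi in x]

# Residuals kept in double, and the same residuals rounded to float storage.
res_double = [yi - hi for yi, hi in zip(y, hat)]
res_float = [to_float32(r) for r in res_double]

# math.fsum adds the stored values without further rounding in the summation,
# so what remains is the roundoff already baked into each stored residual.
sum_double = math.fsum(res_double)
sum_float = math.fsum(res_float)

# Bill's subtraction-free check: compare the two means directly.
mean_gap = math.fsum(hat) / n - math.fsum(y) / n

print("sum of double residuals:", sum_double)
print("sum of float residuals: ", sum_float)
print("mean(hat) - mean(y):    ", mean_gap)
```

Each float residual can differ from its double counterpart by a relative error of up to 2^-24, so with residuals in the thousands the per-observation error is on the order of 1e-4, and a sum over 74 observations of a size similar to Giovanni's is unsurprising. The exact figures printed depend on the simulated draw.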