
From: wgould@stata.com (William Gould)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: sum of residuals=zero?
Date: Sat, 09 Nov 2002 09:54:01 -0600

Giovanni Vecchi <vecchi@economia.uniroma2.it> observed,

> I would appreciate your comments on the following:
>
> . use auto.dta
>
> . regress price mpg
> (output omitted)
>
> .predict resid,res
>
> . egen sumres=sum(resid)
>
> . sum sumres
>
>     Variable |     Obs        Mean   Std. Dev.       Min        Max
> -------------+-----------------------------------------------------
>       sumres |      74   -.0004654           0  -.0004654  -.0004654
>
> The sum of residuals (which should be zero according to the theory) is
> -.0004654. This estimates looks "high" to me. I run the code above in Gauss
> and obtained a much lower estimates (something like 9*10^-10).

As Scott Merryman <smerryman@kc.rr.com> has already observed,

> Thank variable type for predict is float. If you specify double you will
> get much higher precision.
>
> . use "C:\Stata\auto.dta", clear
> (1978 Automobile Data)
>
> . qui reg price mpg
>
> . predict double res, res
>
> . sum res
>
>     Variable |     Obs        Mean   Std. Dev.       Min        Max
> -------------+-----------------------------------------------------
>          res |      74   -2.58e-13    2605.621  -3184.174   9669.721

I now wish to go further than Scott and show that, given a set of estimates
recorded in double precision for the coefficients, the mean of -2.58e-13 is
as small as can be obtained. I do this not because it is important but merely
because we are very proud of the accuracy of the Stata code.

The problem with looking at residuals is that they are the result of
subtraction and, numerically speaking, subtraction is invariably inaccurate.

An implication of the residuals summing to zero is that the mean of the
predicted values should equal the mean of the original values. The wonderful
thing about the test stated in these terms is that it avoids subtraction
altogether. So let's make that calculation:

. predict double hat
(option xb assumed; fitted values)

. sum price

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
       price |      74    6165.257    2949.496       3291      15906

. scalar true = r(mean)

. sum hat

    Variable |     Obs        Mean   Std. Dev.       Min        Max
-------------+-----------------------------------------------------
         hat |      74    6165.257    1382.124   1458.392   8386.329

. display r(mean)-true
0

And there it is: the result is exactly 0. There is not one detectable bit of
inaccuracy at double precision. That is a very neat result. I hasten to add
that the result is also not of great importance, numerically speaking, but we
are proud of it.

So how is it that if the means are exactly equal, the sum of the residuals is
not also exactly zero? The former implies the latter:

        Sum(y1)/N - Sum(y2)/N = 0
    =>  Sum(y1/N - y2/N) = 0
    =>  Sum( (y1-y2)/N ) = 0
    =>  Sum( (y1-y2) ) = 0

The answer has to do with the calculation of the (y1-y2) term. Whenever
computers calculate a difference, they lose precision. The fact that
Giovanni, performing a float-accuracy calculation, obtained a sum of
-.0004654, and that Scott, performing a double-accuracy calculation, obtained
a sum of -2.58e-13, are nothing more than byproducts of the inaccuracy of
digital computers in making difference calculations. Both calculations amount
to summing roundoff error.

-- Bill
wgould@stata.com
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
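
For readers without Stata at hand, the float-versus-double effect discussed in the message above can be sketched in plain Python. This is only an illustration under stated assumptions: the data are made up (they are not the auto dataset), the helper to_float32 is a hypothetical name, and rounding each residual to single precision on storage is assumed to mimic what predict does when the variable type is float. The sums are taken with math.fsum so that what remains is the roundoff already present in the stored residuals.

```python
import math
import struct


def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]


# Made-up illustrative data (an assumption; NOT the auto dataset).
x = [10.0, 13.0, 17.0, 21.0, 24.0, 28.0, 31.0, 35.0]
y = [9120.5, 8433.1, 7012.7, 6650.3, 5198.9, 4877.2, 4301.6, 3555.4]

n = len(x)
xbar = math.fsum(x) / n
ybar = math.fsum(y) / n

# Ordinary least squares for y = a + b*x, computed in double precision.
sxy = math.fsum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = math.fsum((xi - xbar) ** 2 for xi in x)
b = sxy / sxx
a = ybar - b * xbar

# Residuals kept as doubles (analogous to "predict double res, res").
resid_double = [yi - (a + b * xi) for xi, yi in zip(x, y)]

# The same residuals rounded to single precision on storage
# (assumed analogue of "predict resid, res" with the default float type).
resid_float = [to_float32(r) for r in resid_double]

sum_double = math.fsum(resid_double)
sum_float = math.fsum(resid_float)

print("sum of residuals stored as double:", sum_double)
print("sum of residuals stored as float: ", sum_float)
```

Neither sum needs to be exactly zero. Each one is a sum of the rounding errors carried by the stored residuals, and the single-precision version will typically be several orders of magnitude larger simply because each stored value carries roughly 7 significant digits instead of roughly 16.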

