Statalist The Stata Listserver


Re: st: Interesting numerical accuracy/collinearity issue

From   "Stas Kolenikov" <>
Subject   Re: st: Interesting numerical accuracy/collinearity issue
Date   Wed, 12 Apr 2006 16:45:18 -0500

At your leisure (probably a few years after retirement???...), you
might want to check a reference like Numerical Recipes or Demmel's
Applied Numerical Linear Algebra book from SIAM. Finite precision
arithmetic can be formally set up as a space of its own, and it is a
really strange one; the series \sum 1/n converges in that space, for
instance, in the sense that the difference between successive partial
sums eventually becomes exactly zero. So yes, you can think about
linear algebra augmented by finite precision. Stata is generally aware
of the strange properties of that space, so commands and functions
like -_rmcoll-, -issymmetric()- or -diag0cnt()- judge whether matrix
entries or eigenvalues differ from zero only up to that finite
precision. Even if some roundoff error is introduced in the shifting
and scaling steps, -_rmcoll- would still be able to tell whether there
is collinearity, up to the numerical accuracy of the X'X matrix.
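
To see the \sum 1/n remark in action, here is a toy sketch in Python
(not Stata). It assumes IEEE half precision, emulated through the
standard struct module, purely so that the loop finishes in a blink;
in double precision the same stall should occur, only after far more
terms than is practical to loop over. The names to_half and
harmonic_stall are made up for this illustration.

```python
import struct


def to_half(x: float) -> float:
    """Round a Python float (a double) to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def harmonic_stall():
    """Sum 1/n in emulated half precision until a term no longer changes the sum.

    Returns (n, s): the first n whose term leaves the running sum s unchanged.
    """
    s = 0.0
    n = 1
    while True:
        term = to_half(1.0 / n)      # the term itself is stored in half precision
        new_s = to_half(s + term)    # and so is every partial sum
        if new_s == s:               # difference of partial sums is exactly zero
            return n, s
        s = new_s
        n += 1


n_stall, s_stall = harmonic_stall()
print(f"partial sums stop changing at n = {n_stall}, sum = {s_stall}")
```

Since the terms only get smaller after that point, every later partial
sum is identical, so by the partial sums criterion the series has
"converged" even though the true series diverges.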

The general principles of finite precision arithmetic are largely
based on condition numbers. A condition number is, roughly, the
largest possible change in the answer relative to an infinitesimal
change in the inputs of the procedure. Invoking appropriate
infinitesimal (mathematical rather than computer!) calculus, the
condition numbers for many matrix operations like matrix inversion,
determinants or linear systems can be shown to be the ratio of the
largest to the smallest eigenvalue (for a symmetric positive definite
matrix like X'X). Suppose this ratio for your particular matrix is
10^4 (which is not that huge; in -reg pri wei for trunk disp- with the
auto data, the condition number of the covariance matrix is 4e8). Then
by taking your variables to the fourth power, you'll get that number
to 10^16 (not quite so, but you can think of this as the worst case
scenario), and that is already beyond the double precision arithmetic
routinely employed (maybe in the guts of Stata there is also quad
precision arithmetic; I have come across it a couple of times in
Mata). With epsdouble() = 2e-16, a single blip of that order leads to
a change of epsdouble()*condition# approx= order of one: you cannot
trust even the first digit of your answer. That's, roughly, why
unscaled variables are bad; and why you should center your variables;
and why integrated processes lead to weird distributions... oops,
that's another story, sorry :))
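
A small worked example of the epsdouble()*condition# rule of thumb,
again as a Python sketch rather than anything Stata does internally.
It assumes a design matrix with a constant and a single regressor, so
that X'X is 2x2 and its eigenvalues have a closed form; the data and
the function names cross_product_2x2 and condition_number are made up
for the illustration.

```python
import math
import sys


def cross_product_2x2(x):
    """Return the entries (a, b, c) of X'X = [[a, b], [b, c]] for X = [1, x]."""
    n = len(x)
    return n, sum(x), sum(v * v for v in x)


def condition_number(a, b, c):
    """Ratio of largest to smallest eigenvalue of the symmetric matrix [[a, b], [b, c]]."""
    trace = a + c
    det = a * c - b * b          # exact here because the inputs are Python ints
    lam_max = (trace + math.sqrt(trace * trace - 4 * det)) / 2
    lam_min = det / lam_max      # avoids cancellation in (trace - sqrt(...)) / 2
    return lam_max / lam_min


x_raw = [998, 999, 1000, 1001, 1002]      # made-up data with mean 1000
x_centered = [v - 1000 for v in x_raw]    # same data, centered at zero

eps = sys.float_info.epsilon              # about 2.2e-16, the double machine epsilon
for label, x in (("raw", x_raw), ("centered", x_centered)):
    cond = condition_number(*cross_product_2x2(x))
    print(f"{label:9s} cond = {cond:.3g}   eps * cond = {eps * cond:.3g}")
```

With these numbers the raw X'X has a condition number of about 5e11,
so eps * cond is around 1e-4, while centering brings the condition
number down to 2. Nothing about the data changed except the shift.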

On 4/12/06, Schaffer, Mark E <> wrote:
> My follow-up question is simple: why does the shifting and scaling used by Stata's
> -ovtest- introduce greater accuracy rather than, say, greater rounding error?  (Either
> accuracy or error would remove the numerical collinearity.)  The algebra doesn't help
> me here, since all three methods are algebraically equivalent.  I'm guessing that there's
> probably a general principle about how best to maintain numerical precision, but I don't
> know what it might be.

Stas Kolenikov
