st: Re: ML estimation and gradient

Tue, 21 Aug 2007 09:57:32 -0400

In any numerical optimization procedure that uses numerical derivatives, the computed derivative is a finite-difference approximation to the slope over a very small, but not infinitesimal, segment of the function, so it will rarely be exactly zero even at the reported optimum. If you took an arbitrarily small epsilon around the optimum, and the optimum was expressed to a precision beyond 15 digits, you would get something closer to zero. In something like -ml-, if you apply a tighter convergence criterion (e.g., requiring the norm of the gradient to be no more than 10^-8), you may not find an optimum at all, or it may take a long time. Thus any optimization routine trades off precision against speed and likelihood of convergence. Stata's behavior in this regard is similar to that of any other software I have used.
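To make the trade-off concrete, here is a minimal Python sketch. It is not Stata's -ml- algorithm. It assumes a made-up exponential sample, a Newton update built from central-difference derivatives, and a relative log-likelihood stopping rule; the names `maximize`, `num_grad`, `num_hess`, and `ltol` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical sample, assumed to be exponentially distributed.
data = [0.8, 1.3, 0.4, 2.1, 0.9, 1.7, 0.6, 1.1]

def loglik(lam):
    """Exponential log-likelihood for rate lam."""
    return len(data) * math.log(lam) - lam * sum(data)

def num_grad(f, x, h=1e-5):
    """Central-difference first derivative (finite h, not infinitesimal)."""
    return (f(x + h) - f(x - h)) / (2 * h)

def num_hess(f, x, h=1e-4):
    """Central-difference second derivative."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / (h * h)

def maximize(f, x0, ltol, max_iter=100):
    """Newton iterations; stop when the relative change in f is <= ltol."""
    x = x0
    for it in range(1, max_iter + 1):
        x_new = x - num_grad(f, x) / num_hess(f, x)
        converged = abs(f(x_new) - f(x)) <= ltol * abs(f(x))
        x = x_new
        if converged:
            break
    # Return the estimate, the numerical gradient there, and the iteration count.
    return x, num_grad(f, x), it

x_loose, g_loose, it_loose = maximize(loglik, 0.5, ltol=1e-5)
x_tight, g_tight, it_tight = maximize(loglik, 0.5, ltol=1e-12)

print("loose:", x_loose, g_loose, it_loose)
print("tight:", x_tight, g_tight, it_tight)
print("analytic MLE:", len(data) / sum(data))
```

Under these assumptions, the loose run stops with a small but clearly nonzero gradient, while the tight run spends extra iterations to push the gradient closer to zero. Even then, the reported gradient is limited by the finite step `h` and by double-precision rounding.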

I would like to know why, when using maximum likelihood estimation in Stata, the gradient in the last iteration is often numerically different from zero whereas it should be theoretically equal to zero.

