
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: [ml tolerance (originally)]

From	[email protected] (Jeff Pitblado, StataCorp LP)
To	[email protected]
Subject	Re: st: [ml tolerance (originally)]
Date	Tue, 25 Jan 2011 08:54:26 -0600

Christian Gregory <[email protected]> had two follow-up questions
regarding the convergence tolerances used by -ml-:

> 1. Jeff, when you say relative difference for, say, ptol, is this (p for
> this iteration - p for the last iteration) - ptolerance?
> 
> 2. Can you say something about the relevance of the g*(-inv(H))g'
> criterion? Does it do something other than make sure the 1st and second
> derivatives are zero?

Klauss Pforr <[email protected]> pointed Christian to a section
of the -moptimize()- help file that answers the first question.  The specific
information is:

	Let

		b	= full set of coefficients
		b_prior	= value of b from prior iteration

	then define

		C_ptol:	mreldif(b, b_prior) <= ptol
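So the check compares a relative difference against ptol; it does not subtract ptol from the change in the coefficients.  As a minimal Python sketch of what Mata's mreldif() computes, namely max(|x - y| / (|y| + 1)) over the elements (the helper below only mimics the Mata function, and the coefficient vectors and the ptol value are made-up numbers for illustration):

```python
def mreldif(x, y):
    """Largest elementwise relative difference, max(|x - y| / (|y| + 1))."""
    return max(abs(xi - yi) / (abs(yi) + 1.0) for xi, yi in zip(x, y))

b_prior = [1.50, -0.20, 10.0]      # made-up coefficients from the prior iteration
b = [1.50001, -0.20, 10.0]         # made-up coefficients from this iteration
ptol = 1e-6                        # assumed tolerance value for this example

# C_ptol holds when the largest relative change is at most ptol
c_ptol = mreldif(b, b_prior) <= ptol
```

Here the largest relative change is about 4e-6, which exceeds the assumed ptol, so C_ptol is not satisfied.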

As for

		C_nrtol: g*invsym(-H)*g' < nrtol

this criterion checks that the gradient, scaled by the inverse of the negative
Hessian, is sufficiently close to zero.
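As a small illustration, the following Python sketch evaluates g*invsym(-H)*g' for a two-parameter problem.  The gradient, the Hessian, and the nrtol value are made-up numbers, and the explicit 2x2 inverse merely stands in for invsym():

```python
def scaled_gradient_2x2(g, H):
    """Return g * inv(-H) * g' for a row vector g and a 2x2 Hessian H."""
    a, b = -H[0][0], -H[0][1]
    c, d = -H[1][0], -H[1][1]
    det = a * d - b * c
    # inverse of [[a, b], [c, d]] is (1/det) * [[d, -b], [-c, a]]
    inv = [[d / det, -b / det], [-c / det, a / det]]
    # row vector g times inv(-H); this is also the Newton direction at this point
    step = [g[0] * inv[0][0] + g[1] * inv[1][0],
            g[0] * inv[0][1] + g[1] * inv[1][1]]
    # finish the quadratic form by multiplying with the column vector g'
    return step[0] * g[0] + step[1] * g[1]

g = [0.002, -0.001]                 # made-up gradient near a maximum
H = [[-4.0, 1.0], [1.0, -2.0]]      # made-up negative definite Hessian
nrtol = 1e-5                        # assumed tolerance value for this example

value = scaled_gradient_2x2(g, H)
c_nrtol = value < nrtol
```

With these numbers the scaled gradient is 8e-6/7, roughly 1.14e-6, so C_nrtol is satisfied under the assumed tolerance.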

C_nrtol and C_ptol are similar in spirit: both check some measure of the
change in the coefficient values between iterations.

You may be asking:	So where does C_nrtol come from?
			Why do we need C_nrtol?

b_prior is a rowvector, so the update vector is

	d = g_prior*invsym(-H_prior)

where g_prior and H_prior are the gradient vector and Hessian matrix computed
at b_prior.  The standard Newton-Raphson step is then

	b = b_prior + d

However, -ml- may perform telescoped (s > 1) or contracted (s < 1) steps,
depending on which yields a better log-likelihood value.  As detailed in Gould
et al. (2010, p. 15), paraphrased using the above notation instead of that of
the book:

	...
	3.  Calculate a new guess b = b_prior +s*d, where s is a scalar, for
	    instance:
		a.  Start with s = 1.
		b.  If f(b_prior+d) > f(b_prior), try s = 2 ...
		c.  If f(b_prior+d) <= f(b_prior), back up and try s = 0.5 ...
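The step-length idea in this excerpt can be sketched in Python as follows.  The excerpt elides the details after each "...", so the doubling and halving loops, the cap on the number of tries, and the one-dimensional test function are all assumptions made for illustration, not the rule -ml- actually uses:

```python
def choose_step(f, b_prior, d, max_tries=20):
    """Pick a scalar s so that f(b_prior + s*d) improves on f(b_prior)."""
    f0 = f(b_prior)
    s = 1.0
    if f(b_prior + s * d) > f0:
        # full step improved: keep doubling while doubling still improves
        for _ in range(max_tries):
            if f(b_prior + 2 * s * d) <= f(b_prior + s * d):
                break
            s *= 2
        return s
    # full step did not improve: back up by halving
    for _ in range(max_tries):
        s *= 0.5
        if f(b_prior + s * d) > f0:
            return s
    return 0.0  # no improving step found within the cap

# illustrative concave objective with its maximum at b = 3
def f(b):
    return -(b - 3.0) ** 2

s_long = choose_step(f, b_prior=0.0, d=1.0)    # direction is too short
s_short = choose_step(f, b_prior=0.0, d=10.0)  # direction is too long
```

In this toy case the short direction is telescoped to s = 2 and the long direction is contracted to s = 0.5.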

The gradient and Hessian in C_nrtol are computed at b, the new coefficient
vector, so that -ml- can determine whether another iteration is necessary.

Determining convergence based solely on C_ptol is not sufficient, since -ml-
could have performed a contracted step.  Similarly, using only C_vtol (the
analogous check on the relative change in the log-likelihood value) is not
sufficient, since -ml- could have performed a telescoped step that yielded a
relatively small improvement in the log-likelihood value.

Checking C_nrtol is expensive and unnecessary if neither C_ptol nor C_vtol is
satisfied, so -ml- checks C_nrtol only when the Hessian matrix is concave and
at least one of C_ptol or C_vtol is satisfied.
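The ordering of these checks can be written as a short Python sketch.  The function and argument names are hypothetical; the expensive C_nrtol computation is passed in as a callable so that it runs only when the cheaper checks have passed:

```python
def converged(c_ptol, c_vtol, concave, check_nrtol):
    """Combine the checks, evaluating C_nrtol last and only if needed."""
    if not (c_ptol or c_vtol):
        return False          # neither cheap check passed; skip C_nrtol
    if not concave:
        return False          # Hessian is not concave; skip C_nrtol
    return check_nrtol()      # expensive check, evaluated last

calls = []

def nrtol_check():
    calls.append(1)           # record that the expensive check ran
    return True

r1 = converged(False, False, True, nrtol_check)  # C_nrtol is never evaluated
r2 = converged(True, False, True, nrtol_check)   # C_nrtol is evaluated once
```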

References:

StataCorp. 2009. Mata Reference Manual, Release 11. '[M-5] moptimize()'.
	College Station, TX: StataCorp LP. pp. 591-625.

Gould, W., J. Pitblado, and B. Poi. 2010. Maximum Likelihood Estimation with
	Stata, 4th ed. College Station, TX: Stata Press.

--Jeff
[email protected]