

From: jpitblado@stata.com (Jeff Pitblado, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: [ml tolerance (originally)]
Date: Tue, 25 Jan 2011 08:54:26 -0600

Christian Gregory <cgregory@ers.usda.gov> had two follow-up questions
regarding the convergence tolerances used by -ml-:

> 1. Jeff, when you say relative difference for, say, ptol, is this (p for
> this iteration - p for the last iteration) - ptolerance?
>
> 2. Can you say something about the relevance of the g*(-inv(H))g'
> criterion? Does it do something other than make sure the 1st and second
> derivatives are zero?

Klaus Pforr <klaus.pforr@mzes.uni-mannheim.de> pointed Christian to a section
of the -moptimize()- help file that answers the first question. The specific
information is:

Let

        b       = full set of coefficients
        b_prior = value of b from prior iteration

then define

        C_ptol:  mreldif(b, b_prior) <= ptol

As for

        C_nrtol:  g*invsym(-H)*g' < nrtol

this criterion checks that the Hessian-scaled gradient values are sufficiently
close to zero. C_nrtol and C_ptol are similar in spirit: they both check
some measure of change in the coefficient values between iterations.

You may be asking: So where does C_nrtol come from? Why do we need C_nrtol?

b_prior is a rowvector, so the update vector is

        d = g_prior*invsym(-H_prior)

where g_prior and H_prior are the gradient vector and Hessian matrix computed
at b_prior. The standard Newton-Raphson step is then

        b = b_prior + d

However, -ml- may perform telescoped or contracted steps, depending on which
yields a better log-likelihood value. As detailed in Gould et al. (2010),
page 15 (paraphrased using the above notation instead of that of the book):

        ...
        3. Calculate a new guess b = b_prior + s*d, where s is a scalar, for
           instance:
           a. Start with s = 1.
           b. If f(b_prior+d) > f(b_prior), try s = 2 ...
           c. If f(b_prior+d) <= f(b_prior), back up and try s = 0.5 ...

The gradient and Hessian in C_nrtol are computed at b, so that -ml- can
determine whether the next iteration is necessary.
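To make the quantities above concrete, here is a minimal sketch in plain Python (standard library only). It is not -ml-'s code. The toy objective f, its gradient and Hessian, the helper names, the loop bound max_tries, and the tolerance values are all assumptions chosen for illustration, and the step search is a simplified reading of steps a-c above. mreldif() is written to the formula max |b - b_prior| / (|b_prior| + 1).

```python
def mreldif(b, b_prior):
    """Largest elementwise relative difference:
    max_i |b_i - b_prior_i| / (|b_prior_i| + 1)."""
    return max(abs(x - y) / (abs(y) + 1.0) for x, y in zip(b, b_prior))


def solve(A, rhs):
    """Solve A x = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(rhs)
    M = [list(row) + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def newton_direction(g, H):
    """d = g*inv(-H). H is symmetric, so this is the solution of (-H) d = g."""
    return solve([[-h for h in row] for row in H], g)


def scaled_gradient(g, H):
    """g*inv(-H)*g', the quantity that C_nrtol compares against nrtol."""
    return sum(gi * di for gi, di in zip(g, newton_direction(g, H)))


def step(f, b_prior, d, max_tries=50):
    """Simplified step search: start at s = 1, double s while that keeps
    improving f (telescoped), otherwise halve s until f improves (contracted)."""
    f0 = f(b_prior)
    move = lambda s: [x + s * di for x, di in zip(b_prior, d)]
    s = 1.0
    if f(move(s)) > f0:
        for _ in range(max_tries):
            if f(move(2 * s)) <= f(move(s)):
                break
            s *= 2.0
    else:
        for _ in range(max_tries):
            s *= 0.5
            if f(move(s)) > f0:
                break
    return move(s), s


# Hypothetical concave objective standing in for a log likelihood.
def f(b):
    return -(b[0] - 1.0) ** 2 - 2.0 * (b[1] + 3.0) ** 2

def grad(b):
    return [-2.0 * (b[0] - 1.0), -4.0 * (b[1] + 3.0)]

H = [[-2.0, 0.0], [0.0, -4.0]]   # constant Hessian of f

ptol, nrtol = 1e-6, 1e-5         # illustrative tolerances
b_prior = [0.0, 0.0]
d = newton_direction(grad(b_prior), H)
b, s = step(f, b_prior, d)

print("s =", s, " b =", b)
print("C_ptol :", mreldif(b, b_prior) <= ptol)
print("C_nrtol:", scaled_gradient(grad(b), H) < nrtol)

# A heavily contracted step barely moves b, yet the gradient is still large.
b_small = [x + 1e-8 * di for x, di in zip(b_prior, d)]
print("tiny step, C_ptol :", mreldif(b_small, b_prior) <= ptol)
print("tiny step, C_nrtol:", scaled_gradient(grad(b_small), H) < nrtol)
```

In this toy problem the full Newton step lands on the maximum, so C_nrtol holds at b even though b moved a long way from b_prior. The artificially tiny step shows the opposite case: the coefficients hardly change, but the scaled gradient is nowhere near zero.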
Determining convergence solely based on C_ptol is not sufficient, since -ml-
could have performed a contracted step; similarly, using only C_vtol is not
sufficient, since -ml- could have performed a telescoped step that yielded a
relatively small improvement in the log-likelihood value. Checking C_nrtol is
expensive and unnecessary if neither C_ptol nor C_vtol is satisfied, so -ml-
checks C_nrtol only when the Hessian matrix is concave and at least one of
C_ptol or C_vtol is satisfied.

References:

StataCorp. 2009. Mata Reference Manual, Release 11. '[M-5] moptimize()'.
        College Station, TX: StataCorp LP. pp. 591--625.

Gould, W., J. Pitblado, and B. Poi. 2010. Maximum likelihood estimation with
        Stata, 4th ed. College Station, TX: Stata Press.

--Jeff
jpitblado@stata.com
