


Re: st: Is it possible that Stata converges to a local maximum in maximum likelihood related procedures?

From   Gordon Hughes
Subject   Re: st: Is it possible that Stata converges to a local maximum in maximum likelihood related procedures?
Date   Mon, 10 Jun 2013 12:33:27 +0100

No gradient search can guarantee that it has found a global maximum unless either (a) the objective function is globally concave, or (b) you have carried out some kind of extensive grid search on starting values. Some objective functions are known to be globally concave - e.g. the quadratic objective of least squares (once written as a maximisation problem) or the log-likelihood of the logit model - but many are not. The practical problem is that many likelihood functions are degenerate for some values of the parameters, so that a grid search over starting values may generate large numbers of failures.
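To make the point about starting values concrete, here is a minimal sketch in Python rather than Stata. The objective is a made-up one-parameter function with two local maxima, not a real likelihood, and the names objective, gradient and gradient_ascent are purely illustrative. It runs a plain fixed-step gradient ascent from a grid of starting values and keeps the best result.

```python
# Toy objective with two local maxima (near x = -0.96 and x = +1.04).
# It is NOT a real likelihood; it only mimics a non-concave one.
def objective(x):
    return -(x * x - 1.0) ** 2 + 0.3 * x


def gradient(x):
    # Analytical derivative of the toy objective.
    return -4.0 * x * (x * x - 1.0) + 0.3


def gradient_ascent(start, step=0.01, tol=1e-10, max_iter=100_000):
    """Fixed-step gradient ascent; stops when the update becomes tiny."""
    x = start
    for _ in range(max_iter):
        x_new = x + step * gradient(x)
        if abs(x_new - x) < tol:
            return x_new
        x = x_new
    return x


# Grid of starting values: -2.0, -1.5, ..., 2.0
starts = [-2.0 + 0.5 * i for i in range(9)]
results = [(s, gradient_ascent(s)) for s in starts]

# Keep the run that ends at the highest value of the objective.
best_start, best_x = max(results, key=lambda r: objective(r[1]))

for s, x in results:
    print(f"start {s:+.1f} -> x = {x:+.4f}, objective = {objective(x):+.4f}")
print(f"best of the grid: x = {best_x:+.4f} (from start {best_start:+.1f})")
```

Starts on the left of the dip between the two peaks end at the lower peak, and only the comparison across starts reveals that it is not the global maximum. With a real likelihood some starts would also fail outright, which is the practical difficulty mentioned above.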

As Nick points out, Stata's maximisation procedures (including -ml-) contain many safeguards both to avoid pathological results and to reduce the chances of converging to a local rather than a global maximum, but both can still occur. If you are worried about this in a particular case, it is usually sensible to start from a restricted version of the model which is known to be globally concave. That way it is likely, though not certain, that a gradient search which starts from the global maximum of a restricted model will head in the right direction when dealing with a less restricted version of the model.

Most Stata -ml- procedures adopt this strategy as it is much better than, say, starting with a vector of zeros. But you should always take account of the specific features of the likelihood function to improve the chances of finding a global maximum in the most general case. Partitioning the set of parameters and using a concentrated likelihood function - i.e. multi-step estimation where some parameters are estimated conditional on prior values of other parameters - is a classic example of that approach.
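As an illustration of the concentrated likelihood idea, here is a minimal sketch in Python rather than Stata. Everything in it is an assumption made for the example: the model y = a*exp(-b*x) + e with normal errors, the simulated data, and the helper names a_given_b and concentrated_loglik. For any fixed b the model is linear in a, so a and the error variance have closed-form estimates, and the search reduces to a one-dimensional grid over b.

```python
import math
import random

# Simulated data for the illustrative model y = a * exp(-b * x) + e.
random.seed(0)
A_TRUE, B_TRUE, SIGMA = 2.0, 0.7, 0.05
xs = [5.0 * i / 199 for i in range(200)]
ys = [A_TRUE * math.exp(-B_TRUE * x) + random.gauss(0.0, SIGMA) for x in xs]


def a_given_b(b):
    """Least-squares estimate of a for fixed b (regression through the origin)."""
    z = [math.exp(-b * x) for x in xs]
    return sum(zi * yi for zi, yi in zip(z, ys)) / sum(zi * zi for zi in z)


def concentrated_loglik(b):
    """Normal log-likelihood with a and sigma^2 replaced by their estimates given b."""
    a = a_given_b(b)
    ssr = sum((yi - a * math.exp(-b * xi)) ** 2 for xi, yi in zip(xs, ys))
    n = len(ys)
    # sigma^2 is concentrated out as ssr / n.
    return -0.5 * n * (math.log(2.0 * math.pi * ssr / n) + 1.0)


# One-dimensional grid search over b: 0.01, 0.02, ..., 3.00
b_grid = [0.01 * k for k in range(1, 301)]
b_hat = max(b_grid, key=concentrated_loglik)
a_hat = a_given_b(b_hat)

print(f"b_hat = {b_hat:.2f}, a_hat = {a_hat:.3f}")
```

The grid estimate of b, together with the implied value of a, could then serve as starting values for a full gradient search over all parameters. The gain is that the exhaustive part of the search runs over one parameter instead of three.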

Gordon Hughes
