Title | Convergence of ml

Author | William Sribney, StataCorp |

There are only rare cases in which
**ml** can converge to an
answer when it shouldn’t.

If you see either of the messages “nonconcave function
encountered” or “unproductive step attempted” from
**ml** *at the last iteration*, then you should be suspicious of the
results.

There is *no* cause for concern if either of these messages appears
anywhere *before* the last iteration.
For some likelihoods, these messages are the norm for early
iterations, but they typically disappear for the final 4 or 5 iterations.

When the message “nonconcave function encountered” appears at
the *last* iteration, there will also be one or more missing standard
errors, so it is clear that something is wrong. The problem is usually
not with **ml**; it is inherent in the model (see the discussion below).

Thus only the message “unproductive step attempted” at the
*last* iteration can appear with answers that look fine but are not.

Stata’s **ml** command uses a modified Newton–Raphson algorithm.
Classical Newton–Raphson bases its steps on the Taylor-series approximation

f'(b) = f'(b0) + H(b0) * (b - b0)

where **f()** is the log-likelihood, **b** is the parameter vector,
**f'()** is the vector of first derivatives, and **H()** is the matrix
of second derivatives (the Hessian).

Since **f'(b) = 0** at the maximum of **f()**, Newton–Raphson chooses
**b** based on **b0** such that

0 = f'(b0) + H(b0) * (b - b0)

Solving for **b** gives

b = b0 - H(b0)^{-1} * f'(b0)

and this process is repeated until convergence is achieved.
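As a concrete (non-Stata) illustration, here is the classical iteration sketched in Python; the Poisson example, the data, and the starting value are all invented for this sketch:

```python
import numpy as np

def newton_raphson(score, hessian, b0, tol=1e-8, max_iter=100):
    """Classical Newton-Raphson: repeat b = b0 - H(b0)^{-1} * f'(b0)."""
    b = np.asarray(b0, dtype=float)
    for _ in range(max_iter):
        step = -np.linalg.solve(hessian(b), score(b))
        b = b + step
        if np.max(np.abs(step)) < tol:
            return b
    raise RuntimeError("convergence not achieved")

# Toy example: Poisson log-likelihood f(lam) = sum(x*log(lam) - lam),
# whose maximum is at the sample mean.
x = np.array([2.0, 4.0, 3.0, 5.0, 1.0])
score = lambda b: np.array([x.sum() / b[0] - len(x)])      # f'(lam)
hessian = lambda b: np.array([[-x.sum() / b[0] ** 2]])     # H(lam)

lam_hat = newton_raphson(score, hessian, [1.0])
```

Because this toy log-likelihood is globally concave, the iteration converges from any positive start; the interesting cases below are exactly those where that assumption fails.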

There are three situations that can cause this algorithm to converge to an answer that is not the unique global maximum:

- The existence of multiple local maxima.
- Nonconcave log-likelihoods.
- Numerical problems with the computation of the log-likelihood.

I’ll discuss these problems one at a time.

If your log-likelihood function has multiple local maxima, then the modified Newton–Raphson algorithm should converge to one of them. Unless you can prove mathematically that your log-likelihood function is globally concave, you will never know whether you have converged at a global maximum.

If you cannot analytically prove global concavity, there’s little you can do in practice to show that you have converged to a unique global maximum.

About the only practical steps you can take are

1. a grid search through sensible ranges of the parameters — obviously not very feasible with lots of parameters but reasonable for 4 parameters or fewer;
2. trying different starting values to see if you converge to different answers. If you do get different answers with different starting values, at least you will know that there are multiple local maxima.

Since (1) is usually impractical and (2) usually lands you back at the same answer, basically one hopes there is only one maximum when you can’t prove it analytically. This is a reasonable assumption — a statistically well-posed likelihood should not have multiple local maxima.
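Here is a sketch in Python of the second practical step (trying different starting values), on an invented one-parameter function with two local maxima of different heights:

```python
# f(b) = -b^4/4 + b^2 + 0.3*b has two local maxima of different heights.
def f(b):   return -b**4 / 4 + b**2 + 0.3 * b
def fp(b):  return -b**3 + 2 * b + 0.3        # f'(b)
def fpp(b): return -3 * b**2 + 2              # f''(b) (the 1x1 "Hessian")

def newton_max(b, tol=1e-10, max_iter=200):
    """Newton-Raphson on f'(b) = 0 from a given starting value."""
    for _ in range(max_iter):
        step = -fp(b) / fpp(b)
        b += step
        if abs(step) < tol:
            return b
    return b

starts = [-2.0, -1.0, 1.0, 2.0]
candidates = [newton_max(s) for s in starts]
# Keep only true maxima (f'' < 0) and deduplicate numerically equal ones.
maxima = sorted({round(b, 6) for b in candidates if fpp(b) < 0})
best = max(maxima, key=f)
```

Different starts land on the two different local maxima; only comparing the likelihoods afterward reveals which one is global.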

Sometimes there are maxima at nonsensical values of ancillary parameters. For example, for a variance parameter sigma, there may be one maximum for sigma > 0 and another for sigma < 0, with the latter being nonsensical. To avoid this, transform the parameter so that it is always > 0.
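The standard transform is to estimate theta = log(sigma), so that sigma = exp(theta) is positive for every real theta. A Python sketch with made-up data, for a normal model with known mean 0 (where, in theta, the log-likelihood happens to be globally concave):

```python
import math

# Normal(0, sigma) log-likelihood in theta = log(sigma):
#   l(theta) = -n*theta - S*exp(-2*theta)/2,  S = sum of x_i^2.
# The second derivative -2*S*exp(-2*theta) is negative everywhere,
# so there is a single maximum and sigma = exp(theta) is always > 0.
x = [1.2, -0.7, 2.1, 0.4, -1.5]     # invented data
S = sum(xi * xi for xi in x)
n = len(x)

def score(theta):   return -n + S * math.exp(-2 * theta)   # dl/dtheta
def hessian(theta): return -2 * S * math.exp(-2 * theta)   # always < 0

theta = 0.0
for _ in range(100):
    step = -score(theta) / hessian(theta)   # Newton-Raphson step
    theta += step
    if abs(step) < 1e-12:
        break

sigma_hat = math.exp(theta)   # MLE: sqrt(mean of x^2), always positive
```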

A problem with the classical Newton–Raphson algorithm arises when
**H(b0)** is not invertible. That's where the "modified" part of the
"modified Newton–Raphson algorithm" comes in. If **H(b0)** is not
invertible, the matrix is fiddled with until it is changed into something
that is invertible, and using this inverse, one hopes that one steps in
a direction that is an improvement.

When one sees the message “nonconcave function encountered” in Stata’s iteration log

    Iteration 0:  Log Likelihood = -18072.001
    (nonconcave function encountered)
    Iteration 1:  Log Likelihood = -4064.9319
    (nonconcave function encountered)
    Iteration 2:  Log Likelihood = -1373.5496
    (nonconcave function encountered)

this is what is happening. **H** is not invertible, **ml** is doing
its "modified" fix-up, and it is taking steps that result in a bigger
log-likelihood.

The "nonconcave function encountered" message is *not* a concern when
it occurs in iterations before the last one:

    Iteration 38: Log Likelihood = -205.20113
    (nonconcave function encountered)
    Iteration 39: Log Likelihood = -196.88076
    Iteration 40: Log Likelihood = -194.27732
    Iteration 41: Log Likelihood = -194.18311
    Iteration 42: Log Likelihood = -194.18306
    Iteration 43: Log Likelihood = -194.18306

In the above example, **ml** eventually got to a concave region of the
function (after 39 iterations) and then quickly found the maximum (in 4
iterations). This is an example of the modified Newton–Raphson algorithm
working perfectly.

The bottom line here is that the “nonconcave function
encountered” message is *only* a concern if it appears at the
very last step. (Note that the message applies to the iteration number
*below* the message.)

Here’s an example of when it *does* indicate a problem:

    ...
    Iteration 105: Log Likelihood = -195.38874
    (nonconcave function encountered)
    Iteration 106: Log Likelihood = -195.38874

    Interval Regression                            Number of obs =         74
                                                   Model chi2(2) =      78.01
    Log Likelihood = -195.38874                    Prob > chi2   =     0.0000

    ------------------------------------------------------------------------------
             |      Coef.   Std. Err.       z     P>|z|      [95% Conf. Interval]
    ---------+--------------------------------------------------------------------
      weight |  -.0030025   .0005111    -5.874   0.000     -.0040043    -.0020007
      weight |  -.0030025          .         .       .             .            .
       _cons |   39.42903    1.59296    24.752   0.000      36.30689     42.55118
    ------------------------------------------------------------------------------
      _sigma |   3.394052   .2792304    12.155   0.000      2.846771     3.941334
    ------------------------------------------------------------------------------

When you converge at a point where the Hessian is singular, one or more standard errors will be missing. The standard errors are derived from the inverse of the Hessian, and, since the Hessian was not invertible, a generalized inverse (produced by dropping one or more rows/columns) was used instead for the variance–covariance matrix.
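The mechanism can be sketched in Python with invented data (the sigma-squared factor is omitted): entering a regressor twice makes the information matrix singular, and the row/column-dropping generalized inverse leaves the dropped term's standard error missing:

```python
import numpy as np

# Enter the same regressor twice: the information matrix (the negative
# Hessian for linear regression, omitting the sigma^2 factor) is singular.
rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
X = np.column_stack([x1, x1, np.ones(50)])   # weight entered twice, plus _cons

negH = X.T @ X          # rank 2, not 3: cannot be inverted

# A generalized inverse obtained by dropping the redundant row/column:
keep = [0, 2]
V = np.linalg.inv(negH[np.ix_(keep, keep)])

se = np.full(3, np.nan)         # the dropped term's standard error is missing
se[keep] = np.sqrt(np.diag(V))
```

The surviving parameters get standard errors from the inverted submatrix; the dropped one is reported as missing, just as in the output above.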

Thus this situation is easy to identify: the output clearly does not look like a sound answer.

Note, however, that in the above example (of perfect collinearity) the algorithm did fine. It found a global maximum; it is just that the maximum is not unique, since it lies on a ridge. So we should not attribute the problem to the algorithm; it was a problem with the model.

Besides collinearity, “nonconcave function encountered” at the
last step can happen when the true maximum of the likelihood is at infinity
for a parameter. In these cases, **ml** may declare convergence at a
value of a parameter that is effectively infinity. The likelihood is flat
at infinity, so again the Hessian is not invertible. In this case,
**ml** does find the maximum (to the specified tolerance).
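A classic example is perfect separation in logistic regression. In this invented Python sketch, the log-likelihood keeps rising, and the curvature flattens toward zero, as the slope grows without bound:

```python
import math

# Perfectly separated data: y = 1 exactly when x > 0.
x = [-2.0, -1.0, 1.0, 2.0]
y = [0, 0, 1, 1]

def loglik(b):
    """Logistic log-likelihood for a single slope b (no constant)."""
    return sum(yi * b * xi - math.log(1 + math.exp(b * xi))
               for xi, yi in zip(x, y))

def hessian(b):
    """Second derivative of the log-likelihood: -sum x_i^2 p_i (1 - p_i)."""
    total = 0.0
    for xi in x:
        p = 1 / (1 + math.exp(-b * xi))
        total -= xi * xi * p * (1 - p)
    return total

# loglik(b) increases toward 0 and hessian(b) vanishes as b grows:
# the maximum is at b = infinity, where the Hessian is not invertible.
```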

Numeric problems in the computation of the likelihood (like scale problems) can also give rise to "nonconcave function encountered" at the last step.

So really, **ml** and the modified Newton–Raphson algorithm do pretty
well when the Hessian is not invertible. The problem in these cases is
usually inherent in the model (e.g., collinearity or convergence at
infinity) or is a numeric problem in the computation of the likelihood.

The message “unproductive step attempted” appears from **ml**
when the modified Newton–Raphson algorithm takes a step to a new place that
is worse (lower likelihood) than the last place. The Hessian is invertible,
but it steps to a place that is not an improvement.

This message is typically seen at intermediate steps for likelihoods with narrow, gently sloping ridges. It is easy to slip off the backbone of the ridge to a lower spot as you try to move along it toward the maximum.

The algorithm is stepping in a good direction but is stepping too far. The solution is simply to back up a little. This is what happens when you see the message “unproductive step attempted”: **ml** takes a step that is too optimistic, then backs up until it finds a smaller step that leads to a greater value of the likelihood.
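Stata's exact backup rule is internal to **ml**, but simple step halving captures the idea; the ridge-shaped objective and the overshooting step below are invented:

```python
import numpy as np

def backed_up_step(f, b, step, max_halvings=50):
    """If the full step lowers f (an 'unproductive step'), halve it until
    the new point beats the old one. A sketch, not ml's exact rule."""
    f0 = f(b)
    s = np.asarray(step, dtype=float)
    for _ in range(max_halvings):
        if f(b + s) > f0:
            return b + s
        s = s / 2          # back up: smaller step in the same direction
    return b               # no productive step found: stay put

# A narrow ridge with a gently sloping backbone: the full step slips off.
def f(b):
    x, y = b
    return -50 * (x - y) ** 2 - 0.01 * (x + y) ** 2

b = np.array([1.0, 1.02])
full_step = np.array([0.5, -0.5])   # overshoots across the ridge
b_new = backed_up_step(f, b, full_step)
```

The full step lowers the objective, but after a few halvings the shortened step improves on the starting point, which is exactly the "back up until productive" behavior described above.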

Here’s a typical example (linear regression as an MLE) of this message
when it’s *not* a problem:

    Iteration 0:  Log Likelihood = -18072.001
    (nonconcave function encountered)
    Iteration 1:  Log Likelihood = -4064.9319
    ...
    (nonconcave function encountered)
    Iteration 14: Log Likelihood = -217.50138
    (unproductive step attempted)
    Iteration 15: Log Likelihood = -216.84126
    ...
    (unproductive step attempted)
    Iteration 41: Log Likelihood = -197.68186
    Iteration 42: Log Likelihood = -194.35966
    Iteration 43: Log Likelihood = -194.18325
    Iteration 44: Log Likelihood = -194.18306
    Iteration 45: Log Likelihood = -194.18306

    <table of estimates appears>

Iterations 1-14 are “nonconcave function encountered”, and iterations 15-41 are “unproductive step attempted”. Iterations 42-45 are “productive” steps (i.e., the first step tried was an improvement) with invertible Hessians. When you are near the answer, you expect Newton–Raphson steps to work out fine.

When there *is* a problem, you’ll see something like

    Iteration 0:  Log Likelihood = -18072.001
    (nonconcave function encountered)
    Iteration 1:  Log Likelihood = -4064.9319
    ...
    (nonconcave function encountered)
    Iteration 14: Log Likelihood = -217.50138
    (unproductive step attempted)
    Iteration 15: Log Likelihood = -216.84126
    ...
    (unproductive step attempted)
    Iteration 41: Log Likelihood = -197.68186
    (unproductive step attempted)
    Iteration 42: Log Likelihood = -194.35966
    (unproductive step attempted)
    Iteration 43: Log Likelihood = -194.35966

    <table of estimates appears>

At iteration 43, it backed up. The question is, how far did it back up? Did it back up to a point that is (numerically) the same as the point at iteration 42? It may have. It may have backed up to the same point and then declared convergence.

Thus you cannot trust the estimates when there is an “unproductive
step attempted” at the *last* iteration. (Again, if it appears
at any iteration before the last iteration, there is no problem.)

This is a flaw in Stata’s **ml** routine. It should not back up to
the same point and declare convergence. It should print out a message like
“convergence not achieved”, or it should try to do something
intelligent like randomly search in the vicinity for a step that is an
improvement. For the next release of Stata, we will fix this.

Note this is a case where the modified Newton–Raphson algorithm truly breaks down. What is happening is the Taylor-series approximation

f'(b) = f'(b0) + H(b0) * (b - b0)

is no good — even for **b** very close to **b0**! Theoretically,
this implies that the function is not analytic at **b0**, but this is
usually not the case. Rather, what usually is happening is that **f()**,
**f'()**, or **H()** are being computed poorly in the vicinity of
**b0** because of numeric problems in the likelihood computation
routines.

So this is truly the one case in which the modified Newton–Raphson algorithm breaks down: when the functions being computed are not numerically smooth.

If you read this far, you must be interested in the details of **ml**!

If you want to learn all sorts of details about **ml**, I highly
recommend Stata’s NetCourse 200 which is devoted to **ml**
programming. The flavor of the course is, in part, similar to what
I’ve written above (since I'm one of the course instructors!). The
majority of the course covers practical Stata programming issues, but we
discuss theoretical ML and general numerical issues as well.