Note: This FAQ is for users of releases prior to Stata 6.
It is not relevant for more recent versions.

Can Stata’s ml routine converge and produce answers that look
good even when it shouldn’t?

Title    Convergence of ml
Author   William Sribney, StataCorp
Date     August 1997
Short answer
There are only rare cases in which
ml can converge to an
answer when it shouldn’t.
If you see either of the messages “nonconcave function
encountered” or “unproductive step attempted” from
ml at the last iteration, then you should be suspicious of the
results.
There is no cause for concern if either of these messages appears
anywhere before the last iteration.
For some likelihoods, these messages are the norm for early
iterations, but they typically disappear for the final 4 or 5 iterations.
When the message “nonconcave function encountered” appears at
the last iteration, there will also be one or more missing standard
errors, so it is clear that something is not right. In that case, the
problem is usually not due to ml but is inherent in the model (see the
discussion below).
Thus only the message “unproductive step attempted” at the
last iteration can appear with answers that look fine but are not.
Long answer
Stata’s ml command uses a modified Newton-Raphson algorithm.
Classical Newton-Raphson bases its steps on the Taylor-series approximation
f'(b) = f'(b0) + H(b0) * (b - b0)
where f() is the log-likelihood, b is the parameter vector,
f'() is the vector of first derivatives, and H() is the matrix
of second derivatives (the Hessian).
Since f'(b) = 0 at the maximum of f(), Newton-Raphson chooses
b based on b0 such that
0 = f'(b0) + H(b0) * (b - b0)
Solving for b gives
b = b0 - H(b0)^(-1) * f'(b0)
and this process is repeated until convergence is achieved.
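To make the update concrete, here is a minimal sketch of classical
Newton-Raphson in Python. This is only an illustration, not Stata’s ml
code, and the example log-likelihood (the mean of normal data with known
variance) is hypothetical.

    # Sketch of classical Newton-Raphson for maximizing a log-likelihood,
    # given routines that evaluate the gradient f'(b) and Hessian H(b).
    import numpy as np

    def newton_raphson(grad, hess, b0, tol=1e-8, maxiter=100):
        """Repeat b = b0 - H(b0)^(-1) * f'(b0) until the step is negligible."""
        b = np.asarray(b0, dtype=float)
        for _ in range(maxiter):
            step = np.linalg.solve(hess(b), grad(b))   # H(b)^(-1) * f'(b)
            b = b - step
            if np.max(np.abs(step)) < tol:             # declare convergence
                return b
        raise RuntimeError("convergence not achieved")

    # Hypothetical example: MLE of the mean of N(mu, 1) data
    y = np.array([1.0, 2.0, 0.5, 1.5])
    grad = lambda b: np.array([np.sum(y - b[0])])      # f'(mu)
    hess = lambda b: np.array([[-float(len(y))]])      # H(mu)
    print(newton_raphson(grad, hess, [0.0]))           # converges to the sample mean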
There are 3 situations that can cause this algorithm to converge to an answer
that is not the unique global maximum. They are
- The existence of multiple local maxima.
- Nonconcave log-likelihoods.
- Numerical problems with the computation of the log-likelihood.
I’ll discuss these problems one at a time.
Problem 1: The existence of multiple local maxima
If your log-likelihood function has multiple local maxima, then the modified
Newton-Raphson algorithm should converge to one of them. Unless you can
prove mathematically that your log-likelihood function is globally concave,
you will never know whether you have converged at a global maximum.
If you cannot analytically prove global concavity, there’s little you
can do in practice to show that you have converged to a unique global
maximum.
About the only practical steps you can take are
1. a grid search through sensible ranges of the parameters, which is
obviously not feasible with lots of parameters but is reasonable for
four parameters or fewer;
2. trying different starting values to see whether you converge to
different answers (see the sketch after the next paragraph). If you do
get different answers with different starting values, at least you will
know that there are multiple local maxima.
Since (1) is usually impractical and (2) usually lands you back at the same
answer, basically one hopes there is only one maximum when you
can’t prove it analytically. This is a reasonable assumption — a
statistically well-posed likelihood should not have multiple local maxima.
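As a rough illustration of option (2), here is a sketch in Python of
restarting a general-purpose optimizer from several starting values. The
bimodal objective standing in for a log-likelihood is made up for the
example; it is not a Stata model.

    # Restart the optimizer from several starting values and compare the
    # maxima reached; different answers reveal multiple local maxima.
    import numpy as np
    from scipy.optimize import minimize

    def neg_loglik(b):
        # Hypothetical objective with two local maxima, at b = -1 and b = +1
        return (b[0] ** 2 - 1.0) ** 2

    for start in (-2.0, -0.5, 0.5, 2.0):
        res = minimize(neg_loglik, np.array([start]))
        print(f"start {start:5.1f} -> maximum at b = {res.x[0]:6.3f}, "
              f"log likelihood = {-res.fun:9.5f}")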
Sometimes there are maxima at nonsensical values of ancillary parameters.
For example, for a variance parameter sigma, there may be one maximum for
sigma > 0 and another for sigma < 0, with the latter being
nonsensical. To avoid this, transform the parameter so that it is always
positive, for example, by estimating ln(sigma) rather than sigma itself.
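A minimal sketch of this reparameterization (in Python, with a
hypothetical normal log-likelihood; not Stata code):

    # Parameterize the likelihood in terms of lnsigma; the optimizer works
    # on an unrestricted parameter, while sigma = exp(lnsigma) stays positive.
    import numpy as np

    def loglik_normal(params, y):
        mu, lnsigma = params
        sigma = np.exp(lnsigma)                     # always > 0
        return np.sum(-0.5 * np.log(2.0 * np.pi) - lnsigma
                      - 0.5 * ((y - mu) / sigma) ** 2)

    y = np.array([1.0, 2.0, 0.5, 1.5])
    print(loglik_normal([1.25, 0.0], y))            # mu = 1.25, sigma = exp(0) = 1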
Problem 2: Nonconcave log-likelihoods
A problem with the classical Newton-Raphson algorithm arises when
H(b0) is not invertible. That's where the "modified" part of the
"modified Newton-Raphson algorithm" comes in. If H(b0) is not
invertible, the matrix is fiddled with until it is changed into
something that is invertible, and, using this inverse, one hopes to step
in a direction that is an improvement.
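The FAQ does not spell out exactly how the matrix is fiddled with, but
one common generic repair, shown here purely as an illustration in
Python and not as Stata’s actual method, is to mix a multiple of the
identity into H until the result is negative definite and hence
invertible.

    # One generic way to repair a non-invertible (or non-negative-definite)
    # Hessian: shift its eigenvalues by subtracting c*I until the repaired
    # matrix is negative definite, then take a Newton-type step with it.
    import numpy as np

    def modified_step(H, g, c=1.0):
        """Return an ascent step -M^(-1) * g using a repaired Hessian M."""
        H = np.asarray(H, dtype=float)
        M = H
        while np.any(np.linalg.eigvalsh(M) > -1e-10):   # not negative definite
            M = H - c * np.eye(H.shape[0])              # "fiddle" with H
            c *= 10.0
        return -np.linalg.solve(M, g)

    # Example: a singular Hessian from two perfectly collinear parameters
    H = [[-2.0, -2.0], [-2.0, -2.0]]
    print(modified_step(H, [1.0, 0.5]))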
When one sees the message “nonconcave function encountered” in
Stata’s iteration log
Iteration 0: Log Likelihood = -18072.001
(nonconcave function encountered)
Iteration 1: Log Likelihood = -4064.9319
(nonconcave function encountered)
Iteration 2: Log Likelihood = -1373.5496
(nonconcave function encountered)
this is what is happening. H is not invertible, ml is doing
its "modified" fix-up, and it is taking steps that result in a bigger
log-likelihood.
The "nonconcave function encountered" message is not a concern when
it occurs in iterations before the last one:
Iteration 38: Log Likelihood = -205.20113
(nonconcave function encountered)
Iteration 39: Log Likelihood = -196.88076
Iteration 40: Log Likelihood = -194.27732
Iteration 41: Log Likelihood = -194.18311
Iteration 42: Log Likelihood = -194.18306
Iteration 43: Log Likelihood = -194.18306
In the above example, ml eventually got to a concave region of the
function (after 39 iterations) and then quickly found the maximum (in 4
iterations). This is an example of the modified Newton-Raphson algorithm
working perfectly.
The bottom line here is that the “nonconcave function
encountered” message is only a concern if it appears at the
very last step. (Note that the message applies to the iteration number
below the message.)
Here’s an example of when it does indicate a problem:
...
Iteration 105: Log Likelihood = -195.38874
(nonconcave function encountered)
Iteration 106: Log Likelihood = -195.38874
Interval Regression                              Number of obs =       74
                                                 Model chi2(2) =    78.01
Log Likelihood = -195.38874                      Prob > chi2   =   0.0000

------------------------------------------------------------------------------
         |      Coef.   Std. Err.       z     P>|z|      [95% Conf. Interval]
---------+--------------------------------------------------------------------
  weight |  -.0030025   .0005111    -5.874    0.000     -.0040043   -.0020007
  weight |  -.0030025          .         .        .             .           .
   _cons |   39.42903    1.59296    24.752    0.000      36.30689    42.55118
------------------------------------------------------------------------------
  _sigma |   3.394052   .2792304    12.155    0.000      2.846771    3.941334
------------------------------------------------------------------------------
When you converge at a point where the Hessian is singular, one or more
standard errors will be missing. The standard errors are derived from the
inverse of the Hessian, and, since the Hessian was not invertible, a
generalized inverse (produced by dropping one or more rows/columns) was used
instead for the variance–covariance matrix.
Thus this situation is pretty easy to identify. It clearly doesn't look
like a sound answer.
Note, however, that in the above example (of perfect collinearity) the
algorithm did fine. It found a global maximum; that maximum just isn't
unique, since it lies on a ridge. So we shouldn't attribute the problem
to the algorithm; it was a problem with the model.
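As a rough illustration of why collinearity makes the Hessian singular,
here is a Python sketch with made-up data. The -X'X form below is the
Hessian of a linear-regression log-likelihood up to a scale factor,
standing in for the interval-regression Hessian above.

    # With a regressor entered twice (as weight was above), the columns of X
    # are perfectly collinear, so the Hessian -X'X is singular and only a
    # generalized inverse is available for the variance-covariance matrix.
    import numpy as np

    rng = np.random.default_rng(0)
    w = rng.normal(size=74)
    X = np.column_stack([w, w, np.ones(74)])        # weight appears twice
    H = -X.T @ X
    print(np.linalg.matrix_rank(H))                 # 2, not 3: H is singular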
Besides collinearity, “nonconcave function encountered” at the
last step can happen when the true maximum of the likelihood is at infinity
for a parameter. In these cases, ml may declare convergence at a
value of a parameter that is effectively infinity. The likelihood is flat
at infinity, so again the Hessian is not invertible. In this case,
ml does find the maximum (to the specified tolerance).
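Here is a hypothetical Python illustration of a maximum at infinity:
with perfectly separated 0/1 data, the logit log-likelihood keeps
increasing, and flattening, as the coefficient grows without bound.

    # With perfectly separated data, the logit log-likelihood keeps rising
    # (and flattening) as the coefficient b grows; the "maximum" is at infinity.
    import numpy as np

    x = np.array([-2.0, -1.0, 1.0, 2.0])
    y = np.array([0, 0, 1, 1])                      # perfectly separated

    def loglik(b):
        p = 1.0 / (1.0 + np.exp(-b * x))
        return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

    for b in (1.0, 5.0, 10.0, 15.0):
        print(b, loglik(b))   # creeps up toward 0; the curvature vanishes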
Numeric problems in the computation of the likelihood (like scale problems)
can also give rise to "nonconcave function encountered" at the last step.
So really, ml and the modified Newton-Raphson algorithm do pretty
well when the Hessian is not invertible. The problem in these cases is
usually inherent in the model (e.g., collinearity or convergence at
infinity) or is a numeric problem in the computation of the likelihood.
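As an illustration of a scale problem (Python, made-up data): a
covariate measured on a wildly different scale makes the Hessian badly
conditioned, so its inverse, and with it the Newton step and the
standard errors, is computed poorly.

    # Poorly scaled covariates give an ill-conditioned Hessian (-X'X here,
    # the Hessian of a linear-regression log-likelihood up to a scale factor).
    import numpy as np

    rng = np.random.default_rng(1)
    x1 = rng.normal(size=100)                        # unit scale
    x2 = rng.normal(size=100) * 1e6                  # enormous scale
    X = np.column_stack([np.ones(100), x1, x2])
    print(np.linalg.cond(-X.T @ X))                  # huge condition number

    X2 = np.column_stack([np.ones(100), x1, x2 / 1e6])
    print(np.linalg.cond(-X2.T @ X2))                # rescaling fixes it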
Problem 3: Unproductive step attempted
The message “unproductive step attempted” appears from ml
when the modified Newton-Raphson algorithm takes a step to a new place that
is worse (lower likelihood) than the last place. The Hessian is invertible,
but it steps to a place that is not an improvement.
This message is typically seen at intermediate steps for likelihoods
with narrow ridges and gently sloping backbones. It is easy to slip off
the backbone of the ridge to a lower spot as you try to move along it
toward the maximum.
The algorithm is stepping in a good direction but is stepping too far. The
solution is simply to back up a little. This is what happens when you
see the message “unproductive step attempted”: ml takes a step
that is too optimistic, then backs up until it finds a smaller step that
leads to a greater value of the likelihood.
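Here is a sketch of this backing-up idea in Python, using plain step
halving; it illustrates the idea only and is not Stata’s actual rule.

    # When the full Newton step lowers the likelihood, shrink it until the
    # likelihood improves.  If every shrunken step fails, we end up back at
    # the starting point -- exactly the troublesome situation described below.
    import numpy as np

    def backtrack(f, b0, step, max_halvings=20):
        """Halve the step until f improves; give up at the original point."""
        b0 = np.asarray(b0, dtype=float)
        step = np.asarray(step, dtype=float)
        f0 = f(b0)
        for _ in range(max_halvings):
            b1 = b0 + step
            if f(b1) > f0:               # productive step: accept it
                return b1
            step = step / 2.0            # unproductive: back up and retry
        return b0                        # effectively the same point as before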
Here’s a typical example (linear regression as an MLE) of this message
when it’s not a problem:
Iteration 0: Log Likelihood = -18072.001
(nonconcave function encountered)
Iteration 1: Log Likelihood = -4064.9319
...
(nonconcave function encountered)
Iteration 14: Log Likelihood = -217.50138
(unproductive step attempted)
Iteration 15: Log Likelihood = -216.84126
...
(unproductive step attempted)
Iteration 41: Log Likelihood = -197.68186
Iteration 42: Log Likelihood = -194.35966
Iteration 43: Log Likelihood = -194.18325
Iteration 44: Log Likelihood = -194.18306
Iteration 45: Log Likelihood = -194.18306
<table of estimates appear>
Iterations 1-14 are “nonconcave function encountered”, and
iterations 15-41 are “unproductive step attempted”. Iterations
42-45 are “productive” steps (i.e., initial step tried was an
improvement) with invertible Hessians. When you are near the answer, you
expect Newton-Raphson steps to work out fine.
When there is a problem, you’ll see something like
Iteration 0: Log Likelihood = -18072.001
(nonconcave function encountered)
Iteration 1: Log Likelihood = -4064.9319
...
(nonconcave function encountered)
Iteration 14: Log Likelihood = -217.50138
(unproductive step attempted)
Iteration 15: Log Likelihood = -216.84126
...
(unproductive step attempted)
Iteration 41: Log Likelihood = -197.68186
(unproductive step attempted)
Iteration 42: Log Likelihood = -194.35966
(unproductive step attempted)
Iteration 43: Log Likelihood = -194.35966
<table of estimates appear>
At iteration 43, it backed up. The question is, how far did it back up?
Did it back up to a point that is (numerically) the same as the point at
iteration 42? It may have. It may have backed up to the same point and
then declared convergence.
Thus you cannot trust the estimates when there is an “unproductive
step attempted” at the last iteration. (Again, if it appears
at any iteration before the last iteration, there is no problem.)
This is a flaw in Stata’s ml routine. It should not back up to
the same point and declare convergence. It should print out a message like
“convergence not achieved”, or it should try to do something
intelligent like randomly search in the vicinity for a step that is an
improvement. For the next release of Stata, we will fix this.
Note this is a case where the modified Newton–Raphson algorithm
truly breaks down. What is happening is the Taylor-series
approximation
f'(b) = f'(b0) + H(b0) * (b - b0)
is no good — even for b very close to b0! Theoretically,
this implies that the function is not analytic at b0, but this is
usually not the case. Rather, what usually is happening is that f(),
f'(), or H() are being computed poorly in the vicinity of
b0 because of numeric problems in the likelihood computation
routines.
So this is truly the one case in which the modified Newton–Raphson
algorithm breaks down: when the functions being computed are not numerically
smooth.
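Here is a Python sketch of how numeric noise in the computation of
f() can wreck the Taylor approximation; the artificial noise below
stands in for roundoff or other numeric problems in a real likelihood
routine.

    # A tiny amount of noise in f() overwhelms a finite-difference estimate
    # of the curvature, so the local quadratic (Taylor) model is useless
    # even very close to b0.
    import numpy as np

    rng = np.random.default_rng(2)

    def noisy_f(b):
        return -0.5 * b ** 2 + 1e-9 * rng.standard_normal()   # noise ~ 1e-9

    b0, h = 1.0, 1e-6
    num_hess = (noisy_f(b0 + h) - 2.0 * noisy_f(b0) + noisy_f(b0 - h)) / h ** 2
    print(num_hess)   # should be -1, but the noise is of order 1e-9 / 1e-12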
If you read this far ...
If you read this far, you must be interested in the details of ml!
If you want to learn all sorts of details about ml, I highly
recommend Stata’s NetCourse 200, which is devoted to ml
programming. The flavor of the course is, in part, similar to what
I’ve written above (since I'm one of the course instructors!). The
majority of the course covers practical Stata programming issues, but we
discuss theoretical ML and general numerical issues as well.