**[R] maximize** -- Details of iterative maximization

__Syntax__

Maximum likelihood optimization

*mle_cmd* *...* [**,** *options*]

Set default maximum iterations

**set** **maxiter** *#* [**,** __perm__**anently**]

| *options* | Description |
| --- | --- |
| __dif__**ficult** | use a different stepping algorithm in nonconcave regions |
| __tech__**nique(***algorithm_spec***)** | maximization technique |
| __iter__**ate(***#***)** | perform maximum of *#* iterations; default is **iterate(16000)** |
| [__no__]__lo__**g** | display an iteration log of the log likelihood; typically, the default |
| __tr__**ace** | display current parameter vector in iteration log |
| __grad__**ient** | display current gradient vector in iteration log |
| **showstep** | report steps within an iteration in iteration log |
| __hess__**ian** | display current negative Hessian matrix in iteration log |
| __showtol__**erance** | report the calculated result that is compared to the effective convergence criterion |
| __tol__**erance(***#***)** | tolerance for the coefficient vector; see *Options* for the defaults |
| __ltol__**erance(***#***)** | tolerance for the log likelihood; see *Options* for the defaults |
| __nrtol__**erance(***#***)** | tolerance for the scaled gradient; see *Options* for the defaults |
| __qtol__**erance(***#***)** | when specified with algorithms **bhhh**, **dfp**, or **bfgs**, the q-H matrix is used as the final check for convergence rather than **nrtolerance()** and the H matrix; seldom used |
| __nonrtol__**erance** | ignore the **nrtolerance()** option |
| **from(***init_specs***)** | initial values for the coefficients |

where *algorithm_spec* is

*algorithm* [ *#* [ *algorithm* [*#*] ] ... ]

*algorithm* is {**nr** | **bhhh** | **dfp** | **bfgs**}

and *init_specs* is one of

*matname* [**,** **skip** **copy** ]

{ [*eqname***:**]*name* **=** *#* | **/***eqname* **=** *#* } [*...*]

*#* [*#* *...*]**,** **copy**

__Description__

All Stata commands maximize likelihood functions using **moptimize()** and
**optimize()**; see *Methods and formulas* in **[R] maximize**. Commands use the
Newton-Raphson method with step halving and special fixups when they
encounter nonconcave regions of the likelihood. For details, see **[M-5]**
**moptimize** and **[M-5] optimize**. For more information about programming
maximum likelihood estimators in ado-files and Mata, see **[R] ml** and
Gould, Pitblado, and Poi (2010).

**set** **maxiter** specifies the default maximum number of iterations for
estimation commands that iterate. The initial value is **16000**, and *#* can
be **0** to **16000**. To change the maximum number of iterations performed by a
particular estimation command, you need not reset **maxiter**; you can
specify the **iterate(***#***)** option. When **iterate(***#***)** is not specified, the
**maxiter** value is used.

__Maximization options__

**difficult** specifies that the likelihood function is likely to be
difficult to maximize because of nonconcave regions. When the
message "not concave" appears repeatedly, **ml**'s standard stepping
algorithm may not be working well. **difficult** specifies that a
different stepping algorithm be used in nonconcave regions. There is
no guarantee that **difficult** will work better than the default;
sometimes it is better and sometimes it is worse. You should use the
**difficult** option only when the default stepper declares convergence
and the last iteration is "not concave" or when the default stepper
is repeatedly issuing "not concave" messages and producing only tiny
improvements in the log likelihood.

**technique(***algorithm_spec***)** specifies how the likelihood function is to be
maximized. The following algorithms are allowed. For details, see
Gould, Pitblado, and Poi (2010).

**technique(nr)** specifies Stata's modified Newton-Raphson (NR)
algorithm.

**technique(bhhh)** specifies the Berndt-Hall-Hall-Hausman (BHHH)
algorithm.

**technique(dfp)** specifies the Davidon-Fletcher-Powell (DFP) algorithm.

**technique(bfgs)** specifies the Broyden-Fletcher-Goldfarb-Shanno (BFGS)
algorithm.

The default is **technique(nr)**.

You can switch between algorithms by specifying more than one in the
**technique()** option. By default, an algorithm is used for five
iterations before switching to the next algorithm. To specify a
different number of iterations, include the number after the
technique in the option. For example, specifying **technique(bhhh 10**
**nr 1000)** requests that **ml** perform 10 iterations with the BHHH
algorithm followed by 1000 iterations with the NR algorithm, and then
switch back to BHHH for 10 iterations, and so on. The process
continues until convergence or until the maximum number of iterations
is reached.
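
The cycling rule described above can be sketched outside Stata. The Python snippet below is an illustration only, not Stata's implementation: the function name `technique_schedule` and its `default_count` parameter are hypothetical, and the default of five iterations per algorithm is taken from the text above.

```python
from itertools import cycle, islice

# Algorithm names accepted by technique(), as listed in the syntax diagram.
ALGORITHMS = {"nr", "bhhh", "dfp", "bfgs"}


def technique_schedule(spec, default_count=5):
    """Yield the algorithm to use at each successive iteration.

    `spec` mimics an algorithm_spec such as "bhhh 10 nr 1000".
    An algorithm without a count runs for `default_count` iterations.
    """
    tokens = spec.split()
    blocks = []
    i = 0
    while i < len(tokens):
        name = tokens[i]
        if name not in ALGORITHMS:
            raise ValueError(f"unknown algorithm: {name}")
        i += 1
        count = default_count
        # An optional integer after the name overrides the default count.
        if i < len(tokens) and tokens[i].isdigit():
            count = int(tokens[i])
            i += 1
        if count < 1:
            raise ValueError("iteration count must be positive")
        blocks.append((name, count))
    if not blocks:
        raise ValueError("empty algorithm_spec")
    # Cycle through the blocks until the caller stops asking
    # (in practice: until convergence or the iteration limit).
    for name, count in cycle(blocks):
        for _ in range(count):
            yield name


schedule = list(islice(technique_schedule("bhhh 10 nr 1000"), 1012))
print(schedule[9], schedule[10], schedule[1009], schedule[1010])
# prints: bhhh nr nr bhhh
```

The generator never ends by itself; the surrounding optimizer loop is assumed to stop drawing from it once convergence is declared or the maximum number of iterations is reached.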

**iterate(***#***)** specifies the maximum number of iterations. When the number
of iterations equals **iterate()**, the optimizer stops and presents the
current results. If convergence is declared before this limit is
reached, the optimizer stops at that point instead. Specifying
**iterate(0)** is useful for viewing results evaluated at the initial
value of the coefficient vector. Specifying **iterate(0)** and **from()**
together allows you to view results evaluated at a specified
coefficient vector; however, not all commands allow the **from()**
option. The default value of **iterate(***#***)** for both estimators
programmed internally and estimators programmed with **ml** is the
current value of **set maxiter**, which is **iterate(16000)** by default.
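
To make the stopping behavior concrete, here is a minimal Python sketch of an iteration loop with an iteration limit. It is not Stata code and not Stata's optimizer: the toy Poisson log likelihood, the names `MAXITER`, `poisson_loglik`, and `maximize`, and the simple step-halving rule are all assumptions chosen for illustration. It shows that the limit falls back to a global setting when not specified, that a limit of 0 returns results at the initial value, and that the loop ends early once convergence is declared.

```python
import math

MAXITER = 16000  # hypothetical stand-in for the `set maxiter` value


def poisson_loglik(b, y):
    """Poisson log likelihood in b = log(mean), up to an additive constant."""
    return sum(yi * b - math.exp(b) for yi in y)


def maximize(y, b0=0.0, iterate=None, ltol=1e-7):
    """Newton-Raphson with step halving for a one-parameter toy model."""
    limit = MAXITER if iterate is None else iterate
    b, ll = b0, poisson_loglik(b0, y)
    ic, converged = 0, False
    while ic < limit and not converged:
        gradient = sum(y) - len(y) * math.exp(b)
        neg_hessian = len(y) * math.exp(b)
        step = gradient / neg_hessian
        new_b = b + step
        new_ll = poisson_loglik(new_b, y)
        # Step halving: shrink the step while it lowers the log likelihood.
        while new_ll < ll:
            step /= 2
            new_b = b + step
            new_ll = poisson_loglik(new_b, y)
        # Relative change in the log likelihood (an assumed measure).
        converged = abs(new_ll - ll) / (abs(ll) + 1.0) <= ltol
        b, ll = new_b, new_ll
        ic += 1
    return {"b": b, "ll": ll, "ic": ic, "converged": converged}


y = [1, 2, 3, 4, 5]
at_start = maximize(y, iterate=0)   # results evaluated at the initial value
full_run = maximize(y)              # runs until convergence
print(at_start["ic"], at_start["converged"], full_run["converged"])
# prints: 0 False True
```

With `iterate=0` the function returns the starting value untouched, which mirrors the use of **iterate(0)** for inspecting results at a chosen coefficient vector.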

**log** and **nolog** specify whether an iteration log showing the progress of
the log likelihood is to be displayed. For most commands, the log is
displayed by default, and **nolog** suppresses it. For a few commands
(such as the **svy** maximum likelihood estimators), you must specify **log**
to see the log.

**trace** adds to the iteration log a display of the current parameter
vector.

**gradient** adds to the iteration log a display of the current gradient
vector.

**showstep** adds to the iteration log a report on the steps within an
iteration. This option was added so that developers at StataCorp
could view the stepping when they were improving the **ml** optimizer
code. At this point, it mainly provides entertainment.

**hessian** adds to the iteration log a display of the current negative
Hessian matrix.

**showtolerance** adds to the iteration log the calculated value that is
compared with the effective convergence criterion at the end of each
iteration. Until convergence is achieved, the smallest calculated
value is reported.

**shownrtolerance** is a synonym of **showtolerance**.

-------------------------------------------------------------------------------
Below we describe the three convergence tolerances. Convergence is
declared when the **nrtolerance()** criterion is met and either the
**tolerance()** or the **ltolerance()** criterion is also met.

**tolerance(***#***)** specifies the tolerance for the coefficient vector. When
the relative change in the coefficient vector from one iteration to
the next is less than or equal to **tolerance()**, the **tolerance()**
convergence criterion is satisfied.

**tolerance(1e-4)** is the default for estimators programmed with **ml**.

**tolerance(1e-6)** is the default for all other estimators.

**ltolerance(***#***)** specifies the tolerance for the log likelihood. When the
relative change in the log likelihood from one iteration to the next
is less than or equal to **ltolerance()**, the **ltolerance()** convergence
criterion is satisfied.

**ltolerance(0)** is the default for estimators programmed with **ml**.

**ltolerance(1e-7)** is the default for all other estimators.

**nrtolerance(***#***)** specifies the tolerance for the scaled gradient.
The criterion is satisfied when g*inv(H)*g' < **nrtolerance()**. The
default is **nrtolerance(1e-5)**.

**qtolerance(***#***)**, when specified with algorithms **bhhh**, **dfp**, or **bfgs**, uses the
q-H matrix as the final check for convergence rather than
**nrtolerance()** and the H matrix.

Beginning with Stata 12, Stata by default computes the H matrix
when the q-H matrix passes the convergence tolerance, and Stata
requires that H be concave and pass the **nrtolerance()** criterion
before concluding convergence has occurred.

**qtolerance()** provides a way for the user to obtain Stata's earlier
behavior.

**nonrtolerance** specifies that the default **nrtolerance()** criterion be
turned off.
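
The way the three tolerances combine can be sketched in Python. This is a hedged illustration, not Stata's code. It assumes the relative-difference measure |x - y| / (|y| + 1) for both the coefficient vector and the log likelihood, and it treats H as the negative Hessian so that the scaled gradient is nonnegative near a maximum. The function names, the argument `use_nrtol` (standing in for **nonrtolerance**), and the pure-Python linear solver are hypothetical. The default tolerances are the non-**ml** defaults quoted above.

```python
def reldif(x, y):
    """Relative difference |x - y| / (|y| + 1) (assumed measure)."""
    return abs(x - y) / (abs(y) + 1.0)


def max_reldif(new, old):
    """Largest elementwise relative difference between two vectors."""
    return max(reldif(a, b) for a, b in zip(new, old))


def solve(H, g):
    """Solve H x = g by Gaussian elimination with partial pivoting."""
    n = len(g)
    A = [[float(v) for v in H[r]] + [float(g[r])] for r in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        if A[pivot][col] == 0.0:
            raise ValueError("H is singular")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            factor = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= factor * A[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(A[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (A[r][n] - rest) / A[r][r]
    return x


def scaled_gradient(g, H):
    """Compute g * inv(H) * g' without forming the inverse explicitly."""
    return sum(gi * xi for gi, xi in zip(g, solve(H, g)))


def converged(b_new, b_old, ll_new, ll_old, g, H,
              tol=1e-6, ltol=1e-7, nrtol=1e-5, use_nrtol=True):
    """nrtolerance() must hold, plus either tolerance() or ltolerance()."""
    tol_ok = max_reldif(b_new, b_old) <= tol
    ltol_ok = reldif(ll_new, ll_old) <= ltol
    nr_ok = (not use_nrtol) or scaled_gradient(g, H) < nrtol
    return nr_ok and (tol_ok or ltol_ok)


H = [[2.0, 0.0], [0.0, 4.0]]
print(converged([1.0, 2.0], [1.0, 2.0], -100.0, -100.0, [0.002, 0.004], H))
# prints: True
```

In the example the scaled gradient is about 6e-6, just under the default of 1e-5. With a larger gradient the same call returns `False` unless `use_nrtol=False` is passed.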

-------------------------------------------------------------------------------

**from()** specifies initial values for the coefficients. Not all estimators
in Stata support this option. You can specify the initial values in
one of three ways: by specifying the name of a vector containing the
initial values (for example, **from(b0)**, where **b0** is a properly labeled
vector); by specifying coefficient names with the values (for
example, **from(age=2.1 /sigma=7.4)**); or by specifying a list of values
(for example, **from(2.1 7.4, copy)**). **from()** is intended for use when
doing bootstraps (see **[R] bootstrap**) and in other special situations
(for example, with **iterate(0)**). Even when the values specified in
**from()** are close to the values that maximize the likelihood, only a
few iterations may be saved. Poor values in **from()** may lead to
convergence problems.

**skip** specifies that any parameters found in the specified
initialization vector that are not also found in the model be
ignored. The default action is to issue an error message.

**copy** specifies that the list of values or the initialization vector
be copied into the initial-value vector by position rather than
by name.
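
The difference between matching by name and copying by position, and the effect of **skip**, can be sketched in Python. This is an illustration under stated assumptions, not Stata's implementation: the function `initial_values` is hypothetical, unspecified parameters are assumed to start at 0, and raising an error when a positional list is longer than the parameter vector is also an assumption.

```python
def initial_values(model_names, init, copy=False, skip=False, default=0.0):
    """Build a starting vector for the parameters in `model_names`.

    `init` is a dict of name -> value, or (with copy=True) a plain list.
    """
    start = {name: default for name in model_names}
    if copy:
        # Position-based: names in `init`, if any, are ignored.
        values = list(init.values()) if isinstance(init, dict) else list(init)
        if len(values) > len(model_names):
            raise ValueError("more initial values than parameters")
        for name, value in zip(model_names, values):
            start[name] = value
        return start
    # Name-based: every supplied name must exist unless skip is set.
    for name, value in init.items():
        if name in start:
            start[name] = value
        elif not skip:
            raise KeyError(f"parameter {name} not found in model")
    return start


params = ["age", "/sigma"]
by_name = initial_values(params, {"age": 2.1, "/sigma": 7.4})
by_position = initial_values(params, [2.1, 7.4], copy=True)
print(by_name == by_position)
# prints: True
```

Passing a vector that carries an extra parameter, for example `{"age": 2.1, "weight": 0.5}` for a model containing only `age`, raises an error in this sketch unless `skip=True` is given, which matches the default action described above.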

__Option for set maxiter__

**permanently** specifies that, in addition to making the change right now,
the **maxiter** setting be remembered and become the default setting when
you invoke Stata.

__Remarks__

Only in rare circumstances would you ever need to specify any of these
options, except **nolog**. The **nolog** option is useful for reducing the
amount of output appearing in log files.

__Stored results__

Maximum likelihood estimators store the following in **e()**:

| Scalars | Description |
| --- | --- |
| **e(N)** | number of observations; always stored |
| **e(k)** | number of parameters; always stored |
| **e(k_eq)** | number of equations in **e(b)**; usually stored |
| **e(k_eq_model)** | number of equations in overall model test; usually stored |
| **e(k_dv)** | number of dependent variables; usually stored |
| **e(df_m)** | model degrees of freedom; always stored |
| **e(r2_p)** | pseudo-R-squared; sometimes stored |
| **e(ll)** | log likelihood; always stored |
| **e(ll_0)** | log likelihood, constant-only model; stored when constant-only model is fit |
| **e(N_clust)** | number of clusters; stored when **vce(cluster** *clustvar***)** is specified; see **[U] 20.22 Obtaining robust variance estimates** |
| **e(chi2)** | chi-squared; usually stored |
| **e(p)** | p-value for model test; usually stored |
| **e(rank)** | rank of **e(V)**; always stored |
| **e(rank0)** | rank of **e(V)** for constant-only model; stored when constant-only model is fit |
| **e(ic)** | number of iterations; usually stored |
| **e(rc)** | return code; usually stored |
| **e(converged)** | **1** if converged, **0** otherwise; usually stored |

| Macros | Description |
| --- | --- |
| **e(cmd)** | name of command; always stored |
| **e(cmdline)** | command as typed; always stored |
| **e(depvar)** | names of dependent variables; always stored |
| **e(wtype)** | weight type; stored when weights are specified or implied |
| **e(wexp)** | weight expression; stored when weights are specified or implied |
| **e(title)** | title in estimation output; usually stored by commands using **ml** |
| **e(clustvar)** | name of cluster variable; stored when **vce(cluster** *clustvar***)** is specified; see **[U] 20.22 Obtaining robust variance estimates** |
| **e(chi2type)** | **Wald** or **LR**; type of model chi-squared test; usually stored |
| **e(vce)** | *vcetype* specified in **vce()**; stored when command allows **vce()** |
| **e(vcetype)** | title used to label Std. Err.; sometimes stored |
| **e(opt)** | type of optimization; always stored |
| **e(which)** | **max** or **min**; whether optimizer is to perform maximization or minimization; always stored |
| **e(ml_method)** | type of **ml** method; always stored by commands using **ml** |
| **e(user)** | name of likelihood-evaluator program; always stored |
| **e(technique)** | from **technique()** option; sometimes stored |
| **e(singularHmethod)** | **m-marquardt** or **hybrid**; method used when Hessian is singular; sometimes stored (1) |
| **e(crittype)** | optimization criterion; always stored (1) |
| **e(properties)** | estimator properties; always stored |
| **e(predict)** | program used to implement **predict**; usually stored |

| Matrices | Description |
| --- | --- |
| **e(b)** | coefficient vector; always stored |
| **e(Cns)** | constraints matrix; sometimes stored |
| **e(ilog)** | iteration log (up to 20 iterations); usually stored |
| **e(gradient)** | gradient vector; usually stored |
| **e(V)** | variance-covariance matrix of the estimators; always stored |
| **e(V_modelbased)** | model-based variance; only stored when **e(V)** is robust, cluster-robust, bootstrap, or jackknife variance |

| Functions | Description |
| --- | --- |
| **e(sample)** | marks estimation sample; always stored |

--------------------
1. Type **ereturn** **list,** **all** to view these results; see **[P] return**.

See *Stored results* in the manual entry for any maximum likelihood
estimator for a list of returned results.

__Reference__

Gould, W. W., J. Pitblado, and B. P. Poi. 2010. *Maximum Likelihood*
*Estimation with Stata.* 4th ed. College Station, TX: Stata Press.