Bookmark and Share

Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: st: numerical derivatives and -ml- command

From   Stas Kolenikov <>
Subject   Re: st: numerical derivatives and -ml- command
Date   Tue, 6 Nov 2012 08:33:51 -0600

On top of what Maarten and Nick said, there are things that mortal
programmers will not have access to and control of. Among these are:

1. Parallelization: if you have Stata MP#, Stata ml code (not at ado
or Mata level, but at C level) may be smart enough to send different
parameter combinations to different processors, and/or combine the
individual level contributions into the resulting gradients and
Hessians much faster.
2. Overhead: given how flexible -ml- is, a great chunk of its time is
being spent rearranging the general constructs in memory. So out of
47ms of your single evaluation, you may have 24ms of actual data
crunching, and 23 ms putting the results back into -ml- framework,
displaying the results, if any (that's very slow), looking for the
next value, if needed, and clearing the memory before exiting. When
you ask Stata to evaluate the Hessian, it would take (crunching time)
times (# of evaluations) + (overhead time), where the latter remains
more or less fixed, but will now be a much smaller fraction of the
overall time.
3. I am pretty sure that -ml- is much smarter in terms of
approximating the Hessian, and takes an optimal combination of points
in the parameter space that is specific to evaluating it. In other
words, it may take 4 evaluations to get the (1,1) element, but (1,2)
may use only 3 and recycle one of the previously used ones; and by the
time you get down to (k,k), you may need only 1 or 0 additional trial
parameter values.

There are also things that count against you in timing that Sun Yutao
has not accounted for. The biggest one is determining the step size
that may involve another 3-5-10 evaluations.

As a bottom line, I second (or rather third) Maarten's and Nick's
suggestions to forget reverse-engineering -ml- and just treat it as a
black box, concentrating on what you can do on your end to make things

If you are implementing a fixed effect estimator as a bunch of dummy
variables, the analogy only works for linear models (-xtreg-), and is
extremely dangerous elsewhere. The sufficient statistics that allow
conditioning for fixed effects estimator are also available for the
binary data via -xtlogit-, and for count data via -xtpoisson-. You can
put as many dummies as Stata would tolerate in other models, but that
won't give you fixed effects estimation, just the
"lots-of-dummy-variables" estimation. That has been my thinking so
far, but if I am mistaken, I would like to hear about other models for
which fixed effects are an option.

-- Stas Kolenikov, PhD, PStat (SSC)  ::
-- Senior Survey Statistician, Abt SRBI  ::  work email kolenikovs at
srbi dot com
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer

On Mon, Nov 5, 2012 at 3:26 PM, Sun Yutao
<> wrote:
> Hello,
> I'm trying to write an maximum likelihood algorithm that can handle a
> particularly large number of fixed effects (which is not possible with Stata's
> #var and matsize limitation). I have been comparing the performance of my ml
> with the Stata -ml- these days and I find something really strange. More
> specifically, does anyone know how the Newton-Raphson algorithm in Stata -ml-
> command works in detail? Because what I see from a small experiment is that
> the -ml- command is just way too fast so it's not possible to iteratively
> update the Hessian by numerical differentiation.
> Here the story goes:
> For an original Newton-Raphson(nr), one needs to update the gradients and
> Hessian in each iteration.  And suppose we have a likelihood function(llf), 5
> variables, and a given number of observations. And a single evaluation of the
> llf takes 0.0475 seconds, so for a 2-point approximation of the gradient
> vector one need 5*2=10 evaluations, which gives roughly 0.475 seconds. And the
> Hessian is a bit complicated: every off-diagonal element will need 4 llf
> evaluations and you have  4+3+2+1=10 of them, while the diagonal elements need
> 2 evaluations each, plus 1 for the common f(x), which gives 5*2+1 llf
> evaluations and hence for the Hessian one needs 10*4+5*2+1=51 llf evaluations
> which gives  2.4225 seconds.
> Based on that, one iterations in this setting should roughly take 3 seconds.
> However, the Stata -ml- command only needs 1 second per iteration, which is
> particularly uncommon. And even if my calculations for the Hessian was wrong,
> I'm still convinced that the Hessian needs more llf evaluations and hence more
> time to compute than the gradients. So it cannot possibly be 1 second per
> iteration. And in fact, in terms of performances, the Stata nr behaves a
> little bit like a quasi-Newton + a line search... Or is It possible that Stata
> secretly have a feature that can take analytical derivatives on a
> user-specified function?
> Does anyone know the answer to this problem? Or if there just is a better way
> to compute numerical derivatives that I do not know?
> Best regards,
> Sun Yutao
> *
> *   For searches and help try:
> *
> *
> *
*   For searches and help try:

© Copyright 1996–2014 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index