Notice: On March 31, it was **announced** that Statalist is moving from an email list to a **forum**. The old list will shut down on April 23, and its replacement, **statalist.org** is already up and running.

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

From |
Stas Kolenikov <skolenik@gmail.com> |

To |
statalist@hsphsun2.harvard.edu |

Subject |
Re: st: numerical derivatives and -ml- command |

Date |
Tue, 6 Nov 2012 08:33:51 -0600 |

On top of what Maarten and Nick said, there are things that mortal programmers will not have access to and control of. Among these are: 1. Parallelization: if you have Stata MP#, Stata ml code (not at ado or Mata level, but at C level) may be smart enough to send different parameter combinations to different processors, and/or combine the individual level contributions into the resulting gradients and Hessians much faster. 2. Overhead: given how flexible -ml- is, a great chunk of its time is being spent rearranging the general constructs in memory. So out of 47ms of your single evaluation, you may have 24ms of actual data crunching, and 23 ms putting the results back into -ml- framework, displaying the results, if any (that's very slow), looking for the next value, if needed, and clearing the memory before exiting. When you ask Stata to evaluate the Hessian, it would take (crunching time) times (# of evaluations) + (overhead time), where the latter remains more or less fixed, but will now be a much smaller fraction of the overall time. 3. I am pretty sure that -ml- is much smarter in terms of approximating the Hessian, and takes an optimal combination of points in the parameter space that is specific to evaluating it. In other words, it may take 4 evaluations to get the (1,1) element, but (1,2) may use only 3 and recycle one of the previously used ones; and by the time you get down to (k,k), you may need only 1 or 0 additional trial parameter values. There are also things that count against you in timing that Sun Yutao has not accounted for. The biggest one is determining the step size that may involve another 3-5-10 evaluations. As a bottom line, I second (or rather third) Maarten's and Nick's suggestions to forget reverse-engineering -ml- and just treat it as a black box, concentrating on what you can do on your end to make things faster. If you are implementing a fixed effect estimator as a bunch of dummy variables, the analogy only works for linear models (-xtreg-), and is extremely dangerous elsewhere. The sufficient statistics that allow conditioning for fixed effects estimator are also available for the binary data via -xtlogit-, and for count data via -xtpoisson-. You can put as many dummies as Stata would tolerate in other models, but that won't give you fixed effects estimation, just the "lots-of-dummy-variables" estimation. That has been my thinking so far, but if I am mistaken, I would like to hear about other models for which fixed effects are an option. -- -- Stas Kolenikov, PhD, PStat (SSC) :: http://stas.kolenikov.name -- Senior Survey Statistician, Abt SRBI :: work email kolenikovs at srbi dot com -- Opinions stated in this email are mine only, and do not reflect the position of my employer On Mon, Nov 5, 2012 at 3:26 PM, Sun Yutao <yutao.sun.statalist@outlook.com> wrote: > Hello, > > I'm trying to write an maximum likelihood algorithm that can handle a > particularly large number of fixed effects (which is not possible with Stata's > #var and matsize limitation). I have been comparing the performance of my ml > with the Stata -ml- these days and I find something really strange. More > specifically, does anyone know how the Newton-Raphson algorithm in Stata -ml- > command works in detail? Because what I see from a small experiment is that > the -ml- command is just way too fast so it's not possible to iteratively > update the Hessian by numerical differentiation. > > Here the story goes: > > For an original Newton-Raphson(nr), one needs to update the gradients and > Hessian in each iteration. And suppose we have a likelihood function(llf), 5 > variables, and a given number of observations. And a single evaluation of the > llf takes 0.0475 seconds, so for a 2-point approximation of the gradient > vector one need 5*2=10 evaluations, which gives roughly 0.475 seconds. And the > Hessian is a bit complicated: every off-diagonal element will need 4 llf > evaluations and you have 4+3+2+1=10 of them, while the diagonal elements need > 2 evaluations each, plus 1 for the common f(x), which gives 5*2+1 llf > evaluations and hence for the Hessian one needs 10*4+5*2+1=51 llf evaluations > which gives 2.4225 seconds. > > Based on that, one iterations in this setting should roughly take 3 seconds. > However, the Stata -ml- command only needs 1 second per iteration, which is > particularly uncommon. And even if my calculations for the Hessian was wrong, > I'm still convinced that the Hessian needs more llf evaluations and hence more > time to compute than the gradients. So it cannot possibly be 1 second per > iteration. And in fact, in terms of performances, the Stata nr behaves a > little bit like a quasi-Newton + a line search... Or is It possible that Stata > secretly have a feature that can take analytical derivatives on a > user-specified function? > > Does anyone know the answer to this problem? Or if there just is a better way > to compute numerical derivatives that I do not know? > > Best regards, > Sun Yutao > > > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/faqs/resources/statalist-faq/ > * http://www.ats.ucla.edu/stat/stata/ * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/faqs/resources/statalist-faq/ * http://www.ats.ucla.edu/stat/stata/

**References**:**st: numerical derivatives and -ml- command***From:*Sun Yutao <yutao.sun.statalist@outlook.com>

- Prev by Date:
**Re: st: ANOVA repeated measures** - Next by Date:
**Re: st: label to lower case** - Previous by thread:
**Re: st: numerical derivatives and -ml- command** - Next by thread:
**st: SSC Archive activity, October 2012** - Index(es):