[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]
Re: st: [Mata newbie] optimize() vs -ml-

From	[email protected] (Jeff Pitblado, StataCorp LP)
To	[email protected]
Subject	Re: st: [Mata newbie] optimize() vs -ml-
Date	Tue, 16 Oct 2007 16:37:19 -0500
Antoine Terracol <[email protected]> asks about relative speed between
Stata's -ml- command and Mata's -optimize()- suite (new is Stata 10):

> I was wondering if there was any general rule to decide wether a given 
> likelihood maximisation problem will be more efficiently handled using 
> Mata's optimize() or Stata's -ml- command ?
> 
> When I compare the example code given in the help file for optimize() to 
> fit a ml estimation of a beta density (v0 method) whith the equivalent 
> -ml- code (lf method), I find that optimize() runs approximately 3 to 5 
> times faster than -ml-, depending on the size of the (simulated) dataset.
> 
> However, when fitting a simple probit model (two explanatory variables), 
> optimize() (v0 method) runs significantly slower than -ml- (lf method).
> 
> Is it caused by the matrix multiplication (cross()) needed to calculate 
> x*beta when explanatory variables are present, or is my code badly 
> written?
>
> (omitting Antoine's code)

When comparing -ml- and -optimize()- on equal playing fields, -optimize()-
will tend to be faster than -ml-.

The transpose operator in Antoine's -optimize()- evaluator for the probit
model can be expensive (it causes Mata to make a copy) but is not the culprit
here.

Antoine's observations to the contrary are due to the choice of evaluator
types in the comparison.  In fact having predictors in the probit model
accentuates the timing difference between the -v0- and -lf- evaluator types.

1.  Predictors:

For the 'beta density' case Antoine's models do not contain predictors, but
the 'probit' model contains 2 predictors.

If Antoine removes the predictors from the 'probit' models, -optimize()- will
finish the job faster than -ml- for the same constant-only model.

2.  Evaluator types:

The reason for the flip-flop of results is due to how -ml- and -optimize()-
take numerical derivatives given the user specified evaluator type.

	-optimize()- with type -v0- evaluators must take the numerical
	derivative with respect to each element of the parameter vector passed
	to the users evaluator function (3 parameter elements, for each
	predictor and the intercept).

	-ml- with type -lf- evaluators take numerical derivatives with respect
	to the linear predictor in each equation (1 equation for the probit
	model).

Here is a table listing the evaluator types of -ml- and -optimize()-; with the
comparable evaluator types in the same row:

	Stata's -ml-			Mata -optimize()-
	-------------------------------------------------
	d0				d0
	d1 (without scores)		d1
	d2 (without scores)		d2
	lf				--
	--				v0
	d1 (with scores)		v1
	d2 (with scores)		v2

The point here is that -ml-'s methods for handling type -lf- evaluators are
not comparable to -optimize()-'s methods for type -v0-.

If Antoine can derive formulas for producing the gradient for an -ml- type
-d1- evaluator and the parameter level scores for an -optimize()- type -v1-
evaluator, a speed comparison will show that -ml- with -d1- and -optimize()-
with -v1- will be faster than -lf- and -v0-, respectively, and -optimize()-
will tend to be faster than -ml-.

Following my signature, I've included a modified version of Antoine's 'probit'
model comparison.  My -probit_v2()- function computes parameter level scores
(and a Hessian matrix which is not used by -v1-) for -optimize()-, while my
-myprobit_d2- program computes a gradient (and the negative Hessian which is
not used by -d1-) for -ml-.  A quick timing on my machine showed the following
results:

. timer list
   1:      0.13 /        1 =       0.1280
   2:      0.18 /        1 =       0.1830

Note that -optimize()- cannot have a type -lf- evaluator because there is no
way to impose the linear form assumptions.  User arguments are free to be any
valid Mata object and -optimize()- does not have the concept of an equation.

Incidentally, changing the specified evaluator types from -v1- to -v2- and -d1-
to -d2- yielded the following results:

. timer list
   1:      0.04 /        1 =       0.0440
   2:      0.08 /        1 =       0.0830

And moving the evaluator types to -v0- and -d0- (all numerical derivatives)
yielded:

. timer list
   1:      0.18 /        1 =       0.1780
   2:      0.27 /        1 =       0.2710

(Note that timings will vary from run-to-run due to the amount of activity on
your computer).

--Jeff
[email protected]

-----------------------------------------------------------------------------

clear all

set mem 20m
set obs 2000

drawnorm x1 x2 eps

gen cons=1

gen y=(1+x1+x2+eps>0)

timer clear 1

timer clear 2


mata:
void lnprobit_v2(
	real scalar	todo,
	real rowvector	beta,
	real colvector	y,
	real matrix	X,
	real colvector	llj,
	real matrix	g,
	real matrix	H
)
{
	real colvector	pm
	real colvector	xb
	real colvector	lj
	real colvector	dllj
	real colvector	d2llj
	real scalar	dim
	real scalar	nobs

	nobs	= rows(y)
	dim	= cols(X)
	if (nobs != rows(X) | dim != cols(beta)) {
		_error(3200)
	}

	pm	= 2*(y :!= 0) :- 1
	xb	= X*beta'

	lj	= normal(pm:*xb)
	llj	= ln(lj)
	if (todo == 0 | missing(llj)) return

	dllj	= pm :* normalden(xb) :/ lj
	if (missing(dllj)) {
		llj = .
		return
	}
	g	= dllj :* X
	if (todo == 1) return

	d2llj	= dllj :* (dllj + xb)
	if (missing(d2llj)) {
		llj = .
		return
	}
	H	= - cross(X, d2llj, X)
}

timer_on(1)
y=x=.
st_view(y,  ., "y")
st_view(x,  ., ("x1","x2","cons"))

S = optimize_init()
optimize_init_evaluator(S, &lnprobit_v2())
optimize_init_evaluatortype(S, "v1")
optimize_init_params(S, (0.1,0.5,0))
optimize_init_argument(S, 1, y)
optimize_init_argument(S, 2, x)

betahat = optimize(S)
sigmahat=sqrt(diagonal(optimize_result_V_oim(S)))'
tstat=betahat:/sigmahat
pval=2:*(1:-normal(abs(tstat)))
(betahat\sigmahat\tstat\pval)'
timer_off(1)

end

program myprobit_d2
	version 10
	args todo b lnf g negH g1
	tempvar xb lj
	mleval `xb' = `b'
	quietly {
		gen double `lj'  = normal( `xb')  if $ML_y1 == 1
		replace    `lj'  = normal(-`xb')  if $ML_y1 == 0
		mlsum `lnf' = ln(`lj')
		if (`todo'==0 | `lnf' >= .) exit

		replace `g1' =  normalden(`xb')/`lj'  if $ML_y1 == 1
		replace `g1' = -normalden(`xb')/`lj'  if $ML_y1 == 0
		mlvecsum `lnf' `g' = `g1', eq(1)
		if (`todo'==1 | `lnf' >= .) exit

		mlmatsum `lnf' `negH' = `g1'*(`g1'+`xb'), eq(1,1)
	}
end

ml model d1 myprobit_d2 (y=x1 x2)
mat I= (0.1 , 0.5,0)
ml init I, copy

timer on 2
ml max, search(off)
timer off 2

timer list

***** END:

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
Follow-Ups:
- Re: st: [Mata newbie] optimize() vs -ml-
  - From: Antoine Terracol <[email protected]>
Prev by Date: Re: st: programming graphs in stata
Next by Date: st: RE: Re: moving from Stata to Mata
Previous by thread: st: [Mata newbie] optimize() vs -ml-
Next by thread: Re: st: [Mata newbie] optimize() vs -ml-
Index(es):
- Date
- Thread