Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From:    Maarten Buis <maartenlbuis@gmail.com>
To:      statalist@hsphsun2.harvard.edu
Subject: Re: st: inaccurate hessian from mata: deriv()
Date:    Thu, 25 Oct 2012 20:41:54 +0200
The solution is simple: use -ml- if you want to work in Stata, or -moptimize()- if you want to work in Mata. Alternatively, you can work out the analytical derivatives, which is not that hard for a -probit-. The easiest solution of all is not to program what is already implemented in Stata, and just use -probit-.

Regardless of which solution you choose, your log-likelihood function is wrong: you should use -normal()- instead of -normprob()-, as the latter is obsolete. More importantly, you never want to code -(1-normal(`xb'))- if you can avoid it; use -normal(-`xb')- instead. Mathematically the two are equivalent, but the latter is computed much more precisely. See:

William Gould (2006) "Mata Matters: Precision" The Stata Journal, 6(4): 550-560.
<http://www.stata-journal.com/article.html?article=pr0025>

Hope this helps,
Maarten
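A minimal sketch of the likelihood evaluator quoted below, rewritten along those lines (same program name and arguments as in the original post; only the -normprob()- calls and the -1 - normal()- construction change):

---------------------------------------------------
program define like, nclass
        qui {
                args lnf XB
                * normal(-`XB') is numerically more accurate than
                * 1 - normal(`XB') when `XB' is large and positive
                replace `lnf' = ln(normal(-`XB')) if $ML_y1==0
                replace `lnf' = ln(normal( `XB')) if $ML_y1==1
        }
end
---------------------------------------------------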
On Thu, Oct 25, 2012 at 7:24 PM, Sun Yutao <yutao.sun.statalist@outlook.com> wrote:
> Hello users,
>
> I am having a problem with the function deriv(): it seems to give a very
> inaccurate Hessian. Here is the problem:
>
> I wrote a Stata probit log-likelihood program and wrapped it in a Mata
> function, which does no more than compute and sum the log-likelihood
> contributions from the Stata program.
> ----------------------------------------------------------
> function MleLikeMata(input_param, struct mPML_model scalar model, |LogLik) {
>         // make sure the parameters are a column vector
>         if (cols(input_param)!=1) {
>                 param = input_param'
>         } else {
>                 param = input_param
>         }
>
>         // build the linear predictor for each equation
>         for (i=1; i<=rows(model.equation); i++) {
>                 data_range  = strtoreal(tokens(model.equation[i,7]))
>                 param_range = model.location[i,1]::model.location[i,2]
>                 if (model.equation[i,4]=="1") {
>                         model.data[., model.offset+i] = (model.data[., data_range], J(rows(model.data),1,1)) * param[param_range, .]
>                 } else {
>                         model.data[., model.offset+i] = model.data[., data_range] * param[param_range, .]
>                 }
>         }
>
>         // let the Stata program fill in the log-likelihood contributions
>         stata(model.StataLike + " " + model.XB)
>         LogLik = sum(model.data[., model.offset])
>         //LogLik = model.data[., model.offset]
>         if (args()==2) {
>                 return(LogLik)
>         }
> }
> ----------------------------------------------------------
> where model is just an extra argument I need.
>
> Then I use this code to compute the gradient and Hessian:
> ----------------------------------------------------------
> D = deriv_init()
> deriv_init_evaluator(D, &MleLikeMata())
> //deriv_init_evaluatortype(D, "v")
> deriv_init_search(D, "off")
> deriv_init_params(D, param')
> deriv_init_argument(D, 1, model)
> grad = deriv(D, 1)'
> hess = deriv(D, 2)
> ----------------------------------------------------------
>
> It gives a more or less OK gradient but a VERY inaccurate Hessian (I am
> comparing this with the Stata -ml- command on the same data and the same
> parameter vector, and -ml- works well). I am 100% sure the Stata probit
> log likelihood is concave, so the Hessian should be negative definite.
> However, it gives a Hessian like this:
> +----------------------------------------------------------------------------+
> 1 |  238418.5803
> 2 | -119209.2902   476837.1606
> 3 | -178813.9352  -327825.5479   417232.5155
> 4 |  -59604.64508  -119209.2902  -208616.2578   357627.8705
> 5 |  149011.6127   -178813.9352   -59604.64508    29802.32254            0
> +----------------------------------------------------------------------------+
> The determinant is 2.43788e+27, which means it is certainly not
> negative definite (and there is even a 0 on the diagonal).
>
> Possible fixes I tried:
>
> 1. specifying it as a v-type evaluator (one that returns a row vector)
> 2. scaling the sum of the log-likelihood by the number of observations
> 3. changing the 2nd argument of -deriv_init_search(D, "off")- to "bracket"
>    or "interpolate" (which gives even worse results)
>
> Could anyone kindly explain what is wrong and what I should do?
>
> In fact, I think I lose a lot of the features of Newton-Raphson (fast
> convergence, etc.) because of the inaccurate Hessian.
>
> In case you need it, the Stata likelihood program is:
> ---------------------------------------------------
> program define like, nclass
>         qui {
>                 args lnf XB
>                 replace `lnf' = ln(1 - normprob(`XB')) if $ML_y1==0
>                 replace `lnf' = ln(    normprob(`XB')) if $ML_y1==1
>         }
> end
> ---------------------------------------------------
> which is the common one for probit.
>
> Best regards,
> Sun Yutao

--
---------------------------------
Maarten L. Buis
WZB
Reichpietschufer 50
10785 Berlin
Germany
http://www.maartenbuis.nl
---------------------------------

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
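For the -moptimize()- route suggested in the reply, a minimal probit sketch might look like the following; the variable names y, x1, and x2 are placeholders for whatever is in memory, and an lf-type (linear-form) evaluator is used so that -moptimize()- handles the gradient and Hessian itself (see help mata moptimize() for the full interface):

---------------------------------------------------
mata:
// lf-type evaluator for a probit log likelihood;
// (2*y - 1)*xb equals xb when y==1 and -xb when y==0,
// so normal() is always evaluated where it is accurate
void myprobit_lf(transmorphic M, real rowvector b, real colvector lnf)
{
        real colvector y, xb
        y   = moptimize_util_depvar(M, 1)
        xb  = moptimize_util_xb(M, b, 1)
        lnf = ln(normal((2*y :- 1) :* xb))
}

M = moptimize_init()
moptimize_init_evaluator(M, &myprobit_lf())
moptimize_init_evaluatortype(M, "lf")
moptimize_init_depvar(M, 1, "y")                // placeholder depvar
moptimize_init_eq_indepvars(M, 1, "x1 x2")      // placeholder regressors
moptimize(M)
moptimize_result_display(M)
end
---------------------------------------------------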