RE: st: RE: Problems in maximising a likelihood function

 From Deepankar Basu To statalist@hsphsun2.harvard.edu Subject RE: st: RE: Problems in maximising a likelihood function Date Wed, 14 Jun 2006 19:28:35 -0400

```The dimensionality of my parameter space is 15; my sample is pretty
large. I have about 50,000 observations (with lot of variation in all
the regressors). Hence, I do not expect any problems on that count. Of
course, my model might be unidentified for some parameters (though I
don't think that is the case). I will think further about the
identifiability issue.

I have removed all temporary variables from the program now. I have used
scalars for all the temporary variables.

But I cannot clean up the expression for the log likelihood any further.
As I had said earlier, the log likelihood for each observation is of the
following form: ln(a+b); where 'a' and 'b' are huge expressions
involving exponentials, etc. I don't see how I can simplify it any
further; if you can suggest something, that will be great.

As far I can see, all the expressions in the re-worked likelihood
evaluator (given below) are well-behaved. I might be wrong, but let me
just go over each:

q1...q14 are defined in such a manner that they are always within 0 and
1, and sub-groups of them sum to 1. For instance, q1+q2=1; q3+q4+q5=1:
and so on.

x1 is bounded since ML_y1 lies between 2 and 6 in the dataset.
(Additionally, ML_y2...ML_y16 all lies between 0 and 1.)

x2 is just the factorial of ML_y1, and so it is bounded.

x3 is the product of x1 and x2.

x4 and x6 lie between 0 and 1; the limit of x4 is 0 when `theta1' goes
to + or - infinity, and the limit is (1/e) when `theta1' goes to 0.
Additionally, both are continuous functions of `theta1'. So, these
should not cause problems.

x5 is the normal cdf; hence it is always between 0 and 1.

So, the only way the log likelihood (i.e., `lnf') could blow up (to
-infinity) is if the expression to be logged comes very close to 0. But
if we start with initial values of 0 for all the parameters, we are
bounded away from 0 because of the first expression within the log
likelihood (which is (1/x3)*x4*x5). x3 is a finite number; x4 should be
the reciprocal of 'e' and x5 should be 0.5.

I don't see where the log likelihood could blow up. Any suggestions are
welcome. I give the re-worked program below.

___________________________________________________

clear
capture program drop gender1_lf
program gender1_lf
version 8.1
* set mem 100m
#delimit ;
args lnf theta1 theta2 theta3
theta5 theta6 theta7 theta8 theta9
theta10 theta11 theta12 theta13 theta14;
#delimit cr

quietly {
scalar q1 = 1/(1 + exp(`theta5'))
scalar q2 = exp(`theta5')/(1 + exp(`theta5'))

scalar q3 = 1/(1 + exp(`theta6') + exp(`theta7'))
scalar q4 = exp(`theta6')/(1 + exp(`theta6') + exp(`theta7'))
scalar q5 = exp(`theta7')/(1 + exp(`theta6') + exp(`theta7'))

scalar q6 = 1/(1 + exp(`theta8') + exp(`theta9') +
exp(`theta10'))
scalar q7 = exp(`theta8')/(1 + exp(`theta8') + exp(`theta9') +
exp(`theta10'))
scalar q8 = exp(`theta9')/(1 + exp(`theta8') + exp(`theta9') +
exp(`theta10'))
scalar q9 = exp(`theta10')/(1 + exp(`theta8') + exp(`theta9') +
exp(`theta10'))

scalar q10 = 1/(1 + exp(`theta11') + exp(`theta12') +
exp(`theta13') + exp(`theta14'))
scalar q11 = exp(`theta11')/(1 + exp(`theta11') + exp(`theta12') +
exp(`theta13') + exp(`theta14'))
scalar q12 = exp(`theta12')/(1 + exp(`theta11') + exp(`theta12') +
exp(`theta13') + exp(`theta14'))
scalar q13 = exp(`theta13')/(1 + exp(`theta11') + exp(`theta12') +
exp(`theta13') + exp(`theta14'))
scalar q14 = exp(`theta14')/(1 + exp(`theta11') + exp(`theta12') +
exp(`theta13') + exp(`theta14'))

scalar x1 = 2^(\$ML_y1)
scalar x2 = exp(lnfact(\$ML_y1))
scalar x3 = x1*x2
scalar x4 = ((exp(`theta1'))^(\$ML_y1)/exp(exp(`theta1')))
scalar x5 = norm(-`theta3')
scalar x6 =
((exp(`theta1'+`theta2'))^(\$ML_y1)/exp(exp(`theta1'+`theta2')))

}
#delimit ;
quietly replace `lnf' = ln(((1/x3)*x4*x5) +
((1-x5)*x6*( \$ML_y2\$ML_y3*q1
+ \$ML_y4*q2 +
\$ML_y5*q3 + \$ML_y6*q4 + \$ML_y7*q5 +
\$ML_y8*q6 + \$ML_y9*q7+ \$ML_y10*q8 +
\$ML_y11*q9 +
\$ML_y12*q10 + \$ML_y13*q11 + \$ML_y14*q12
+
\$ML_y15*q13 + \$ML_y16*q14)));
#delimit cr

end

use india2.dta

#delimit ;
ml model lf gender1_lf (one: dfsize p21 p31 p32 p41 p42 p43 p51 p52 p53
p54 = v012 v025, nocons)
(two: )
(three: p61 p62 p63 p64 p65 = v012 v130, nocons)
(four:) (five:) (six:) (sev:) (eight:) (nine:) (ten:) (eleven:)
(twelve:) (thirteen:), tech(bhhh nr);

#delimit cr

____________________________________________________

On Wed, 2006-06-14 at 08:10 +0100, Nick Cox wrote:
> The limit you are approaching may just be that your
> model is not well suited to your data. Very simply,
> what is the dimensionality of your parameter space
> to do?
>
> Nevertheless it seems that you have only partially acted on
> the earlier comments. Glancing at the code again
> I seem to see expressions which are logarithms of
> products, logarithms of ratios, logarithms
> of exponentials, etc. so there may be much more
> tbat you can do by way of cleaning up the expression.
>
> Also, it looks very much as if your thetas
> are all single-valued parameters, in which
> case scalars rather than variables are appropriate
> for what you are calling q1 ...
>
> Nick
> n.j.cox@durham.ac.uk
>
> Deepankar Basu
>
> > Nick,
> >
> > likelihood. Actually, there was an error in one of the
> > put a "+" in place of a "*". so, now I have negative values of the log
> > likelihood. But, I still don't have convergence.
> >
> > About calculating the log likelihood directly: in my case, the log
> > likelihood for each observation is the log of a sum (of two
> > quantities).
> > Hence, I cannot simply things much using the log function
> > beyond taking
> > the log of the whole expression. That is what I have done
> > now. So, now I
> > am working with the log likelihood directly.
> >
> > Additionally, I have removed some of the temporary variables
> > that I was
> > using to define the log likelihood and have instead used the
> > expressions
> > directly. Since some of the temporary variables (like x3, x4
> > or x6) were
> > exp functions, there was a potential for them to become too large.
> > Hence, I have removed them. The temporary variables that I
> > use now (x1,
> > x2 and x5 apart from q*) are all bounded.
> >
> > Now, when I issue the - ml max - command (after ml check
> > shows that all
> > is well), I get the following error:
> >
> > initial:       log likelihood = -2841.7221
> > rescale:       log likelihood = -1595.8146
> > rescale eq:    log likelihood = -1452.1748
> > (setting optimization to BHHH)
> > numerical derivatives are approximate
> > flat or discontinuous region encountered
> > numerical derivatives are approximate
> > flat or discontinuous region encountered
> > numerical derivatives are approximate
> > flat or discontinuous region encountered
> > could not calculate numerical derivatives
> > missing values encountered
> > r(430);
> >
> > Comments and suggestions are welcome. I give the re-worked do-file
> > below. Thanks,
> > Deepankar
> >
> > ____________________________________________________________
> >
> > clear
> > capture program drop gender1_lf
> > program gender1_lf
> >    version 8.1
> >    * set mem 100m
> >    #delimit ;
> >    args lnf theta1 theta2 theta3
> >         theta5 theta6 theta7 theta8 theta9
> >         theta10 theta11 theta12 theta13 theta14;
> >
> >    tempvar q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14
> >            x1 x2 x5;
> >    #delimit cr
> >
> >    quietly {
> >       gen double `q1' = 1/(1+exp(`theta5'))
> >       gen double `q2' = exp(`theta5')/(1+exp(`theta5'))
> >
> >       gen double `q3' = 1/(1+exp(`theta6')+exp(`theta7'))
> >       gen double `q4' = exp(`theta6')/(1
> > +exp(`theta6')+exp(`theta7'))
> >       gen double `q5' = exp(`theta7')/(1
> > +exp(`theta6')+exp(`theta7'))
> >
> >       gen double `q6' = 1/(1
> > +exp(`theta8')+exp(`theta9')+exp(`theta10'))
> >       gen double `q7' = exp(`theta8')/(1
> > +exp(`theta8')+exp(`theta9')+exp(`theta10'))
> >       gen double `q8' = exp(`theta9')/(1
> > +exp(`theta8')+exp(`theta9')+exp(`theta10'))
> >       gen double `q9' = exp(`theta10')/(1
> > +exp(`theta8')+exp(`theta9')+exp(`theta10'))
> >
> >       gen double `q10' = 1/(1
> > +exp(`theta11')+exp(`theta12')+exp(`theta13')+ exp(`theta14'))
> >       gen double `q11' = exp(`theta11')/(1
> > +exp(`theta11')+exp(`theta12')+exp(`theta13')+ exp(`theta14'))
> >       gen double `q12' = exp(`theta12')/(1
> > +exp(`theta11')+exp(`theta12')+exp(`theta13')+ exp(`theta14'))
> >       gen double `q13' = exp(`theta13')/(1
> > +exp(`theta11')+exp(`theta12')+exp(`theta13')+ exp(`theta14'))
> >       gen double `q14' = exp(`theta14')/(1
> > +exp(`theta11')+exp(`theta12')+exp(`theta13')+ exp(`theta14'))
> >
> >       gen double `x1' = 2^(-\$ML_y1)
> >       gen double `x2' = exp(lnfact(\$ML_y1))
> >
> >       gen double `x5' = norm(-`theta3')
> >
> >
> >    }
> >       #delimit ;
> >       quietly replace `lnf' =
> > ln((`x1'/`x2')*(exp(-(exp(`theta1'))))*(exp(\$ML_y1*`theta1'))*`x5' +
> >
> > ((1-`x5')*(exp(-(exp(`theta2'))))*((exp(\$ML_y1*`theta2'))))*( \$ML_y2 +
> > \$ML_y3*`q1' + \$ML_y4*`q2' +
> >                                  \$ML_y5*`q3' + \$ML_y6*`q4'+
> > \$ML_y7*`q5'
> > +
> >                                  \$ML_y8*`q6' + \$ML_y9*`q7'+
> > \$ML_y10*`q8'+\$ML_y11*`q9' +
> >                                  \$ML_y12*`q10' + \$ML_y13*`q11' +
> > \$ML_y14*`q12'+ \$ML_y15*`q13'+\$ML_y16*`q14'));
> >       #delimit cr
> >
> > end
> >
> >
> > use india2.dta
> >
> > #delimit ;
> > ml model lf gender1_lf (one: dfsize p21 p31 p32 = v012 v025, nocons)
> > (two: p41 p42 p43 p51 p52 p53 p54 = v012 v025)
> > (three: p61 p62 p63 p64 p65 = v012 v130, nocons)
> > (four:) (five:) (six:) (sev:) (eight:) (nine:) (ten:) (eleven:)
> > (twelve:) (thirteen:), tech(bhhh nr);
> >
> > #delimit cr
> >
> >
> >
> > _____________________________________________________________________
> >
> >
> >
> >
> > On Tue, 2006-06-13 at 17:46 +0100, Nick Cox wrote:
> > > I am not an -ml- expert, but I sense some basic
> > > misunderstandings.
> > >
> > > A positive likelihood is not problematic as such.
> > > Although a probability cannot exceed 1,
> > > this is not true of a probability density. Whenever
> > > probability densities are typically above 1, their product
> > > will be too.
> > >
> > > However, your program calculates the likelihood directly,
> > > before logging it, and the numbers become far too big to handle.
> > > exp(35000) is big!!!
> > >
> > > Thus it is the likelihood that becomes missing, not your
> > > original data. Stata would just ignore any data missings.
> > >
> > > You must rewrite your program so that you
> > > are working in terms of the log likelihood throughout.
> > >
> > > A side-issue with your program is that you appear to
> > > be putting lots of constants into variables. Using
> > > scalars is much better style and less wasteful
> > > of storage.
> > >
> > > Nick
> > > n.j.cox@durham.ac.uk
> > >
> > > DEEPANKAR BASU
> > > >
> > > > I am facing problems while trying to do a maximum likelihood
> > > > estimation. I have written out the likelihood estimator; then
> > > > I have used "ml check". The "ml check" shows that my
> > > > likelihood evaluator has passed all tests. Then when I issue
> > > > the "ml max" command, I get the following error message:
> > > >
> > > > initial:       log likelihood =  31058.016
> > > > rescale:       log likelihood =  31058.016
> > > > rescale eq:    log likelihood =  34639.752
> > > > (setting optimization to BHHH)
> > > > could not calculate numerical derivatives
> > > > missing values encountered
> > > > r(430);
> > > >
> > > > First thing that worries me is that the log likelihood is
> > > > positive; is that a sign that my program has some problem?
> > > > And second, the maximisation is interrupted before completion
> > > > because it could not calculate numerical derivatives (missing
> > > > values encountered). I re-checked my dataset; there are no
> > > > missing values for the variables that I am using. Probably
> > > > the error message means something else.
> > > >
> > > > I have tried (1) switching between techniques BHHH and NR;
> > > > (2) ml max, difficult. I still get the same error message
> > > > (though the log likelihood values where the execution of the
> > > > program stops are different in each case).
> > > >
> > > >
> > > > Below I give the do-file that has my likelihood evaluator
> > > > program and also a "ml model ..." command (as suggested in
> > > > the book by Gould, Pitbaldo and Sribney: page 165); any
> > > > suggestions are welcome.
> > > >
> > > > If someone can detect some error immediately, that will be
> > > > great. But, if someone needs to run the program, I can send a
> > > > small subset of my dataset for the purpose. Thanks in advance.
> > > >
> > > > Deepankar Basu
> > > >
> > > > ____________________________________________
> > > >
> > > >
> > > > clear
> > > > capture program drop gender_lf
> > > > program gender_lf
> > > >    version 8.1
> > > >    set mem 100m
> > > >    #delimit ;
> > > >    args lnf theta1 theta2 theta3
> > > >         theta5 theta6 theta7 theta8 theta9
> > > >         theta10 theta11 theta12 theta13 theta14;
> > > >
> > > >    tempvar q1 q2 q3 q4 q5 q6 q7 q8 q9 q10 q11 q12 q13 q14
> > > >            x x1 x2 x3 x4 x5 x6 x7;
> > > >    #delimit cr
> > > >
> > > >    quietly {
> > > >       gen double `q1' = 1/(1+exp(`theta5'))
> > > >       gen double `q2' = exp(`theta5')/(1+exp(`theta5'))
> > > >
> > > >       gen double `q3' = 1/(1+exp(`theta6')+exp(`theta7'))
> > > >       gen double `q4' =
> > > > exp(`theta6')/(1+exp(`theta6')+exp(`theta7'))
> > > >       gen double `q5' =
> > > > exp(`theta7')/(1+exp(`theta6')+exp(`theta7'))
> > > >
> > > >       gen double `q6' =
> > > > 1/(1+exp(`theta8')+exp(`theta9')+exp(`theta10'))
> > > >       gen double `q7' =
> > > > exp(`theta8')/(1+exp(`theta8')+exp(`theta9')+exp(`theta10'))
> > > >       gen double `q8' =
> > > > exp(`theta9')/(1+exp(`theta8')+exp(`theta9')+exp(`theta10'))
> > > >       gen double `q9' =
> > > > exp(`theta10')/(1+exp(`theta8')+exp(`theta9')+exp(`theta10'))
> > > >
> > > >       gen double `q10' =
> > > > 1/(1+exp(`theta11')+exp(`theta12')+exp(`theta13')+
> > exp(`theta14'))
> > > >       gen double `q11' =
> > > > exp(`theta11')/(1+exp(`theta11')+exp(`theta12')+exp(`theta13')
> > > > + exp(`theta14'))
> > > >       gen double `q12' =
> > > > exp(`theta12')/(1+exp(`theta11')+exp(`theta12')+exp(`theta13')
> > > > + exp(`theta14'))
> > > >       gen double `q13' =
> > > > exp(`theta13')/(1+exp(`theta11')+exp(`theta12')+exp(`theta13')
> > > > + exp(`theta14'))
> > > >       gen double `q14' =
> > > > exp(`theta14')/(1+exp(`theta11')+exp(`theta12')+exp(`theta13')
> > > > + exp(`theta14'))
> > > >
> > > >       gen double `x1' = 2^(-\$ML_y1)
> > > >       gen double `x2' = exp(lnfact(\$ML_y1))
> > > >       gen double `x3' = exp(`theta1')
> > > >       gen double `x4' = exp(\$ML_y1*`theta1')
> > > >       gen double `x5' = norm(-`theta3')
> > > >       gen double `x6' = exp(`theta2')
> > > >       gen double `x7' = exp(\$ML_y1*`theta2')
> > > >       #delimit ;
> > > >       gen double  `x' = `x1'*(1/`x2')*exp(-`x3')*`x4'*`x5' +
> > > > (1-`x5')*exp(-`x6')*`x7' +
> > > >                          \$ML_y2 + (\$ML_y3*`q1' + \$ML_y4*`q2') +
> > > >                          (\$ML_y5*`q3' + \$ML_y6*`q4'+
> > \$ML_y7*`q5') +
> > > >                          (\$ML_y8*`q6' + \$ML_y9*`q7'+
> > > > \$ML_y10*`q8'+\$ML_y11*`q9') +
> > > >                          (\$ML_y12*`q10' + \$ML_y13*`q11' +
> > > > \$ML_y14*`q12'+ \$ML_y15*`q13'+\$ML_y16*`q14');
> > > >       #delimit cr
> > > >
> > > >    }
> > > >
> > > >    quietly replace `lnf' = ln(`x')
> > > > end
> > > >
> > > >
> > > > use india1.dta
> > > >
> > > > #delimit ;
> > > > ml model lf gender_lf (one: dfsize p21 p31 p32 = v012
> > v025, nocons)
> > > > (two: p41 p42 p43 p51 p52 p53 p54 = v012 v025)
> > > > (three: p61 p62 p63 p64 p65 = v012 v025 v130, nocons)
> > > > (four:) (five:) (six:) (sev:) (eight:) (nine:) (ten:)
> > > > (eleven:) (twelve:) (thirteen:), tech(bhhh nr);
> > > >
> > > > #delimit cr
>
> *
> *   For searches and help try:
> *   http://www.stata.com/support/faqs/res/findit.html
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```