The Stata listserver

Re: st: Maximum Likelihood


From   [email protected] (William Gould, Stata)
To   [email protected]
Subject   Re: st: Maximum Likelihood
Date   Mon, 05 May 2003 09:13:08 -0500

Deborah Kobes <[email protected]> is having a problem estimating a model 
with -ml-:

> I made those changes and still got missings with the 'no constant'
> specification [...]  I have read the Gould and Scribney book but haven't
> found anything about why the constant/no constant specification would change
> whether the model works.
>
> Any further ideas would be much appreciated

Evidently, Deborah can estimate the model fine if a constant is included in 
the model, but if she suppresses the constant, -ml- iterates a bit and then 
reports

>       .         ml maximize;
>
>       initial:       log likelihood = -30971.235
>       alternative:   log likelihood = -21880.591
>       rescale:       log likelihood = -20492.416
>       rescale eq:    log likelihood = -20304.363
>       Iteration 0:   log likelihood = -20304.363  
>       Iteration 1:   log likelihood =  -17187.65  
>       Iteration 2:   log likelihood = -17036.272  
>       Iteration 3:   log likelihood = -17034.944  
>       Iteration 4:   log likelihood = -17034.944  (not concave)
>
>       [output follows with missing values for some standard errors]

For Deborah's information, models without an intercept are often more difficult
to fit because an intercept is a single parameter that absorbs all the scaling
problems.  Consider a term such as 

        ln(normprob(x*b)) 

The above calculation produces a missing value if x*b <= -38 and, worse, it is
not all that accurate for -38 < x*b < -30.  Understand that normprob(-30) =
4.91e-198, and it is difficult to describe in words just what a small
probability that is.  Physicists wonder whether protons ultimately evaporate 
but, if they do, the universe is not yet old enough for many of them to have 
done so.  That is the kind of probability 4.91e-198 is. 
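
One way to see this directly in Stata is to evaluate the function at a few
points.  The exact cutoffs here are illustrative and can differ slightly across
versions (in current Stata, normal() is the newer name for normprob()):

        . display normprob(-30)        // about 5e-198: tiny, but representable
        . display ln(normprob(-30))    // still computable, barely
        . display normprob(-40)        // underflows to zero
        . display ln(normprob(-40))    // ln(0), which Stata returns as missing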

If the parameters, during estimation, wander into this area of the likelihood
function, problems are assured.  An intercept tends to solve all those 
problems:

        ln(normprob(a + x*b)) 

Now, no matter how absurd a set of b values is, and even if x*b becomes very 
large negatively, one parameter -- a -- tends to be set so as to keep the overall
linear combination a+x*b in a reasonable area of the function.  That is why 
intercepts help so much.

In any case, when writing an evaluator for a likelihood function that will 
be used without an intercept, extreme care must be taken in the numerical 
evaluation.  The hill climber really does not want to go into these 
extreme areas, but numerical inaccuracy can cause it to wander temporarily 
off course.  There is exactly such a problem in Deborah's evaluator:

         program define ml_move2
                 args lnf theta1 theta2
                 quietly replace `lnf' = 
                         ln( (1 - normprob(`theta1' + `theta2'))*probmax + 
                             (1 - normprob(`theta2')*(1-probmax)) ) 
                         if $ML_y1==1
                 quietly replace `lnf' = 
                          ln(  normprob(`theta1' + `theta2')*probmax +
                               normprob(`theta2')*(1-probmax) ) 
                          if $ML_y1==0
         end

The problem line is 

                         ln( (1 - normprob(`theta1' + `theta2'))*probmax + 
                             (1 - normprob(`theta2')*(1-probmax)) ) 
                         if $ML_y1==1

and the particular parts in question are 

                         (1 - normprob(`theta1' + `theta2'))*probmax 
and
                         (1 - normprob(`theta2')*(1-probmax)) 

Never code 1-normprob(z).  Code instead normprob(-z), which is far more
accurate.  The problem line should read

                         ln( (normprob(-`theta1' - `theta2'))*probmax + 
                             (normprob(-`theta2')*(1-probmax)) ) 
                         if $ML_y1==1
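
To make the fix concrete, here is a sketch of the full evaluator with both
1-normprob() terms on that line rewritten in that form.  Everything else,
including the probmax quantity that Deborah defines elsewhere in her do-file,
is left as in her original:

         program define ml_move2
                 args lnf theta1 theta2
                 quietly replace `lnf' = 
                         ln( (normprob(-`theta1' - `theta2'))*probmax + 
                             (normprob(-`theta2')*(1-probmax)) ) 
                         if $ML_y1==1
                 quietly replace `lnf' = 
                          ln(  normprob(`theta1' + `theta2')*probmax +
                               normprob(`theta2')*(1-probmax) ) 
                          if $ML_y1==0
         end

The reason normprob(-z) is so much more accurate is that, in double precision,
1-normprob(z) loses all of its significant digits once normprob(z) rounds to 1
(by around z = 8), whereas normprob(-z) keeps computing the tail probability
directly all the way down to the underflow limit.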

With luck, this will solve Deborah's problem.  

If not, then Deborah is going to have to obtain better starting values, too, and 
here is how she can do that:

    1.  Fit the model with an intercept.
        She has two equations; let's call the results a1+b1*x and a2+b2*z.

    2.  In the data, generate 

             . gen i1 = a1+b1*x
             . gen i2 = a2+b2*z

             . gen j1 = b1*x
             . gen j2 = b2*z

    3.  Obtain the means of each:

             . summarize i1 j1

             . summarize i2 j2

    4.  Solve for constants c1 and c2 such that 

              mean(i1) = mean(j1)*c1

              mean(i2) = mean(j2)*c2

    5.  Use as the starting values c1*b1 and c2*b2 in estimation without
        the intercept.

The above is an attempt to find decent starting values of the right scale
for the no-constant model.
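
For concreteness, here is a minimal sketch of those five steps in Stata.  The
dependent variable y, the single covariates x and z, and the equation names
eq1 and eq2 are illustrative assumptions, not Deborah's actual setup, and
probmax is assumed to be defined already:

        . ml model lf ml_move2 (eq1: y = x) (eq2: y = z)    // step 1: fit with intercepts
        . ml maximize

        . gen double i1 = [eq1]_b[_cons] + [eq1]_b[x]*x     // step 2: with and without
        . gen double i2 = [eq2]_b[_cons] + [eq2]_b[z]*z     //         the intercepts
        . gen double j1 = [eq1]_b[x]*x
        . gen double j2 = [eq2]_b[z]*z

        . summarize i1, meanonly                            // step 3: means of each
        . local m_i1 = r(mean)
        . summarize j1, meanonly
        . local m_j1 = r(mean)
        . summarize i2, meanonly
        . local m_i2 = r(mean)
        . summarize j2, meanonly
        . local m_j2 = r(mean)

        . local c1 = `m_i1'/`m_j1'                          // step 4: mean(i1) = mean(j1)*c1
        . local c2 = `m_i2'/`m_j2'

        . local s1 = `c1'*[eq1]_b[x]                        // step 5: rescaled slopes as
        . local s2 = `c2'*[eq2]_b[z]                        //         starting values
        . ml model lf ml_move2 (eq1: y = x, nocons) (eq2: y = z, nocons)
        . ml init eq1:x = `s1' eq2:z = `s2'
        . ml maximize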

I suspect, however, that finding better starting values will not be necessary
once the program is fixed to be more accurate.

-- Bill
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


