Statalist The Stata Listserver



Re: st: Scalars versus temporary variables in MLE


From   wgould@stata.com (William Gould, Stata)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Scalars versus temporary variables in MLE
Date   Wed, 21 Jun 2006 08:34:22 -0500

Deepankar Basu <basu.15@osu.edu> and others have been puzzling over why using
scalars in place of variables for intermediate results causes Deepankar's -ml-
method -lf- likelihood maximization problem to stop working.

Here's Deepankar's method -lf- evaluator:

        program myweibul_lf
        version 8.1

        args lnf leta lgam
        tempvar p M R
        quietly {
                gen double `p' = exp(`lgam')
                gen double `M' = ($ML_y1*exp(-`leta'))^`p'
                gen double `R' = ln($ML_y1)-`leta'
                replace `lnf' = -`M' + $ML_y2*(`lgam' - `leta' + (`p'-1)*`R')
        }
        end

In the broken version, he substitutes -scalar- for -gen double-.

Nick Cox <n.j.cox@durham.ac.uk> has already diagnosed the difficulty:  "A
scalar can _only_ hold a single value."
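
To see the point concretely, here is a minimal sketch on a made-up
three-observation dataset (the names x, s, and v are hypothetical, not part
of Deepankar's code).  Given a variable in its defining expression, -scalar-
picks up the value in the first observation only, whereas -gen double-
stores one result per observation:

```stata
clear
set obs 3
gen double x = _n            // x = 1, 2, 3

scalar s = exp(x)            // a scalar holds one value: exp(x[1])
gen double v = exp(x)        // a variable holds exp(x) in every observation

display scalar(s)
list x v
```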

I want to emphasize what Nick is saying.  In method -lf-, your program's job
is to calculate a log-likelihood value for each observation of the dataset:

                y    x1   x2  ...  xn     Xb    lnf
           -----------------------------------------
            1.  5     2    7  ...   9    1.2   -.69
            2.  6     8    1  ...   2    3.4  -2.30
            .    
            .    
           _N.  2     3    1  ...   4    1.1   -.10
           -----------------------------------------

Then -ml- sums the -lnf- column (-.69 + -2.30 + ... + -.10) to obtain the 
overall log likelihood value.
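
As a toy illustration of that last step, the sketch below enters just the
three -lnf- values visible in the table (the rows hidden behind the dots are
ignored, so this is not the likelihood of any real model) and adds them up
the way -ml- would:

```stata
clear
input double lnf
-.69
-2.30
-.10
end

* The overall log likelihood is the column sum of the per-observation values
quietly summarize lnf
display "overall log likelihood = " r(sum)
```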

In the example above, I show the LHS variable (y), the RHS variables (x1, 
x2, ..., xn), and I show X*b.  -ml- calculates X*b for use, and our job
(in this single-equation example) is to calculate ln(f(Xb)) and return 
it in lnf.  

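So now let's look at Deepankar's program.  It begins

        args lnf leta lgam

His problem is a two-equation model, and the variables `leta' and `lgam'
correspond to what I labelled Xb in the example above: each holds one value
per observation.  Anything computed from them, such as `p', `M', and `R',
varies by observation too, so it must be stored in a variable.  A scalar
defined from a variable keeps only the value from the first observation.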
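
For readers who want to run the evaluator, here is a sketch of how it might
be called.  The data are simulated, and the variable names t, d, and x, the
seed, and the true parameter values are all assumptions made for
illustration, not anything taken from Deepankar's problem.  The evaluator
itself is repeated as posted so that the block stands alone:

```stata
capture program drop myweibul_lf
program myweibul_lf
        version 8.1
        args lnf leta lgam
        tempvar p M R
        quietly {
                gen double `p' = exp(`lgam')
                gen double `M' = ($ML_y1*exp(-`leta'))^`p'
                gen double `R' = ln($ML_y1)-`leta'
                replace `lnf' = -`M' + $ML_y2*(`lgam' - `leta' + (`p'-1)*`R')
        }
end

* Simulated Weibull data (hypothetical): ln(eta) = 1 + 0.5*x, gamma = 1.5
clear
set seed 12345
set obs 500
gen double x = rnormal()
gen double t = exp(1 + 0.5*x) * (-ln(runiform()))^(1/1.5)
gen byte d = 1                       // every observation is a failure

* Two equations: leta gets x and a constant, lgam a constant only
ml model lf myweibul_lf (leta: t d = x) (lgam:)
ml maximize
```

Here $ML_y1 is t and $ML_y2 is d, and -ml- fills in `leta' and `lgam' for
every observation in the estimation sample before each call.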

We can use scalars in place of -gen double ...- anyplace the value is constant
across observations.  For instance, if Deepankar needed ln(_pi) someplace in
his calculation, he could code

             scalar lnpi = ln(_pi) 

Deepankar might even find some other places in the code where he could
substitute scalars for variables.  For instance, Deepankar might know
that he specifies the second equation (`lgam') as an intercept only.
`lgam' would still be a variable, but the values in each of the observations
would all be the same, and if Deepankar needed another value that was
purely a function of `lgam' and other scalar values, such as ln(_pi), he
could code

             scalar term = `lgam'*lnpi

In general, however, I recommend against looking for this kind of
efficiency, for two reasons:

    1.  If you follow this strategy and later you want to put a full 
        equation on `lgam' (imagine heteroskedasticity, etc.), you 
        cannot.  Actually, it is worse than that:  You can, but you will
        get the wrong answer.  Understand what is happening here:  For 
        efficiency, you write your program as if `lgam' is a scalar and 
        later, forgetting that, specify a model in which `lgam' varies 
        observation by observation.

    2.  If you are still determined to seek the efficiency of treating
        `lgam' as a scalar, you must *NOT* specify the -missing- option, or
        you must write more sophisticated code.  Even though you know
        `lgam' is a constant, -ml- does not, so it is a variable, with
        values for each observation.  Only the observations being used
        are filled in, and that might not include the 1st observation,
        which is the one -scalar- reads when given a variable.
        You do not, in general, have to worry about that, because by
        default, -ml- drops irrelevant observations during estimation,
        and restores the full dataset later.
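
Both pitfalls are easy to reproduce outside -ml-.  The sketch below uses a
made-up four-observation dataset, and lgam, vterm, term1, and term2 are
hypothetical names, with lgam standing in for the temporary variable that
-ml- would pass:

```stata
clear
set obs 4
scalar lnpi = ln(_pi)

* Pitfall 1: lgam varies across observations
gen double lgam = 0.1*_n
scalar term1 = lgam*lnpi             // silently uses lgam[1] only
gen double vterm = lgam*lnpi         // correct: one value per observation
display scalar(term1)
list lgam vterm

* Pitfall 2: lgam is constant, but observation 1 is not filled in
replace lgam = 0.3
replace lgam = . in 1
scalar term2 = lgam*lnpi             // missing, because lgam[1] is missing
display scalar(term2)
```

Neither case produces an error message, which is why the mistake is easy to
overlook.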

Anyway, I have no objection to using scalars for
true scalars, such as

             scalar lnpi = ln(_pi) 

but I recommend treating all variables passed to you by -ml-, 

        args lnf leta lgam
                 ----------- ...
                     |
                     I refer to these variables

as if they vary observation by observation.  Use -gen double-, not -scalar-.

Finally, everything said above applies only to -ml- method -lf-.  In the 
other methods, your program is responsible for returning a scalar 
log likelihood value, and scalars are often used inside those programs.
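
For contrast, here is a sketch of how the same Weibull log likelihood might
look as a method -d0- evaluator.  This is not Deepankar's code: the name
myweibul_d0 is hypothetical, and the simulated data (t, d, x, the seed, and
the true parameter values) are assumptions made for illustration.  In -d0-
the evaluator receives the coefficient vector `b', builds the linear
predictors itself with -mleval-, and hands back the overall log likelihood
in the scalar `lnf', here filled in by -mlsum-:

```stata
capture program drop myweibul_d0
program myweibul_d0
        version 8.1
        args todo b lnf
        tempvar leta lgam p M R
        * Build the two linear predictors from the coefficient vector
        mleval `leta' = `b', eq(1)
        mleval `lgam' = `b', eq(2)
        quietly {
                gen double `p' = exp(`lgam')
                gen double `M' = ($ML_y1*exp(-`leta'))^`p'
                gen double `R' = ln($ML_y1)-`leta'
                * -mlsum- adds the observation-level terms into the scalar `lnf'
                mlsum `lnf' = -`M' + $ML_y2*(`lgam' - `leta' + (`p'-1)*`R')
        }
end

* Simulated Weibull data (hypothetical): ln(eta) = 1 + 0.5*x, gamma = 1.5
clear
set seed 12345
set obs 500
gen double x = rnormal()
gen double t = exp(1 + 0.5*x) * (-ln(runiform()))^(1/1.5)
gen byte d = 1                       // every observation is a failure

ml model d0 myweibul_d0 (leta: t d = x) (lgam:)
ml maximize
```

The intermediate results are still variables here; what changes is that the
program, not -ml-, does the summing and returns a single number.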

-- Bill
wgould@stata.com