



Re: st: -gllamm- vs -meglm-


From   [email protected] (Jeff Pitblado, StataCorp LP)
To   [email protected]
Subject   Re: st: -gllamm- vs -meglm-
Date   Wed, 03 Jul 2013 14:57:23 -0500

Daniel Waxman <[email protected]>, Timothy Mak <[email protected]>, and
Joseph Coveney <[email protected]> each had some follow-up questions and
comments in this thread:

Daniel wrote:

> "Although the -me- commands were not designed to work with data
> containing such large groups..."
> 
> 3,000 observations in a group isn't what it used to be!  There are
> many of us who routinely work with gigabytes of data, and it would be
> helpful if Stata's documentation made it clear from the outset that
> some estimation routines are not meant for us, if they are not.
> 
> Somehow SAS's glimmix routine is able to fit at least single random
> intercept logistic models on huge datasets relatively quickly (and it
> does so while using very little memory).  To the extent that the
> Stata development team is willing to comment on a competing product,
> I'd be interested in their take on whether Stata will be able to do
> the same any time soon.

Timothy replied:

> Single random intercept logistic models are not what the original post was
> about.  I think Stata should be able to fit those quickly as well.

> The original post involves calculations of conditional likelihood.  I must
> admit I'm intrigued as to why groups involving >1,000 observations caused a
> problem in the calculations.  But based on my limited understanding of
> conditional likelihood, it probably has to do with having a sum of some
> 1,000+ elements, and perhaps rounding errors are causing some of the sums
> to be negative.  But if anyone knows more, please comment!

Joseph commented on Daniel's mention of SAS's GLIMMIX, pointing out that its
default estimation procedure is not the same as the one implemented in the
-me- commands, which use direct maximum likelihood with adaptive quadrature.

We believe Daniel is referring to the statement

> Although the -me- commands were not designed to work with data containing
> such large groups, we plan ...

We must admit that this statement sounds much stronger than we originally
intended.  The point we were trying to make about the -me- commands, and about
any random-effects model that uses numerical methods such as quadrature to
approximate the integrals in the likelihood calculation, is that the
likelihood for a single group is a function of the conditional likelihoods of
the observations within that group, and large groups can cause scaling
problems when computing that likelihood or its derivatives.  Large datasets
are generally not a problem, but large groups can be.

For example, if at a single point (abscissa) in the quadrature calculation the
average log-likelihood for a given observation within a group is -10, then the
conditional log-likelihood for a group of size 10 at this point is -100.
Quadrature is performed on the likelihoods, not the log-likelihoods, so the
conditional likelihood for this group would be exp(-100) = 3.72e-44.  Suppose
instead that the group has 100 individuals; then the conditional
log-likelihood for the group is -1,000, and exp(-1,000) is effectively zero.
In fact, the domain over which function -exp()- returns non-zero, non-missing
values is approximately -709 to 709.  The quadrature code developed for the
-me- commands accounts for this limited range by rescaling the group-level
likelihoods; however, there are limits to how effective this rescaling can be.
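
To make the arithmetic concrete, here is a minimal Mata sketch of the
underflow and of the kind of rescaling just described.  The per-observation
log-likelihood of -10 and the choice of rescaling constant are made up for
illustration; this is not the -me- implementation.

        mata:
        // made-up conditional log-likelihoods: -10 per observation at one abscissa
        exp(10 * -10)        // group of 10:  exp(-100) = 3.72e-44, tiny but representable
        exp(100 * -10)       // group of 100: exp(-1,000) underflows to 0

        // rescale: subtract a group-specific constant c before exponentiating,
        // run the quadrature sum on the scaled values, and add c back after log()
        c = -1000            // here c is simply the group log-likelihood itself
        w = exp(-1000 - c)   // = exp(0) = 1, representable
        log(w) + c           // recovers -1,000 on the log scale
        end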

The "initial values not feasible" error message in Jeph Herrin's original post
was caused by missing values in the gradient and Hessian calculations.  By
default, the -me- commands compute the gradient and Hessian analytically.
Analytical derivatives are generally faster to compute, when tractable, and
yield more numerically accurate results.  For models using quadrature, the
partial derivatives of the conditional log-likelihoods are carried forward to
the group level using the chain rule.  As we mentioned before, quadrature is
performed on the exponentiated log-likelihoods.  We know that

        d exp(z+c)
        ---------- = exp(z+c) = exp(z)*exp(c)
           d z

If 'z+c' is the conditional log-likelihood for a given group, then 'c' is the
rescaling factor that allows us to treat exp(z) as the likelihood value in the
quadrature procedure.  We keep track of a separate 'c' for each group while
computing the scaled likelihood values and add it back after we take the log.
The problem is that, at some point, the rescaling factor 'c' can get so large
that exp(c) returns a missing value in the analytical derivative calculation.
One way around this is to use numerical derivatives instead, as we described
in our earlier post:

    http://www.stata.com/statalist/archive/2013-07/msg00072.html
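
In case it helps to see the failure mode in isolation, here is a minimal Mata
sketch of the arithmetic hazard in the factorization exp(z)*exp(c).  The
values of 'z' and 'c' below are made up, and this is not the -me- derivative
code; the point is only that splitting the exponential into factors can
produce a missing value even when exp(z+c) itself is representable, which is
why falling back to numerical derivatives can help.

        mata:
        z = -750             // made-up scaled term
        c =  740             // made-up group rescaling factor
        exp(z + c)           // exp(-10), about 4.5e-05: fine evaluated as one quantity
        exp(z) * exp(c)      // 0 * missing = missing: exp(c) exceeds exp()'s range
        end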

--Jeff					--Isabel
[email protected]			[email protected]
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

