
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: -gllamm- vs -meglm-


From   "William Buchanan" <[email protected]>
To   <[email protected]>
Subject   RE: st: -gllamm- vs -meglm-
Date   Wed, 3 Jul 2013 12:25:53 -0700

It seems that some of the initial comments on your email are still valid.
You're comparing the performance of different algorithms and assuming that
both will yield identical (and correct) results and perform the same.  I'm
not saying, or suggesting, that the algorithm in SAS is any better or worse
than those available in Stata, but unless all the other confounding issues
are considered (e.g., parallelization in one software package vs. the other,
memory management in the software, etc.), the comparison still makes little
sense.  On that note, however, HLM also seems to fit generalized linear
models faster than Stata (v 12.1); this could simply be because HLM is
highly specialized for a single set of tasks.

Also, it might be worth checking some of the matrices from the estimation
to see if similar issues appear in the SAS output (e.g., missing values of
some derivatives), ensuring that the convergence criteria are the same, and
checking whether the programs behave similarly when they encounter
problematic data/model combinations (e.g., if Stata produces an error, would
SAS do the same under the exact same conditions?).
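
To illustrate why the convergence criteria matter for such a comparison, here is a minimal Python sketch (not Stata or SAS code). It fits an intercept-only logistic model by Newton-Raphson and stops on a gradient tolerance. The function name `fit_intercept`, the data (30 successes in 100 trials), and both tolerances are made up for the example; neither package is claimed to use these rules or defaults.

```python
import math


def fit_intercept(successes, trials, grad_tol, max_steps=50):
    """Newton-Raphson for an intercept-only logistic model.

    Returns (estimate, number of Newton steps taken).
    The stopping rule is |gradient| < grad_tol, a hypothetical criterion.
    """
    b = 0.0
    for steps in range(max_steps + 1):
        p = 1.0 / (1.0 + math.exp(-b))
        grad = successes - trials * p       # first derivative of the log likelihood
        if abs(grad) < grad_tol:
            return b, steps
        hess = -trials * p * (1.0 - p)      # second derivative (always negative here)
        b -= grad / hess                    # Newton update
    raise RuntimeError("no convergence within max_steps")


# Same data, same algorithm, two different stopping rules.
loose, loose_steps = fit_intercept(30, 100, grad_tol=1e-1)
strict, strict_steps = fit_intercept(30, 100, grad_tol=1e-10)
print("loose :", loose, "after", loose_steps, "steps")
print("strict:", strict, "after", strict_steps, "steps")
```

The looser rule stops earlier and reports a slightly different estimate, so run times and results are only comparable once both programs are held to the same criterion.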

Just some thoughts that could help make the discussion a bit more
informative in describing the performance issues after dealing with the
mitigating factors.

-Billy

-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Daniel Waxman
Sent: Wednesday, July 03, 2013 12:03 PM
To: [email protected]
Subject: Re: st: -gllamm- vs -meglm-

Joseph Coveney wrote:

Just for clarification, is PROC GLIMMIX fast and light on gigabyte-sized
datasets even when it's using seven-abscissa adaptive Gauss-Hermite
quadrature as its estimation method?  According to its documentation, "The
default estimation technique in generalized linear mixed models is residual
pseudo-likelihood with a subject-specific expansion (METHOD=RSPL)."*

------------------------------------------------------------------

Joseph and Tim, thanks for your replies.

I can't speak to GLIMMIX's performance using that particular estimation
method; the method that I've been using is called "NRRIDGE" (Newton-Raphson
with Ridging).  To give an example, I just ran a model with 186 variables,
a random intercept with 5,269 groups, and 270,684 observations (a 1%
sample), using 1.3 seconds of CPU time!  So far I
haven't been able to get this to run at all in Stata, even using the
numerical integration options.  For me, it's all about the destination, not
the journey, meaning that I couldn't care less what sort of estimation
technique is used as long as the results are correct.  If two methods
produce correct results and one takes minutes and the other takes hours or
fails to converge at all, then I'll take the first one.

Of course, the validity of the results might be the rub.  Does anybody know
of a good reason to be wary of the NRRIDGE algorithm?
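
For readers unfamiliar with the term, the following is a toy Python sketch of the general idea behind Newton-Raphson with ridging: when a plain Newton step fails to improve the objective, a multiple of the identity matrix is added to the Hessian and the step is recomputed. This is an illustration under stated assumptions, not SAS's NRRIDGE implementation, whose details may differ. The function names, the six-observation dataset, the ridge update factors, and the tolerance are all invented for the example.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def neg_loglik(beta, xs, ys):
    # Negative log likelihood of a logistic regression with one covariate.
    total = 0.0
    for x, y in zip(xs, ys):
        eta = beta[0] + beta[1] * x
        total += max(eta, 0.0) + math.log1p(math.exp(-abs(eta))) - y * eta
    return total


def grad_hess(beta, xs, ys):
    # Gradient and (symmetric 2x2) Hessian of the negative log likelihood.
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(beta[0] + beta[1] * x)
        r, w = p - y, p * (1.0 - p)
        g0 += r
        g1 += r * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    return (g0, g1), (h00, h01, h11)


def newton_ridge(xs, ys, start=(0.0, 0.0), tol=1e-6, max_iter=100):
    beta, ridge = start, 0.0
    f = neg_loglik(beta, xs, ys)
    for _ in range(max_iter):
        (g0, g1), (h00, h01, h11) = grad_hess(beta, xs, ys)
        if max(abs(g0), abs(g1)) < tol:
            break
        while True:
            a, d = h00 + ridge, h11 + ridge      # Hessian + ridge * identity
            det = a * d - h01 * h01
            if det > 1e-12:
                s0 = -(d * g0 - h01 * g1) / det  # solve (H + ridge*I) s = -g
                s1 = -(a * g1 - h01 * g0) / det
                cand = (beta[0] + s0, beta[1] + s1)
                f_new = neg_loglik(cand, xs, ys)
                if f_new < f:                    # step improved the fit: accept
                    beta, f = cand, f_new
                    ridge *= 0.25                # relax the ridge again
                    break
            ridge = max(2.0 * ridge, 1e-4)       # otherwise ridge harder and retry
            if ridge > 1e12:
                raise RuntimeError("ridging failed to find an improving step")
    return beta


xs = [-2, -1, 0, 1, 2, 3]
ys = [0, 0, 1, 0, 1, 1]
print(newton_ridge(xs, ys))                      # well-behaved start
print(newton_ridge(xs, ys, start=(0.0, 10.0)))   # poor start, where ridging kicks in
```

In this sketch the ridge changes only the path taken, not the target: both starting points should arrive at the same maximum likelihood estimates, since a step is accepted only if it improves the likelihood and iteration stops only when the gradient is near zero.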

I've been a long-time Stata fan; believe me, I'd love to never have to use
anything else.  But data seems to be getting bigger faster than memory is
getting cheaper, so the jury still seems to be out as to whether that is
going to be possible.

Dan
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


