
Re: st: first xtmixed question


From   rgutierrez@stata.com (Roberto G. Gutierrez, StataCorp)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: first xtmixed question
Date   Wed, 13 Apr 2005 15:54:19 -0500

David Airey <david.airey@vanderbilt.edu> asks:

> Actually, this is not so much about xtmixed but Mata. There is some
> discussion of Mata and speed recently. I saw one posting from Vince Wiggins
> of Stata Corp. suggesting a 2000 fold improvement relative to .ado code when
> multiplying matrices. What is the performance loss of a program like xtmixed
> coded by .ado (if there ever was such a thing during development), and coded
> as it is, and how does it now fare compared to SAS Proc Mixed, SPSS Mixed,
> or R LME in terms of speed? I'm assuming speed is comparable across these
> packages. This is not an important question to answer, just curious.

David wonders if an early version of -xtmixed-, entirely coded in ado, ever
existed.  The answer is no, because the required matrix functionality was
already implemented in Mata, and Mata is much faster than ado code.

As to how -xtmixed- compares to other packages, speed comparisons depend
greatly on the type of model, the data, and exactly how you specify the model
in the software.  Because we are experts in Stata but not in the other
packages, it is unlikely we could compare speeds fairly -- this is
particularly true of mixed models, where several alternative syntaxes will
often produce the same fitted model.

What we can do is list some timings for -xtmixed- for a few problems.

In the new manual entry [XT] xtmixed, we run a two-level random-intercept
model, reproducing an analysis published by Baltagi et al. (2001).  These data
consist of 17 observations for each of 48 states, and the states are nested in
9 regions, for a total of 816 observations.  On our machine (Pentium 4, 2.6 GHz,
running Linux), the estimation of the three variance components (region level,
state level, overall error), with standard errors, via REML takes just under 1
second.

If I take the above model, and add random coefficients at the region level on
two covariates, resulting in the estimation of 5 variance components, the
estimation takes about 1.8 seconds.
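
As a quick check of the counts quoted above, here is a minimal sketch that
tallies variance components.  It assumes the random effects within a level are
uncorrelated (so no covariances are estimated) and that there is a single
overall error variance; the post does not state the covariance structure, so
that is an assumption.  The function name and the dictionary layout are
hypothetical, chosen only for illustration.

```python
def count_variance_components(random_slopes_per_level):
    """Count variance parameters in a nested mixed model.

    Each level contributes one random-intercept variance plus one variance
    per random coefficient; the overall error adds one more.
    Assumption: independent covariance structure, i.e. no covariance terms.
    """
    per_level = sum(1 + n_slopes for n_slopes in random_slopes_per_level.values())
    return per_level + 1  # +1 for the overall error variance


# Random intercepts only, at the region and state levels.
intercepts_only = count_variance_components({"region": 0, "state": 0})

# Same model with random coefficients on two covariates at the region level.
with_region_slopes = count_variance_components({"region": 2, "state": 0})

print(intercepts_only, with_region_slopes)  # 3 5
```

If covariances between the region-level effects were also estimated, the count
for the second model would be larger than 5.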

Now let's consider a bigger problem.  In a generated dataset consisting of
three nested levels each with random intercepts, we had 20 observations on
8000 lowest-level groups, nested within 400 second-level groups, nested within
20 first-level groups, for a total of 160,000 observations with four variance
components to be estimated.  This problem took about 3.5 minutes to fit 
using -xtmixed- on our machine.
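
For readers who want to build a dataset with the same layout, the sketch below
generates one.  It is not the code used for the timing above.  The balanced
design (20 groups within each group at the next level up), the unit variances,
the seed, and all names are assumptions made for illustration; the post gives
only the group totals.

```python
import random


def simulate_nested(n_first=20, second_per_first=20, third_per_second=20,
                    obs_per_group=20, seed=12345):
    """Simulate three nested levels, each with a random intercept.

    Returns a list of (first_id, second_id, third_id, y) tuples.
    Assumption: every variance component equals 1 and the design is balanced.
    """
    rng = random.Random(seed)
    rows = []
    second_id = 0
    third_id = 0
    for first_id in range(n_first):
        u_first = rng.gauss(0.0, 1.0)           # first-level random intercept
        for _ in range(second_per_first):
            u_second = rng.gauss(0.0, 1.0)      # second-level random intercept
            for _ in range(third_per_second):
                u_third = rng.gauss(0.0, 1.0)   # lowest-level random intercept
                for _ in range(obs_per_group):
                    e = rng.gauss(0.0, 1.0)     # overall error
                    rows.append((first_id, second_id, third_id,
                                 u_first + u_second + u_third + e))
                third_id += 1
            second_id += 1
    return rows


rows = simulate_nested()
n_first_groups = len({r[0] for r in rows})
n_second_groups = len({r[1] for r in rows})
n_third_groups = len({r[2] for r in rows})
print(len(rows), n_first_groups, n_second_groups, n_third_groups)
# 160000 20 400 8000
```

The rows could then be written to a text file and read into the package of
your choice to time a fit with three random intercepts.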

Of course, these are only a few examples.  We look forward to when this is in
everyone's hands.

--Bobby				--Vince
rgutierrez@stata.com		vwiggins@stata.com

Reference:
Baltagi, B. H., S. H. Song, and B. C. Jung.  2001. The unbalanced nested error 
   component regression model.  Journal of Econometrics, 101: 357-381.