Re: st: question about processing time

From	[email protected] (Roberto G. Gutierrez, StataCorp)
To	[email protected]
Subject	Re: st: question about processing time
Date	Tue, 07 Jun 2005 10:44:45 -0500
Both Paul Visintainer <[email protected]> and James Rosenthal
<[email protected]> are interested in processing times for fitting mixed models to
large datasets.

Paul asks: 

> A colleague asked me about Stata's (ver 9.0) ability to run a mixed model
> with 4 levels on a database with about 1 million records.  If anyone has
> run something close to this scenario, I'd appreciate your input.

> I'd like to know how long Stata took to run the model and the
> configuration of the machine it was run on (I assume it's best to load as
> much memory as the machine can take).

The output at the bottom of this email shows a mixed model fit to 1.12 million
observations, with 4 levels of random effects and a random intercept at each
level.  Fitting the model took about 53 minutes on a 2.6 GHz Pentium 4 with
1 GB of RAM, running Fedora Core Linux.
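The group table in the output below is perfectly balanced, so the 1.12 million
observations follow directly from the nesting.  A minimal Python sketch follows.
It assumes 20 groups per parent at every level and 7 observations per level4
group, which is inferred from that table rather than stated in the post, and
the helper name is hypothetical.  It reproduces the group counts and the
observations per group:

```python
from itertools import accumulate
from operator import mul


def nested_group_counts(branching):
    """Hypothetical helper: cumulative unit counts for a balanced nested design.

    branching[0] is the number of top-level groups; each later entry is the
    number of children per parent, and the last entry is the number of
    observations per lowest-level group.
    """
    return list(accumulate(branching, mul))


# Branching factors inferred from the xtmixed group table (an assumption).
names = ["level1", "level2", "level3", "level4"]
counts = nested_group_counts([20, 20, 20, 20, 7])
total = counts[-1]
for name, n_groups in zip(names, counts):
    # Each group at a given level contains total // n_groups observations.
    print(f"{name:>6}: {n_groups:>7,} groups, {total // n_groups:>6,} obs per group")
print(f"total observations: {total:,}")
```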

Of course, timings depend not only on the machine but also on the exact
configuration of the 4 grouping levels, the number of fixed effects, the
random-effects design, etc.  Your mileage will vary.

Also note that what I have below is a 4-level model in Stata parlance, which
is equivalently a 5-level model in the terminology of -gllamm- (and of the
hierarchical linear models literature), where the observations themselves are
counted as level 1.
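To make the two numbering conventions concrete, here is a small Python sketch
with a hypothetical function name.  It maps the grouping variables in the
xtmixed command below, which are listed outermost first, to gllamm-style level
numbers, where the observation is level 1 and the innermost grouping variable
is level 2:

```python
def gllamm_numbering(xtmixed_levels):
    """Hypothetical helper: map xtmixed grouping variables to gllamm-style levels.

    xtmixed lists grouping variables outermost first and counts only the
    levels of random effects.  gllamm-style numbering calls the observation
    level 1 and numbers the grouping levels from the innermost outward.
    """
    mapping = {"observation": 1}
    for number, name in enumerate(reversed(xtmixed_levels), start=2):
        mapping[name] = number
    return mapping


m = gllamm_numbering(["level1", "level2", "level3", "level4"])
print(m)  # {'observation': 1, 'level4': 2, 'level3': 3, 'level2': 4, 'level1': 5}
```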

James asks:

> I have a much smaller problem (15,000 records with 3 or 4 levels) that
> SPSS MIXED runs out of memory on. HLM handles nicely, but I cannot
> incorporate a 4th level.  

> If I knew STATA could handle the problem, I might well upgrade to 9.0.

Since your problem is organized by "levels" (of nested random effects,
presumably), it shouldn't be a problem in terms of either memory or speed.
Stata takes advantage of the nesting to keep the dimension of the design
matrix low, which makes the fit less demanding on memory.
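One way to see why nesting matters is to count what a naive, unstructured
random-effects design matrix Z would cost for the 1.12-million-observation
example below.  The Python sketch is only back-of-the-envelope arithmetic that
assumes the balanced design shown in the output and 8-byte doubles.  It does
not describe xtmixed's internal algorithm.  With one random-intercept column
per group, Z has 168,420 columns, yet each observation has only 4 nonzero
entries.  After sorting rows and columns by level1 group, Z splits into 20
independent blocks.

```python
from itertools import accumulate
from operator import mul

# Balanced design inferred from the xtmixed group table (an assumption):
# 20 level1 groups, 20 children per parent at each lower level, 7 obs per level4 group.
branching = [20, 20, 20, 20, 7]
counts = list(accumulate(branching, mul))  # [20, 400, 8000, 160000, 1120000]
n_levels = len(counts) - 1
n_obs = counts[-1]

# Naive Z: one random-intercept column per group at every level.
n_cols = sum(counts[:n_levels])
dense_cells = n_obs * n_cols
nonzeros = n_obs * n_levels  # each observation has a single 1 per level

# Sorting rows and columns by level1 group makes Z block diagonal,
# with one block per level1 group.
block_rows = n_obs // counts[0]
block_cols = sum(c // counts[0] for c in counts[:n_levels])

print(f"Z columns:        {n_cols:,}")
print(f"dense Z cells:    {dense_cells:,} (~{dense_cells * 8 / 1e9:,.0f} GB as doubles)")
print(f"nonzero cells:    {nonzeros:,}")
print(f"per-level1 block: {block_rows:,} rows x {block_cols:,} columns, {counts[0]} blocks")
```

Within each block the same argument applies again at level2, level3, and
level4, which is why exploiting the nesting scales so much better than
working with Z as a single matrix.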

--Bobby
[email protected]

----------------------------begin xtmixed output------------------------------

. xtmixed y x1 || level1: || level2: || level3: || level4:, emlog

Performing EM optimization: 

Performing gradient-based optimization: 

Iteration 0:   log restricted-likelihood = -1333058.8  
Iteration 1:   log restricted-likelihood = -1333058.8  

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =   1120000

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
         level1 |       20      56000    56000.0      56000
         level2 |      400       2800     2800.0       2800
         level3 |     8000        140      140.0        140
         level4 |   160000          7        7.0          7
-----------------------------------------------------------

                                                Wald chi2(1)       = 493942.25
Log restricted-likelihood = -1333058.8          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .4990131     .00071   702.81   0.000     .4976215    .5004048
       _cons |  -.7875853   .1372061    -5.74   0.000    -1.056504   -.5186663
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
level1: Identity             |
                   sd(_cons) |   .6027093   .1013493       .433484    .8379976
-----------------------------+------------------------------------------------
level2: Identity             |
                   sd(_cons) |   .5019323   .0191549      .4657591    .5409149
-----------------------------+------------------------------------------------
level3: Identity             |
                   sd(_cons) |   .4958665   .0042854       .487538    .5043373
-----------------------------+------------------------------------------------
level4: Identity             |
                   sd(_cons) |   .5001539   .0011706      .4978648    .5024535
-----------------------------+------------------------------------------------
                sd(Residual) |   .7069941   .0005102      .7059947    .7079948
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(4) =  1.0e+06   Prob > chi2 = 0.0000

-----------------------------end xtmixed output-------------------------------
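Readers who want to time a comparable fit on their own machine could simulate
data with the same nested layout and run the xtmixed command above on it.  A
minimal pure-Python sketch follows.  The function names and all parameter
values are hypothetical placeholders loosely based on the estimates in the
output, because the post does not say how that dataset was generated.  The
default branching is scaled down to 4,375 observations.  Passing
(20, 20, 20, 20, 7) gives the full 1,120,000.

```python
import csv
import random


def simulate_nested(branching=(5, 5, 5, 5, 7), b0=-0.79, b1=0.5,
                    sds=(0.6, 0.5, 0.5, 0.5), sd_resid=0.71, seed=12345):
    """Hypothetical helper: simulate y = b0 + b1*x1 + u1 + u2 + u3 + u4 + e.

    branching gives the number of top-level groups, then children per parent
    at each lower level, and finally observations per lowest-level group.
    Group ids are unique across the whole dataset at every level.
    Parameter defaults are illustrative placeholders, not the true values.
    """
    rng = random.Random(seed)
    *group_branching, n_obs = branching
    assert len(sds) == len(group_branching), "one sd per grouping level"
    rows = []
    ids = [0] * len(group_branching)  # running id counter per level

    def recurse(level, effect_so_far, path):
        if level == len(group_branching):
            # Lowest-level group: draw observations sharing all parent effects.
            for _ in range(n_obs):
                x1 = rng.gauss(0, 1)
                y = b0 + b1 * x1 + effect_so_far + rng.gauss(0, sd_resid)
                rows.append((*path, x1, y))
            return
        for _ in range(group_branching[level]):
            ids[level] += 1
            u = rng.gauss(0, sds[level])  # random intercept for this group
            recurse(level + 1, effect_so_far + u, path + (ids[level],))

    recurse(0, 0.0, ())
    return rows


def write_csv(rows, path):
    """Hypothetical helper: write rows with a header row for reading into Stata."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["level1", "level2", "level3", "level4", "x1", "y"])
        writer.writerows(rows)


rows = simulate_nested()
print(len(rows), "rows")
```

After `write_csv(rows, "nested.csv")`, the file could be read into Stata with
-insheet using nested.csv, clear- before running
-xtmixed y x1 || level1: || level2: || level3: || level4:-.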
