
From: rgutierrez@stata.com (Roberto G. Gutierrez, StataCorp)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: question about processing time
Date: Tue, 07 Jun 2005 10:44:45 -0500

Both Paul Visintainer <PAUL_VISINTAINER@NYMC.EDU> and James Rosenthal
<jimar@ou.edu> are interested in processing times for fitting mixed models
to large datasets.

Paul asks:

> A colleague asked me about Stata's (ver 9.0) ability to run a mixed model
> with 4 levels on a database with about 1 million records. If anyone has
> run something close to this scenario, I'd appreciate your input.

> I'd like to know how long Stata took to run the model and the
> configuration of the machine it was run on (I assume its best to load as
> much memory as the machine can take).

The output at the bottom of this email shows a mixed model fit to 1.12
million observations, with 4 levels of random effects and a random intercept
at each level. Fitting the model took about 53 minutes on a P4 2.6GHz with
1GB RAM, running Fedora Core Linux.

Of course, timings depend not only on the machine but also on the exact
configuration of the 4 grouping levels, the number of fixed effects, the
random-effects design, etc. Your mileage will vary.

Also note that what I have below is a 4-level model in Stata parlance, which
is a 5-level model in the terminology of -gllamm- (and of the hierarchical
linear models literature), since that terminology counts the observations
as a level.

James asks:

> I have a much smaller problem (15,000 records with 3 or 4 levels) that
> SPSS MIXED runs out of memory on. HLM handles nicely, but I cannot
> incorporate a 4th level.

> If I knew STATA could handle problem, I might well upgrade to 9.0.

Since your problem is organized by "levels" (of nested random effects,
presumably), this shouldn't be a problem either memorywise or speedwise.
Stata takes advantage of the nesting to keep the dimension of the design
matrix low, and is thus less demanding on memory.

--Bobby
rgutierrez@stata.com

----------------------------begin xtmixed output------------------------------

. xtmixed y x1 || level1: || level2: || level3: || level4:, emlog

Performing EM optimization:

Performing gradient-based optimization:

Iteration 0:   log restricted-likelihood = -1333058.8
Iteration 1:   log restricted-likelihood = -1333058.8

Computing standard errors:

Mixed-effects REML regression                   Number of obs      =   1120000

-----------------------------------------------------------
                |   No. of       Observations per Group
 Group Variable |   Groups    Minimum    Average    Maximum
----------------+------------------------------------------
         level1 |       20      56000    56000.0      56000
         level2 |      400       2800     2800.0       2800
         level3 |     8000        140      140.0        140
         level4 |   160000          7        7.0          7
-----------------------------------------------------------

                                                Wald chi2(1)       = 493942.25
Log restricted-likelihood = -1333058.8          Prob > chi2        =    0.0000

------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   .4990131     .00071   702.81   0.000     .4976215    .5004048
       _cons |  -.7875853   .1372061    -5.74   0.000    -1.056504   -.5186663
------------------------------------------------------------------------------

------------------------------------------------------------------------------
  Random-effects Parameters  |   Estimate   Std. Err.     [95% Conf. Interval]
-----------------------------+------------------------------------------------
level1: Identity             |
                   sd(_cons) |   .6027093   .1013493       .433484    .8379976
-----------------------------+------------------------------------------------
level2: Identity             |
                   sd(_cons) |   .5019323   .0191549      .4657591    .5409149
-----------------------------+------------------------------------------------
level3: Identity             |
                   sd(_cons) |   .4958665   .0042854       .487538    .5043373
-----------------------------+------------------------------------------------
level4: Identity             |
                   sd(_cons) |   .5001539   .0011706      .4978648    .5024535
-----------------------------+------------------------------------------------
                sd(Residual) |   .7069941   .0005102      .7059947    .7079948
------------------------------------------------------------------------------
LR test vs. linear regression:       chi2(4) =  1.0e+06   Prob > chi2 = 0.0000

-----------------------------end xtmixed output-------------------------------
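
As a cross-check on the group table in the output, the counts are consistent with a perfectly balanced nested design: 20 groups at the top level, each level splitting into 20 subgroups of the next, and 7 observations in each lowest-level group. The Python sketch below only reproduces that arithmetic; it does not fit any model, and the function name `nested_group_table` is a hypothetical helper, not part of Stata or any library. The branching factors are inferred from the table, not stated in the email.

```python
from math import prod


def nested_group_table(branching, obs_per_lowest_group):
    """Return (total_obs, rows) for a balanced nested design.

    branching[k] is the number of subgroups each group at the level above
    splits into (the first entry is the number of top-level groups).
    Each row is (level name, number of groups, observations per group).
    """
    total_obs = prod(branching) * obs_per_lowest_group
    rows = []
    n_groups = 1
    for depth, b in enumerate(branching, start=1):
        n_groups *= b  # cumulative number of groups at this depth
        rows.append((f"level{depth}", n_groups, total_obs // n_groups))
    return total_obs, rows


# Assumed design, inferred from the table: 20 x 20 x 20 x 20 groups, 7 obs each.
total, rows = nested_group_table([20, 20, 20, 20], 7)

print(f"Number of obs = {total}")
for name, groups, per_group in rows:
    print(f"{name:>8} | {groups:>7} groups | {per_group:>6} obs per group")
```

Because the design is balanced, the minimum, average, and maximum group sizes coincide at every level, which is what the table reports. Unbalanced data with the same number of observations could give quite different timings.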
