Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | James Sams <sams.james@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | st: -areg- and -eststo- performance (page faults?) on OS X Lion |
Date | Sun, 01 Jan 2012 23:08:50 -0600 |
For reference, my environment: Stata 12 MP OS X 10.7 Lion 32 GB RAM 4 cores 12 GB dataset (on disk size) I'm trying to run areg with several different models with several dependent variables for 44 groups of unequal size. Theoretically, I should be able to do just by subgroup: eststo ...: areg ...., absorb(id) for each model and dependent variable. However, that turns out to being about 900 eststo's, and Stata only allows 300. So, I have it broken up into a loop where it reads in only the subset of the data for the count of groups that would consume all of eststo's available space and loops over the groups one by one running areg on each and storing the results (in terms of customizing titles and such, this turns out to be more convenient than using "by subgroup". Plus given that "by subgroup"). This is fast at first, but stata ends up getting slow, very slow. OS X's top indicates a very large number of page faults (306,914,721 in about the past 30 hours of continuous running). I assume this is the problem but don't really understand it. The memory usage looks like this: RPRVT RSHRD RSIZE VPRVT VSIZE 8268M 216K 8280M 8351M 10G which should be more than enough to find contiguous portions of memory. I am using the noesample option to eststo. Sometimes top shows stata as being "stuck" with approximately 7% of CPU usage. Other times, it is classified as "running and uses 400% of CPU . There is nothing else significant running on the box (I'm running from the command line and the system is left at the GUI login). I don't have good data on how long it is being left in each state except that my process has been running for 30 hours, and based on my rudimentary testing I expected it to take in the neighborhood of 12 hours Speeding this up is important because a) I'll need to run similar regressions on these same groups many times and the time it is taking is just too much and b) I'd like to run this on a separate classification of groups that would give me 1000 groups (so thousands of regressions). This might be faster due to a potentially lower amount of memory required for that classification (but that is not yet certain). Any and all advice is appreciated. -- James Sams sams.james@gmail.com * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/