

st: -areg- and -eststo- performance (page faults?) on OS X Lion

From   James Sams <>
Subject   st: -areg- and -eststo- performance (page faults?) on OS X Lion
Date   Sun, 01 Jan 2012 23:08:50 -0600

For reference, my environment:
Stata 12 MP
OS X 10.7 Lion
4 cores
12 GB dataset (on disk size)

I'm trying to run -areg- for several models, each with several dependent 
variables, across 44 groups of unequal size. In principle, I should be able to do 

by subgroup:  eststo ...: areg ...., absorb(id)

for each model and dependent variable. However, that comes to about 
900 eststo's, and Stata only allows 300. So I have broken it into a loop: 
it reads in only the subset of the data for however many groups would fill 
eststo's available space, then loops over those groups one by one, running 
-areg- on each and storing the results (in terms of customizing titles and 
such, this turns out to be more convenient than using -by subgroup:-). This 
is fast at first, but Stata ends up getting slow, very slow. OS X's top 
indicates a very large number of page faults (306,914,721 in about the past 
30 hours of continuous running). I assume this is the problem but don't 
really understand it. The memory usage looks like this:

8268M  216K   8280M  8351M  10G  

which should be more than enough to find contiguous portions of memory. I am 
using the noesample option to eststo. Sometimes top shows Stata as being 
"stuck" with approximately 7% CPU usage; other times it is classified as 
"running" and uses 400% of CPU. There is nothing else significant running on 
the box (I'm running from the command line, and the system is left at the GUI 
login). I don't have good data on how long it spends in each state, except 
that my process has been running for 30 hours, and based on my rudimentary 
testing I expected it to take in the neighborhood of 12 hours.
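In case it helps, here is a rough sketch of the loop I described above. The 
dataset, variable, and stored-estimate names are placeholders, not my actual 
ones, and the chunking is simplified to one group per iteration:

```stata
* Hypothetical sketch; mydata.dta, subgroup, depvar, x1, x2 are placeholders.
local chunksize 300                      // eststo's stored-estimate limit
local stored 0
use subgroup using mydata, clear
levelsof subgroup, local(groups)
foreach g of local groups {
    * read in only this group's subset of the data
    use if subgroup == `g' using mydata, clear
    eststo m`g', noesample: areg depvar x1 x2, absorb(id)
    local ++stored
    if `stored' >= `chunksize' {
        * export the stored results, then free eststo's space
        esttab m* using results.txt, append
        eststo clear
        local stored 0
    }
}
```

The -eststo clear- between chunks is how I keep under the 300-estimate limit; 
the slowdown appears regardless.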

Speeding this up is important because a) I'll need to run similar regressions 
on these same groups many times and the time it is taking is just too much and 
b) I'd like to run this on a separate classification of groups that would give 
me 1000 groups (so thousands of regressions). That run might be faster, since 
that classification may require less memory, but that is not yet certain.

Any and all advice is appreciated.
James Sams
