
Re: st: RE: Will Stata/MP speed up running multiple dofiles in batch mode?


From   [email protected] (William Gould, StataCorp LP)
To   [email protected]
Subject   Re: st: RE: Will Stata/MP speed up running multiple dofiles in batch mode?
Date   Thu, 20 Jan 2011 12:03:30 -0600

Alex Eapen <[email protected]> wonders about the efficiency of
running multiple copies of Stata simultaneously.  He writes,

> I currently have Stata/SE 10.1 for Mac. I would like to run several
> do-files at once (rather than sequentially). I see that this can be
> done by running Stata in batch mode from, say, the Terminal
> application in Mac OS (see 
> http://www.stata.com/support/faqs/unix/batch.html). That is, at the
> prompt in the Terminal application in a Mac OS, I type:
> 
>            $ statase -b do bigjob1.do &
>
> followed by....
>
>            $ statase -b do bigjob2.do &
> 
> and then...
>
>            $ statase -b do bigjob3.do &
>
> And all three dofiles run simultaneously.

Alex is issuing commands to the underlying Unix operating system of
his Mac.  The dollar signs I've inserted in front of what Alex types
are the Unix prompts. He's typing those commands into Unix.  The
ampersand at the end of the command tells Unix that it is to run the
command in the background rather than waiting until the command
completes before issuing another prompt.
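
The same launch-then-wait pattern can be sketched in Python.  This is
only an illustration, not part of Alex's setup: the executable name
statase and the arguments -b do are taken from his commands, the
function names are hypothetical, and the demonstration substitutes a
do-nothing Python process for each Stata job so that the sketch runs on
a machine without Stata.

```python
import subprocess
import sys


def stata_batch_command(dofile, stata="statase"):
    """Build the batch invocation Alex typed: statase -b do <dofile>."""
    return [stata, "-b", "do", dofile]


def run_simultaneously(commands):
    """Start every command without waiting, then wait for all of them."""
    # Popen returns immediately, like ending a shell command with "&".
    procs = [subprocess.Popen(cmd) for cmd in commands]
    # Collect each exit code; this blocks until the slowest job is done.
    return [proc.wait() for proc in procs]


# Placeholder jobs (assumption): a do-nothing Python process stands in
# for each Stata job so the sketch runs without Stata installed.
placeholder = [sys.executable, "-c", "pass"]
exit_codes = run_simultaneously([placeholder] * 3)
print(exit_codes)

# With Stata installed and on the PATH, the real call would be:
# run_simultaneously([stata_batch_command(f"bigjob{i}.do") for i in (1, 2, 3)])
```

All jobs are started before any is waited on, so the total wait is set
by the slowest job rather than by the sum of the jobs.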

There are other ways Alex could have run three Stata jobs
simultaneously on his Mac, and other operating systems offer their own
ways of running simultaneous Statas.  The how doesn't matter; Alex's
questions and my answers generalize across methods and operating
systems.  Alex asks,


> 1. My first question is whether there is a limit on how many do files
>    can be run simultaneously in this manner?

Only that imposed by your operating system.  For all practical purposes, 
the answer is no, there is no limit.


Next, Alex notices that running jobs simultaneously results in each job 
taking longer to complete, 

> 2. I compared the time it takes to execute a single dofile alone to
>    when it is run simultaneously with the other two. (I ensured all
>    three dofiles contain exactly the same code, so any difference in
>    time-to-run is not because of different commands in them). When
>    the three dofiles are running simultaneously, possibly because
>    stata resources on my computer are being stretched across three
>    dofiles, each one takes longer to complete.

First, let's think about this without getting into details of number of 
cores, etc.

Let's say Alex runs the three jobs sequentially.  Let s1 be the time for 
running the first job, s2 the time for the second, and s3 the time for the 
third.  The total time Alex must wait for all three jobs to complete is 
then 

          S = s1 + s2 + s3

Now say Alex runs the three jobs simultaneously.  Let p1 be the time for 
running the first job, p2 the time for the second, and p3 the time 
for running the third.  The total time Alex must wait for all three 
jobs to complete is then

         P = max(p1, p2, p3)

Note that P can be < S even if p1>s1, p2>s2, and p3>s3.  That is the
basis for the often stated claim that running simultaneous processes
can result in more efficient use of the CPU resources.  In fact, in
the old time-sharing literature, the goal was to run enough
simultaneous processes so that P==S.  The computer resources were
being used even more "efficiently" if P>S, but if P>>S, that was
considered inefficient.  When P>>S, the computer was said to be
"thrashing", which is merely saying that it was spending too much
time switching between jobs rather than running the jobs themselves.

Thus, the rule on running simultaneous jobs is: Do not run too many
of them simultaneously.  If you do, overall performance will suffer
and P will be greater than S.  Do things right, and you can obtain
P<S.
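
A small numeric illustration of S and P may help.  The timings below
are invented for the example (they are not Alex's measurements); they
are chosen so that every job is slower when run simultaneously, yet
the total wait is still shorter.

```python
def sequential_wait(times):
    """Total wait when jobs run one after another: S = s1 + s2 + s3."""
    return sum(times)


def simultaneous_wait(times):
    """Total wait when jobs run at the same time: P = max(p1, p2, p3)."""
    return max(times)


# Hypothetical run times in minutes (assumption, not measured data).
s = [10, 10, 10]   # each job run alone
p = [14, 15, 13]   # the same jobs run simultaneously, each one slower

S = sequential_wait(s)
P = simultaneous_wait(p)
print(f"S = {S}, P = {P}, P < S: {P < S}")
```

Here each p exceeds the corresponding s, but the wait falls from 30
minutes to 15 because the jobs overlap.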


Next, Alex asks

> Will Stata/MP help in this situation? Will Stata/MP distribute the
> execution of dofiles (or parts of them) across multiple cores and thus
> reduce the time it takes for each to complete?

First, Stata/MP has nothing explicitly to do with running multiple jobs;
Stata/MP is about running single jobs more quickly on multiple cores.  Even
so, Alex asks a good question.

The answers are

    1.  Yes, Stata/MP can reduce execution time if there are more
        cores than there are jobs.

    2.  No if there are fewer cores than there are jobs.

    3.  No if the number of jobs equals the number of cores.

I will explain.

Operating systems are smart when running multiple jobs in a multiple 
core environment:  They assign one job to each core, and only 
after all the cores are assigned do jobs compete for resources. 

Thus, answer (1) is yes because there were unused cores lying around.
As an aside, to obtain maximum performance, we would like the number
of cores to be a multiple of the number of jobs, and to limit Stata/MP
to using just that multiple.  With 3 jobs and 6 cores, we could
arrange things so that job 1 ran on two cores, job 2 ran on another
two, and job 3 ran on yet another two.  If we just let each Stata/MP
spread itself out over all six, there will be competition for
resources and the operating system will have to manage that.
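
The arithmetic of that split can be sketched as follows.  The helper
name is hypothetical.  The sketch prints a set processors line because
that is the Stata/MP setting that caps how many cores a session uses;
placing it at the top of each do-file is an assumption of this example,
not something prescribed above.

```python
def cores_per_job(total_cores, n_jobs):
    """Split cores evenly; return (cores for each job, cores left idle)."""
    if n_jobs < 1 or total_cores < n_jobs:
        raise ValueError("need at least one core per job")
    return divmod(total_cores, n_jobs)


# The example from the text: 3 jobs on a 6-core machine.
share, idle = cores_per_job(6, 3)
print(f"set processors {share}")   # candidate first line of each do-file
print(f"idle cores: {idle}")
```

When the number of cores is not a multiple of the number of jobs, the
second value reports how many cores the even split leaves unused.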

Answer (2) is obvious, or at least will be after I explain answer (3).

Answer (3) requires some explanation.  

Stata/MP may be the most efficient parallelized statistical package
available, but even it is not 100% efficient: Running on two cores
will not quite halve run times.  That's because any statistical or
data-management problem has some parts that must be performed
sequentially, and other parts that can be performed in parallel.
During the parts that have to be performed sequentially, the extra
cores sit idle.
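
One common way to quantify this is Amdahl's law, which is brought in
here as an illustrative model and is not named in the discussion above.
It assumes a fixed fraction of the work parallelizes perfectly and the
rest runs on one core.  The fraction 0.9 below is a made-up value, not
a measured property of Stata/MP.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Speedup over one core when only part of the work parallelizes."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


f = 0.9  # hypothetical parallelizable fraction (assumption)
for n in (1, 2, 4):
    print(f"{n} core(s): speedup {amdahl_speedup(f, n):.2f}")
```

Under this model the speedup on two cores stays below 2 whenever any
part of the job is sequential, which is the sense in which two cores
will not quite halve run times.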

Operating systems doling out multiple cores tend to be nearly 100%
efficient when running single-core processes.  One
process does not depend on the other, so if each is given its own core,
it can just blast away.  The cores are idle only when they are
waiting for a shared resource such as an I/O line.

Thus, Stata/MP cannot be 100% efficient, operating systems are nearly 
100% efficient, and so, when the number of cores equals the number of 
jobs, letting the operating system handle the simultaneity is better 
than asking Stata/MP to do it.

This result should hardly surprise you.  Alex started by asking us 
about three jobs.  What's a job?  I can think of the three jobs 
as one job, 

        job_c = job1.do + job2.do + job3.do 

        that is, edit job_c.do, copy in job1.do, then add to the file 
        job2.do, and finally add job3.do to make one combined do-file.

Can Stata/MP run job_c.do faster than Stata/SE?  Certainly.

Now consider partitioning job_c.do into {job1.do, job2.do, job3.do}, 
each independent of the other.  Being independent, each can be run 
in parallel with theoretical efficiency of 100%.  Stata/MP cannot 
achieve 100% efficiency.  Thus, using three separate Stata/MP sessions 
must result in the run time being reduced.  Could you use Stata/SE
instead?  Yes, if you had three or fewer cores, because in that 
case you can only use one core per job.  If you had more cores, 
using Stata/MP to run the separate jobs would be quicker.
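
The comparison can be put into numbers with a simple model.  The sketch
assumes that a fixed fraction of each job parallelizes perfectly
(Amdahl's law, used here only as an illustration), that the operating
system gives each session its own cores at no cost, and that the three
jobs are equal in size.  The work and fraction values are invented.

```python
def run_time(work, parallel_fraction, cores):
    """Elapsed time for `work` units when only part of it parallelizes."""
    serial = work * (1 - parallel_fraction)
    parallel = work * parallel_fraction / cores
    return serial + parallel


JOB_WORK = 10.0   # hypothetical single-core minutes per do-file
F = 0.9           # hypothetical parallelizable fraction

# Three cores: job_c.do in one Stata/MP session vs. one core per job.
combined_3 = run_time(3 * JOB_WORK, F, cores=3)
separate_3 = run_time(JOB_WORK, F, cores=1)   # all three finish together

# Six cores: one session on all six vs. three sessions on two cores each.
combined_6 = run_time(3 * JOB_WORK, F, cores=6)
separate_6 = run_time(JOB_WORK, F, cores=2)

print(f"3 cores: combined {combined_3:.1f}, separate {separate_3:.1f}")
print(f"6 cores: combined {combined_6:.1f}, separate {separate_6:.1f}")
```

In both cases the separate sessions finish first under these
assumptions, and with six cores the separate sessions gain further
from running on two cores each, which only Stata/MP can do.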

By the way, see 

        http://www.stata.com/statamp/

for more information about Stata/MP and efficiency.

-- Bill
[email protected]