RE: st: MP running no faster than IC

From   Timothy Mak <>
To   "" <>
Subject   RE: st: MP running no faster than IC
Date   Tue, 9 Jul 2013 14:27:30 +0800


If you want to speed up your bootstrapping, you can always try running your bootstrap across several different instances of Stata (which doesn't require MP) and combining the results together. 
See -net describe parallel, from( for a program that might facilitate your doing this. 

I know this may not ease your sense of disappointment for having purchased Stata MP unnecessarily. 
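
The multi-instance approach described above can be sketched in two small do-files. This is a minimal sketch under stated assumptions, not the -parallel- package's own method: the file names worker.do, combine.do and boot_1 ... boot_4, the seed offsets, and the split of 500 replications into 4 x 125 are all hypothetical, and the batch invocation shown in the comment applies to Unix-like systems (the executable name varies by installation).

```stata
* worker.do: run one share of the bootstrap replications in its own Stata instance.
* Launch once per instance from the shell, for example:  stata -b do worker.do 1
args id                              // instance number passed on the command line
local seed = 1000 + `id'             // hypothetical scheme: a different seed per instance
sysuse auto, clear
set seed `seed'
bootstrap _b, reps(125) saving(boot_`id', replace): ///
    regress mpg weight gear_ratio foreign

* combine.do: pool the replicates once all four instances have finished.
clear
forvalues id = 1/4 {
    append using boot_`id'
}
summarize                            // the SD of each column estimates the bootstrap standard error
```

Each instance writes its replicates to its own file, so the instances never need to communicate; the only serial step is the final append. Giving each instance a different seed keeps them from drawing identical resamples, though it is a convenience rather than a formal guarantee of independent random-number streams.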

-----Original Message-----
From: [] On Behalf Of Sergiy Radyakin
Sent: 09 July 2013 12:59
Subject: Re: st: MP running no faster than IC

I guess that was made very specific by Vince Wiggins on the Stata blog:
"I will admit right now that I mostly run Stata interactively using
the auto dataset, which has 74 observations. I run Stata/MP using all
4 cores of my quad-core computer, but I am mostly wasting 3 of them -
there is no speeding up the computations on 74 observations. "
See here for details:

And by Bill Gould:
"In any case, I have seen articles predicting and in some cases,
announcing, computers with hundreds of cores. For applications with p
approaching 1, those are exciting announcements. In the world of
statistical software, however, these announcements are exciting only
for those running with immense datasets."
See here for details:
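
A note on the p in the remark quoted above, assuming it denotes the fraction of a job's run time that can be parallelized, as in Amdahl's law (the symbols S and n are introduced here for illustration only). The best-case speedup on n cores is then:

```latex
S(n) = \frac{1}{(1 - p) + \dfrac{p}{n}},
\qquad
\lim_{n \to \infty} S(n) = \frac{1}{1 - p}
```

For example, with p = 0.5 and n = 4 the speedup is 1/(0.5 + 0.125) = 1.6, and no number of cores can push it past 2. Hundreds of cores pay off only when p is close to 1.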

Give it a second thought, give it some time, in the morning call the
Stata sales department, explain the situation, see if you can work out
a solution together. StataCorp does have a money-back guarantee,
though I've never heard of anybody being so disappointed as to actually use it.

Best, Sergiy Radyakin

On Tue, Jul 9, 2013 at 12:37 AM, Lucas <> wrote:
> Stata isn't over-claiming.  They just probably never thought that
> someone running a command that takes 2 seconds would be seeking to run
> it even faster.  My jobs, and jobs of other people I know, routinely
> run days or weeks.  (And, yes, it is identified, everything checks
> out, it is just that the data is massive and the model appropriately
> complex).  It is for such jobs that one needs parallel processing.
> Running the same 2 second command 500 times can't be parallelized with
> any efficiency because the overhead of managing the allocation of
> tasks swamps any gains attributable to parallelization.  Stata's only
> fault--if fault it be--is not making clear that unless one uses big
> data or finds oneself in situations where one model takes days or
> weeks, MP is of dubious value.  But, on the other hand, users seeking
> to run a regression model in .1 second rather than 2 seconds only
> inspire one to ask, "Why?"
> On Mon, Jul 8, 2013 at 9:09 PM, Ted Player <> wrote:
>> The benchmark tests I originally described were conducted on a local
>> machine.  I did a follow-up with an EC2 machine (as described
>> elsewhere in this thread).
>> I see now that buried on p. 231 of Stata's MP performance report is
>> the mention that to get the improvements that Stata claims for
>> regression requires a single regression model with 180 regressors and
>> a dataset with 1,500,000 observations.  I usually do things like
>> bootstrap analyses on datasets with 500 observations, so I guess MP
>> isn't any more useful to me than SE.
>> It looks like I fell for the advertising hype on
>> .  It's my fault for thinking Stata
>> wouldn't overclaim to make their software seem better than it really
>> is.  Live and learn I guess!

^^^ that's true, and the best opportunity to learn the new Stata
features is definitely the Stata user group meetings and conferences.
That's where you can see the software working and talk to the
developers for the best advice in your situation, whether you are
configuring a megamonster-machine to run Stata or looking for a best
estimation strategy.

>> On Mon, Jul 8, 2013 at 9:42 PM, Sergiy Radyakin <> wrote:
>>> Dear Ted, I've witnessed many times that MP works much faster than IC.
>>> The figures in the report do make sense. Now looking at your example:
>>> the only parallelizable part here is the "regress mpg weight gear
>>> foreign." Three things to notice immediately are the following:
>>> 1) the dataset contains 74 observations. The overhead of parallelizing
>>> it into 12 CPUs or even 4 CPUs is large relative to the size of the
>>> task at hand. You are likely to see the benefits of parallelization
>>> when you -expand- your dataset, say 1000000 (10^6) times and perhaps
>>> reduce the number of bootstrap iterations.
>>> 2) the dataset contains 74 observations. So the _regress command
>>> (internal) takes, say, 0.00001 second and with parallelization takes
>>> maybe 0.000001 second, but then you have 2 seconds of writing the
>>> output to the screen and scrolling the output window.  That is not
>>> parallelized (correct me if I am wrong), though scrolling seems to
>>> work much faster in recent versions (THANKS!) So, try disabling the
>>> output with -quietly- and you will see more performance gain from MP.
>>> 3) finally, Stata's ado files seem to not be parallelizable (you don't
>>> write them that way), but only internal commands are. There have been
>>> some changes in the most recent versions and the idea is to permit the
>>> users to write parallel code. I am yet to see these facilities, but it
>>> makes no sense to test parallelization benefits on do/ado code or
>>> where such code executes for a significant amount of time. This is
>>> also a reason why there is no need to separately benchmark bootstrap
>>> commands.
>>> To summarize the above, try the following commands on LARGE datasets
>>> (occupy e.g. half of your memory with data):
>>> mlogit - you should see performance increase about 3 times on a
>>> 12-CPU MP vs 1-CPU IC.
>>> summarize - you should see about 11-fold performance increase on a
>>> 12-CPU MP vs 1-CPU IC.
>>> Run tests on a local machine. Perhaps it's the Amazon that is to blame
>>> (I don't mean it). Some hosters limit your TOTAL computing power, so
>>> you can get 128 cores with the same total performance as 1 core. Then
>>> you are better off with a single CPU license of course :)
>>> Hope this helps.
>>> Best, Sergiy Radyakin
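
The benchmarking advice in the quoted message (a large dataset, output suppressed with -quietly-, built-in commands only) can be sketched as a short do-file. This is a rough sketch, not a calibrated benchmark: the expansion factor of 100000 is an assumption chosen to keep memory use moderate, and it is smaller than the 10^6 suggested in the message.

```stata
* Timing sketch: compare the elapsed times reported under Stata/MP and under IC or SE.
sysuse auto, clear
expand 100000                        // 74 x 100,000 = 7.4 million observations; adjust to fit memory
timer clear
timer on 1
quietly regress mpg weight gear_ratio foreign
timer off 1
timer on 2
quietly summarize mpg weight gear_ratio foreign
timer off 2
timer list                           // elapsed seconds for timer 1 (regress) and timer 2 (summarize)
display c(processors)                // number of processors Stata is currently using
```

Running the same file in each flavor on the same machine keeps the comparison fair. Under Stata/MP, -set processors #- can be used to repeat the run with fewer cores and see how the timings scale.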

*   For searches and help try: