
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: MP running no faster than IC


From   Ted Player <ted.player.660@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: MP running no faster than IC
Date   Mon, 8 Jul 2013 22:09:28 -0600

The benchmark tests I originally described were conducted on a local
machine.  I did a follow-up with an EC2 machine (as described
elsewhere in this thread).

I see now that buried on p. 231 of Stata's MP performance report is
the note that getting the improvements Stata claims for regression
requires a single regression model with 180 regressors and a dataset
with 1,500,000 observations.  I usually do things like bootstrap
analyses on datasets with 500 observations, so I guess MP isn't any
more useful to me than SE.

It looks like I fell for the advertising hype on
http://www.stata.com/statamp .  It's my fault for thinking Stata
wouldn't overclaim to make their software seem better than it really
is.  Live and learn I guess!



On Mon, Jul 8, 2013 at 9:42 PM, Sergiy Radyakin <serjradyakin@gmail.com> wrote:
> Dear Ted, I've witnessed many times that MP works much faster than IC.
> The figures in the report do make sense. Now looking at your example:
> the only parallelizable part here is the "regress mpg weight gear
> foreign." Two things to notice immediately are the following:
>
> 1) the dataset contains 74 observations. The overhead of parallelizing
> it into 12 CPUs or even 4 CPUs is large relative to the size of the
> task at hand. You are likely to see the benefits of parallelization
> when you -expand- your dataset, say 1000000 (10^6) times and perhaps
> reduce the number of bootstrap iterations.
>
> 2) the dataset contains 74 observations. So the _regress command
> (internal) takes, say, 0.00001 second and with parallelization takes
> maybe 0.000001 second, but then you have 2 seconds of writing the
> output to the screen and scrolling the output window.  That is not
> parallelized (correct me if I am wrong), though scrolling seems to
> work much faster in recent versions (THANKS!) So, try disabling the
> output with -quietly- and you will see more performance gain from MP.
>
> 3) finally, Stata's ado files seem to not be parallelizable (you don't
> write them that way), but only internal commands are. There have been
> some changes in the most recent versions and the idea is to permit the
> users to write parallel code. I am yet to see these facilities, but it
> makes no sense to test parallelization benefits on do/ado code or
> where such code executes for a significant amount of time. This is
> also a reason why there is no need to separately benchmark bootstrap
> commands.
>
> To summarize the above, try the following commands on LARGE datasets
> (occupy e.g. half of your memory with data):
> mlogit - you should see about a 3-fold performance increase on a
> 12-CPU MP vs. 1-CPU IC.
> summarize - you should see about an 11-fold performance increase on a
> 12-CPU MP vs. 1-CPU IC.
>
> Run tests on a local machine. Perhaps it's Amazon that is to blame
> (I don't mean it). Some hosters limit your TOTAL computing power, so
> you can get 128 cores with the same total performance as 1 core. Then
> you are better off with a single-CPU license of course :)
>
> Hope this helps.
> Best, Sergiy Radyakin
>
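
The suggestions in the quoted reply (expand the dataset, suppress output
with -quietly-, and time the built-in command rather than ado code) can
be sketched as a rough timing test.  This is only a sketch under stated
assumptions, not a procedure given in the thread: the expansion factor of
100000, the 1000 repetitions on the small dataset, the -keep- step to
save memory, and the timer numbers are all illustrative choices, and
"gear" is written out as gear_ratio.

```stata
* Rough timing sketch: -regress- on the 74-observation auto data
* versus the same data expanded to several million observations.
clear all
sysuse auto, clear
keep mpg weight gear_ratio foreign    // assumption: drop unused variables to save memory

timer clear

* Small case: one run is too fast to time, so repeat it (1000 is arbitrary).
timer on 1
forvalues i = 1/1000 {
    quietly regress mpg weight gear_ratio foreign
}
timer off 1

* Large case: 74 * 100000 = 7.4 million observations (factor is an assumption;
* adjust it to fit the memory available).
expand 100000
timer on 2
quietly regress mpg weight gear_ratio foreign
timer off 2

timer list
display "Processors used by Stata: " c(processors)
```

Running the same do-file under IC and under MP and comparing timer 2
should show whether the large regression gains anything from the extra
cores; timer 1 mostly measures per-command overhead.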

