Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: MP running no faster than IC

From	David Muller <[email protected]>
To	[email protected]
Subject	Re: st: MP running no faster than IC
Date	Tue, 9 Jul 2013 17:30:29 +0200

You are obviously and understandably disappointed that the performance
of Stata MP doesn't meet your expectations. No doubt I would feel the
same in your situation. That said, I think your characterisation of
Stata's 'sales pitch' is a little unfair, given that the issues raised
here are covered in the Stata/MP performance report, which is
available for download directly from the Stata/MP webpage
(http://www.stata.com/statamp/).

On 9 July 2013 16:56, Ted Player <[email protected]> wrote:
> Why would I want a 2-second estimation process to be shortened to a
> 0.1-second estimation process?  Because some of my work is with Monte Carlo
> simulations that take days to run.  I am disappointed to find they are
> not quicker at all with MP.
>
> Based on the comments here and other reading I've done, it looks like
> the parallelization procedures that Stata uses don't help anyone
> except users who run really complex models on huge datasets.  It would
> have been nice if Stata had said that up front (e.g., in their sales
> materials alongside their glowing claims of increased speed for
> everyone) rather than mentioning it parenthetically in obscure blog
> posts.
>
> Given Stata's sales pitch, I'm forced to wonder how many people have
> taken the biggest hit they could afford from their research budget to
> buy MP-2 core rather than SE, run it without any comparison to SE (or
> set processors 1), and never have any idea that the extra money that
> they spent isn't giving them any real benefit whatsoever. It's a pity.
>
> On 7/8/13, Lucas <[email protected]> wrote:
>> Stata isn't over-claiming.  They just probably never thought that
>> someone running a command that takes 2 seconds would be seeking to run
>> it even faster.  My jobs, and jobs of other people I know, routinely
>> run days or weeks.  (And, yes, it is identified, everything checks
>> out, it is just that the data are massive and the model appropriately
>> complex).  It is for such jobs that one needs parallel processing.
>> Running the same 2 second command 500 times can't be parallelized with
>> any efficiency because the overhead of managing the allocation of
>> tasks swamps any gains attributable to parallelization.  Stata's only
>> fault--if fault it be--is not making clear that unless one uses big
>> data or finds oneself in situations in which one model takes days or
>> weeks, MP is of dubious value.  But, on the other hand, users seeking
>> to run a regression model in 0.1 second rather than 2 seconds only
>> inspire one to ask, "Why?"
>>
>> On Mon, Jul 8, 2013 at 9:09 PM, Ted Player <[email protected]>
>> wrote:
>>> The benchmark tests I originally described were conducted on a local
>>> machine.  I did a follow-up with an EC2 machine (as described
>>> elsewhere in this thread).
>>>
>>> I see now that buried on p. 231 of Stata's MP performance report is
>>> the mention that to get the improvements that Stata claims for
>>> regression requires a single regression model with 180 regressors and
>>> a dataset with 1,500,000 observations.  I usually do things like
>>> bootstrap analyses on datasets with 500 observations, so I guess MP
>>> isn't any more useful to me than SE.
>>>
>>> It looks like I fell for the advertising hype on
>>> http://www.stata.com/statamp .  It's my fault for thinking Stata
>>> wouldn't overclaim to make their software seem better than it really
>>> is.  Live and learn I guess!
>>>
>>>
>>>
>>> On Mon, Jul 8, 2013 at 9:42 PM, Sergiy Radyakin <[email protected]>
>>> wrote:
>>>> Dear Ted, I've witnessed many times that MP works much faster than IC.
>>>> The figures in the report do make sense. Now looking at your example:
>>>> the only parallelizable part here is the "regress mpg weight gear
>>>> foreign." Three things to notice immediately are the following:
>>>>
>>>> 1) the dataset contains 74 observations. The overhead of spreading
>>>> the work across 12 CPUs, or even 4 CPUs, is large relative to the size
>>>> of the task at hand. You are likely to see the benefits of
>>>> parallelization when you -expand- your dataset, say 1000000 (10^6)
>>>> times and perhaps reduce the number of bootstrap iterations.
>>>>
>>>> 2) the dataset contains 74 observations. So the _regress command
>>>> (internal) takes, say, 0.00001 second, and with parallelization takes
>>>> maybe 0.000001 second, but then you have 2 seconds of writing the
>>>> output to the screen and scrolling the output window.  That is not
>>>> parallelized (correct me if I am wrong), though scrolling seems to
>>>> work much faster in recent versions (THANKS!). So, try disabling the
>>>> output with -quietly- and you will see more performance gain from MP.
>>>>
>>>> 3) finally, Stata's ado files do not seem to be parallelizable (you
>>>> don't write them that way); only internal commands are. There have been
>>>> some changes in the most recent versions and the idea is to permit
>>>> users to write parallel code. I have yet to see these facilities, but it
>>>> makes no sense to test parallelization benefits on do/ado code or
>>>> where such code executes for a significant amount of time. This is
>>>> also a reason why there is no need to separately benchmark bootstrap
>>>> commands.
>>>>
>>>> To summarize the above, try the following commands on LARGE datasets
>>>> (occupy e.g. half of your memory with data):
>>>> mlogit - you should see a performance increase of about 3 times on a
>>>> 12-CPU MP vs. a 1-CPU IC.
>>>> summarize - you should see about an 11-fold performance increase on a
>>>> 12-CPU MP vs. a 1-CPU IC.
>>>>
>>>> Run tests on a local machine. Perhaps it's Amazon that is to blame
>>>> (I don't mean it). Some hosting providers limit your TOTAL computing
>>>> power, so you can get 128 cores with the same total performance as
>>>> 1 core. Then you are better off with a single-CPU license of course :)
>>>>
>>>> Hope this helps.
>>>> Best, Sergiy Radyakin
>>>>
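Editorial sketch: the overhead argument in Sergiy's message can be illustrated with a simple Amdahl-style model. This is not Stata's published method or a measurement of Stata. The function name `speedup`, its parameters, and the 95% figure for the large job are hypothetical choices made for illustration; the tiny-job fraction reuses the numbers quoted above (0.00001 second of regression inside roughly 2 seconds of total run time).

```python
def speedup(parallel_fraction: float, cores: int, overhead_fraction: float = 0.0) -> float:
    """Amdahl-style speedup relative to a single core.

    parallel_fraction: share of single-core run time that can be parallelized.
    overhead_fraction: extra coordination cost, as a share of single-core run time.
    Both are illustrative assumptions, not values reported by Stata.
    """
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores + overhead_fraction)


# Tiny job: 0.00001 s of parallelizable regression inside a 2 s run.
tiny = speedup(parallel_fraction=0.00001 / 2.0, cores=12)

# Hypothetical large job in which 95% of the run time is parallelizable.
large = speedup(parallel_fraction=0.95, cores=12)

# Tiny parallel share plus a 10% coordination overhead: slower than one core.
swamped = speedup(parallel_fraction=0.05, cores=12, overhead_fraction=0.10)

print(f"tiny job:    {tiny:.6f}x")
print(f"large job:   {large:.2f}x")
print(f"with overhead: {swamped:.2f}x")
```

Under these assumptions the tiny job gains essentially nothing from 12 cores, and any fixed coordination cost can push the result below 1x, which is consistent with the advice to test on large datasets and to suppress screen output with -quietly-.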
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>>> *   http://www.ats.ucla.edu/stat/stata/
