Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: RE: Efficient parallel computing in Stata/MP

From	Demian Panigo <[email protected]>
To	[email protected]
Subject	Re: st: RE: Efficient parallel computing in Stata/MP
Date	Fri, 27 Sep 2013 09:12:40 -0300

Excuse me, one more point.
When I say "many regressions", I mean about one hundred million.
Thanks again
Demian
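
A back-of-the-envelope way to frame the comparison asked about below (one Stata/MP8 instance, two Stata/MP4 instances, or eight Stata/SE instances on an 8-core machine) is an Amdahl's-law estimate. The Python sketch below is only a toy model, not a measurement of Stata: the function names, the 1 ms single-core time per regression, and the 0.95 parallel fraction are all hypothetical assumptions, and it ignores start-up cost, memory and I/O contention, and hyper-threading effects.

```python
import math


def amdahl_speedup(cores: int, parallel_fraction: float) -> float:
    """Amdahl's-law speedup of ONE task run on `cores` cores.

    `parallel_fraction` is the share of the task's work that parallelizes
    perfectly (an assumed number, not a figure taken from any Stata report).
    """
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def wall_time(n_tasks: int, t_single: float, instances: int,
              cores_per_instance: int, parallel_fraction: float) -> float:
    """Estimated wall-clock seconds to finish `n_tasks` independent tasks.

    The tasks are split evenly across `instances` processes; each process
    runs its share one task at a time, and each task is sped up according
    to Amdahl's law on `cores_per_instance` cores.
    """
    tasks_per_instance = math.ceil(n_tasks / instances)
    t_task = t_single / amdahl_speedup(cores_per_instance, parallel_fraction)
    return tasks_per_instance * t_task


# Hypothetical inputs: 100 million regressions, 1 ms each on one core,
# 95% of each regression assumed to parallelize.
N_TASKS = 100_000_000
T_SINGLE = 0.001
P = 0.95

# (instances, cores per instance) for the three layouts on an 8-core CPU.
for instances, cores in [(1, 8), (2, 4), (8, 1)]:
    seconds = wall_time(N_TASKS, T_SINGLE, instances, cores, P)
    print(f"{instances} instance(s) x {cores} core(s): {seconds:,.0f} s")
```

With these made-up inputs the model gives about 16,875 s, 14,375 s and 12,500 s respectively: whenever the parallel fraction is below 1, splitting independent tasks across more single-core instances wins on paper. Whether that holds on real hardware depends on everything the model leaves out, so timing a small sample of the actual regressions in each layout remains the real test.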

2013/9/27 Demian Panigo <[email protected]>:
> Thank you very much Daniel:
> Just one more question.
> You finally used 24 cores (cores and hypercores) to run 3 parallel
> Stata/MP8 jobs with interesting time-saving results.
> But did you compare these results with those obtained by just running 24
> parallel Stata/SE jobs in, for example, batch mode?
> In other words, if my problem has a lot of parallelizable tasks (e.g.,
> many independent linear regressions) and they must be performed on a
> small dataset (e.g., 50 variables with 1000 observations each) using
> an 8-core CPU (in my University there are more powerful servers, but
> they are not always available), should I rely on a single Stata/MP8
> instance, 2 Stata/MP4 parallel instances (with properly rewritten code)
> or 8 Stata/SE instances?
> Which is better?
> Thanks in advance
> Demian
>
>
>
> 2013/9/27 Daniel Feenberg <[email protected]>:
>>
>>> -----Original Message-----
>>> From: [email protected]
>>> [mailto:[email protected]] On Behalf Of Demian Panigo
>>> Sent: 27 September 2013 01:02
>>> To: [email protected]
>>> Subject: st: Efficient parallel computing in Stata/MP
>>>
>>> Dear Statalist members: I need some help, because I'm not sure about
>>> some Stata/MP properties for parallel computing.
>>> We know from http://www.stata.com/statamp/statamp.pdf that many
>>> estimation commands (e.g., regress) are almost fully parallelizable and
>>> that average efficiency for all commands is around 72%. So, in
>>> standard linear regression problems (e.g., running one million equations
>>> for parameter stability analysis), using Stata/MP on a multi-core
>>> CPU would be an optimal time-saving strategy.
>>> However, it is also possible to exploit the multi-core CPU environment
>>> by working with multiple parallel Stata/MP instances (e.g., using 4
>>> parallel Stata/MP instances to run 250,000 linear regressions on
>>> each core).
>>> My question is simple: can I save some time by using this "dual
>>> parallelism" methodology? (Parallel computing is
>>> automatically used by Stata/MP to parallelize internal tasks of, for
>>> example, regress, and I also parallelize the whole set of
>>> regressions across 4 cores by means of multiple Stata/MP instances.)
>>> Thanks in advance
>>> *
>>
>>
>> In my experience, Stata/MP fully exploits as many real cores as are
>> available, very efficiently for regression commands. If you have hypercores,
>> running multiple Stata jobs will exploit those efficiently also. I posted
>> the results of a simple experiment at:
>>
>>   http://www.nber.org/stata/efficient
>>
>> under the heading "Stata/MP".
>>
>> -parallel.ado- is a very interesting routine. It will start up multiple
>> Stata processes and let each one read a part of the dataset, then combine
>> the results into a single dataset. For processes that are single-threaded
>> for no good reason, or if you don't have Stata/MP, it seems like a great
>> idea. I believe it will also work well with hyper-cores, but I have no
>> experience with it. But for I/O it would just make things worse, since each
>> thread has to read the entire dataset.
>>
>> See
>>
>>   http://www.stata.com/statamp/report.pdf
>>
>> for a more discouraging report on hyper-cores. I don't have an explanation
>> for the difference in experiences. There is no substitute for
>> experimentation on your actual hardware, and there would be interest on this
>> list in your experience.
>>
>> Daniel Feenberg
>> NBER
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
>
>
>
> --
> Demian T. Panigo
> Lic. en Economía, UNLP,
> Master en Cs Sociales, UBA,
> Doctor en Economía, EHESS-ENS (Paris)
> Investigador Adjunto del CEIL-PIETTE del CONICET
> Docente investigador de la UNM, de la UNLP, de la UBA, y de
> Paris-Jourdan Sciences Economiques-ENS.
> Miembro del Programa de Formación Popular en Economía (PROFOPE)





