Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Re: insheet multi threading

From	Argyn Kuketayev <[email protected]>
To	[email protected]
Subject	Re: st: Re: insheet multi threading
Date	Mon, 2 May 2011 14:18:58 -0400

Nick

On Mon, May 2, 2011 at 12:46 PM, Nick Cox <[email protected]> wrote:
>
> On the main issue, my short answer is that I don't know.

Like Mark said, I'm not aware of a Stata API that would allow me to
write concurrently executed ado-files. I don't think it can be done.

>
> A longer answer is that -insheet- depends on parts of a datafile
> having the same structure as the whole. So, very likely -- if this is
> parallelisable -- much of the code would require lots of compatibility
> checks to ensure consistency of input.
>
> My guess is that -insheet- peeks at the top of the data file, makes a
> guess at its structure, and then keeps on going unless and until it
> finds a problem.
>
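Nick's guess about -insheet- (peek at the top of the file, guess the structure, keep going until a problem appears) can be illustrated with a short Python sketch. This is a hypothetical illustration, not Stata's actual implementation: the helper names (`infer_type`, `widen`, `read_with_peek`), the int/float/str type ladder and the `peek_rows` default are all assumptions. It also shows the compatibility problem Nick raises: a reader that only sees part of the file can guess the wrong type and must widen it later.

```python
import csv
import io

# Hypothetical type ladder, ordered from narrowest to widest.
_RANK = {"int": 0, "float": 1, "str": 2}

def infer_type(value):
    """Classify a single field as 'int', 'float', or 'str'."""
    for name, convert in (("int", int), ("float", float)):
        try:
            convert(value)
            return name
        except ValueError:
            pass
    return "str"

def widen(current, observed):
    """Keep the wider of two types, so a column's type never narrows."""
    return current if _RANK[current] >= _RANK[observed] else observed

def read_with_peek(text, peek_rows=2):
    """Guess column types from the first `peek_rows` data rows, keep reading,
    and widen a column only when a later value contradicts the guess."""
    rows = list(csv.reader(io.StringIO(text)))
    header, data = rows[0], rows[1:]
    types = ["int"] * len(header)
    guessed = list(types)
    for i, row in enumerate(data):
        if len(row) != len(header):
            # Structural mismatch: widening a type cannot repair this.
            raise ValueError(f"line {i + 2}: expected {len(header)} fields, got {len(row)}")
        types = [widen(t, infer_type(v)) for t, v in zip(types, row)]
        if i < peek_rows:
            guessed = list(types)  # snapshot of the initial guess
    return header, guessed, types
```

For example, `read_with_peek("a,b\n1,2\n3,4\n5,x\n")` guesses `["int", "int"]` from the first two rows but ends with `["int", "str"]` once it reaches `x`. A parallel reader handed only the first chunk would make the same wrong guess, so chunk results have to be reconciled when they are merged.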

I've written parsing utilities a few times, and they can be
parallelized; that's why I'm so confident in my disappointment with
Stata. The standard way of handling this task is to write a sequential
reader that simply reads from the disk and dumps lines into a queue.
Concurrent parsers then pick up batches of lines, parse them, and dump
the parsed observations into another queue, from which an aggregator
assembles the observations into a data set.
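The reader/parser/aggregator design is a standard producer-consumer pipeline. Below is a minimal Python sketch of it, assuming comma-separated input without quoted fields; `parse_line`, `parallel_parse`, `chunk_size` and `n_parsers` are hypothetical names, not anything Stata provides. One caveat: in CPython the global interpreter lock keeps pure-Python parsing threads from running on several cores at once, so this shows the structure rather than the speedup. A real implementation would use processes (for example `multiprocessing`) or a language whose threads run in parallel.

```python
import queue
import threading

_DONE = object()  # sentinel: no more work is coming from this producer

def parse_line(line):
    """Hypothetical parser: split one comma-separated line into fields."""
    return line.rstrip("\n").split(",")

def reader(lines, chunk_size, line_q, n_parsers):
    """Sequential reader: batch lines into numbered chunks and enqueue them."""
    chunk, idx = [], 0
    for line in lines:
        chunk.append(line)
        if len(chunk) == chunk_size:
            line_q.put((idx, chunk))
            chunk, idx = [], idx + 1
    if chunk:
        line_q.put((idx, chunk))
    for _ in range(n_parsers):  # one sentinel per parser
        line_q.put(_DONE)

def parser(line_q, obs_q):
    """Concurrent parser: take a chunk, parse each line, pass the result on."""
    while True:
        item = line_q.get()
        if item is _DONE:
            obs_q.put(_DONE)
            return
        idx, chunk = item
        obs_q.put((idx, [parse_line(ln) for ln in chunk]))

def parallel_parse(lines, chunk_size=1000, n_parsers=4):
    # Bounded queue: the reader cannot race far ahead of slow parsers.
    line_q = queue.Queue(maxsize=2 * n_parsers)
    obs_q = queue.Queue()
    threads = [threading.Thread(target=reader,
                                args=(lines, chunk_size, line_q, n_parsers))]
    threads += [threading.Thread(target=parser, args=(line_q, obs_q))
                for _ in range(n_parsers)]
    for t in threads:
        t.start()
    # Aggregator: collect parsed chunks until every parser has finished.
    parsed, finished = {}, 0
    while finished < n_parsers:
        item = obs_q.get()
        if item is _DONE:
            finished += 1
        else:
            idx, rows = item
            parsed[idx] = rows
    for t in threads:
        t.join()
    # Chunks may finish out of order; reassemble them in file order.
    return [row for idx in sorted(parsed) for row in parsed[idx]]
```

Numbering each chunk lets the aggregator restore file order even though parsers finish in any order. The bounded first queue matters in exactly the situation Argyn reports, where parsing rather than reading is the slow stage.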

If disk reading were the bottleneck, I wouldn't see 100% CPU load on
one core, and other symptoms would point to it. At the moment it looks
to me as if reading and parsing are sequential and parsing is the
bottleneck, which wastes the other CPUs. I have 8 cores and want them
all to be used. A reasonable request, one would think.

cheers
-- 
Argyn Kuketayev
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

References:
- st: insheet multi threading
  - From: Argyn Kuketayev <[email protected]>
- st: Re: insheet multi threading
  - From: "Joseph Coveney" <[email protected]>
- Re: st: Re: insheet multi threading
  - From: Argyn Kuketayev <[email protected]>
- Re: st: Re: insheet multi threading
  - From: Maarten buis <[email protected]>
- RE: st: Re: insheet multi threading
  - From: "Schaffer, Mark E" <[email protected]>
- Re: st: Re: insheet multi threading
  - From: Nick Cox <[email protected]>
