Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Re: insheet multi threading
From: Argyn Kuketayev <[email protected]>
To: [email protected]
Subject: Re: st: Re: insheet multi threading
Date: Mon, 2 May 2011 14:18:58 -0400
Nick
On Mon, May 2, 2011 at 12:46 PM, Nick Cox <[email protected]> wrote:
>
> On the main issue, my short answer is that I don't know.
Like Mark said, I'm not aware of a Stata API that would allow me to
write concurrently executed ado-files. I don't think it can be done.
>
> A longer answer is that -insheet- depends on parts of a datafile
> having the same structure as the whole. So, very likely -- if this is
> parallelisable -- much of the code would require lots of compatibility
> checks to ensure consistency of input.
>
> My guess is that -insheet- peeks at the top of the data file, makes a
> guess at its structure, and then keeps on going unless and until it
> finds a problem.
>
I've written parsing utilities a few times, and they can be
parallelized; that's why I'm so confident in my disappointment with
Stata. The standard way of handling this task is to write a sequential
reader, which simply reads from the disk and dumps lines into a
queue. Concurrent parsers then pick up batches of lines, parse them,
and put the parsed observations into a second queue, where an
aggregator assembles the observations into a data set.
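
To make the reader/parsers/aggregator layout above concrete, here is a
minimal sketch in Python (not Stata, and not a claim about how -insheet-
works internally). All names (reader, parser, aggregate, parse_csv_lines,
BATCH_SIZE, N_PARSERS) are made up for illustration, and the "parsing" is
just a naive comma split. It uses threads to show the structure; in
CPython, threads would not actually spread CPU-bound parsing across
cores, so a real implementation would presumably use processes or a
language with native threads.

```python
import queue
import threading

BATCH_SIZE = 2   # lines per batch (hypothetical; real batches would be far larger)
N_PARSERS = 4    # number of concurrent parser workers (hypothetical)
STOP = None      # sentinel that marks the end of a queue


def reader(lines, raw_q, n_parsers):
    """Sequential stage: read lines in order and enqueue numbered batches."""
    batch, batch_no = [], 0
    for line in lines:
        batch.append(line)
        if len(batch) == BATCH_SIZE:
            raw_q.put((batch_no, batch))
            batch, batch_no = [], batch_no + 1
    if batch:                      # flush the final partial batch
        raw_q.put((batch_no, batch))
    for _ in range(n_parsers):     # one stop signal per parser
        raw_q.put(STOP)


def parser(raw_q, parsed_q):
    """Concurrent stage: take a batch of lines, parse it, pass it on."""
    while True:
        item = raw_q.get()
        if item is STOP:
            parsed_q.put(STOP)     # tell the aggregator this parser is done
            return
        batch_no, batch = item
        rows = [line.rstrip("\n").split(",") for line in batch]  # naive parse
        parsed_q.put((batch_no, rows))


def aggregate(parsed_q, n_parsers):
    """Final stage: collect parsed batches and restore the original order."""
    finished, batches = 0, {}
    while finished < n_parsers:
        item = parsed_q.get()
        if item is STOP:
            finished += 1
        else:
            batches[item[0]] = item[1]
    return [row for no in sorted(batches) for row in batches[no]]


def parse_csv_lines(lines, n_parsers=N_PARSERS):
    """Wire the three stages together with two queues."""
    raw_q = queue.Queue(maxsize=64)   # bounded, so the reader cannot run far ahead
    parsed_q = queue.Queue()
    threads = [threading.Thread(target=reader, args=(lines, raw_q, n_parsers))]
    threads += [threading.Thread(target=parser, args=(raw_q, parsed_q))
                for _ in range(n_parsers)]
    for t in threads:
        t.start()
    data = aggregate(parsed_q, n_parsers)
    for t in threads:
        t.join()
    return data
```

Calling parse_csv_lines(open("data.csv")) on a hypothetical file would
return a list of rows in file order. The batches carry a sequence number
because the parsers may finish out of order, so the aggregator has to
sort them back.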
If the disk reading were the bottleneck, I wouldn't see 100%
CPU load on one core, and there would be other symptoms pointing to
that. At the moment it looks to me as though reading and parsing
are sequential and parsing is the bottleneck, which is a waste
of CPUs. I have 8 cores and want them all to be used. A reasonable
request, one would think.
cheers
--
Argyn Kuketayev