Re: st: Re: insheet multi threading

From   Argyn Kuketayev <>
Subject   Re: st: Re: insheet multi threading
Date   Mon, 2 May 2011 14:18:58 -0400


On Mon, May 2, 2011 at 12:46 PM, Nick Cox <> wrote:
> On the main issue, my short answer is that I don't know.

Like Mark said, I'm not aware of a Stata API that would allow me to
write concurrently executed ado-files. I don't think it can be done.

> A longer answer is that -insheet- depends on parts of a datafile
> having the same structure as the whole. So, very likely -- if this is
> parallelisable -- much of the code would require lots of compatibility
> checks to ensure consistency of input.
> My guess is that -insheet- peeks at the top of the data file, makes a
> guess at its structure, and then keeps on going unless and until it
> finds a problem.

I've written parsing utilities a few times, and they can be
parallelized; that's why I'm so confident in my disappointment with
Stata. The standard way of handling this task is to write a sequential
reader, which simply reads from the disk and dumps lines into a
queue. Then concurrent parsers pick up batches of lines, parse them,
and dump the parsed observations into another queue, where an
aggregator assembles the observations into a data set.
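
To make the design concrete, here is a rough sketch in Python. It is
not Stata code and says nothing about how -insheet- is implemented.
All names (reader, parser, aggregate, parse_concurrently) are made up
for illustration. The parsing is a naive split on commas with no
handling of quoted fields, and the batch numbering used to restore the
original row order is my own addition. It uses threads to keep the
example short; in CPython, CPU-bound parsing would need processes
instead to actually occupy several cores.

```python
import io
import queue
import threading

SENTINEL = None  # placed on a queue to signal "no more work"


def reader(source, line_q, n_parsers, batch_size=2):
    """Read lines sequentially and enqueue them in numbered batches."""
    batch, batch_no = [], 0
    for line in source:
        batch.append(line.rstrip("\n"))
        if len(batch) == batch_size:
            line_q.put((batch_no, batch))
            batch, batch_no = [], batch_no + 1
    if batch:
        line_q.put((batch_no, batch))
    for _ in range(n_parsers):  # one stop signal per parser
        line_q.put(SENTINEL)


def parser(line_q, obs_q, delimiter=","):
    """Take batches of raw lines, parse them, and enqueue the result."""
    while True:
        item = line_q.get()
        if item is SENTINEL:
            obs_q.put(SENTINEL)  # tell the aggregator this parser is done
            return
        batch_no, lines = item
        # Naive parse (assumption): plain split, no quoting rules.
        obs_q.put((batch_no, [line.split(delimiter) for line in lines]))


def aggregate(obs_q, n_parsers):
    """Collect parsed batches and rebuild the rows in file order."""
    finished, parsed = 0, {}
    while finished < n_parsers:
        item = obs_q.get()
        if item is SENTINEL:
            finished += 1
        else:
            batch_no, rows = item
            parsed[batch_no] = rows
    return [row for batch_no in sorted(parsed) for row in parsed[batch_no]]


def parse_concurrently(source, n_parsers=4):
    """Wire up reader -> line queue -> parsers -> observation queue."""
    line_q = queue.Queue(maxsize=64)  # bounded, so the reader cannot run far ahead
    obs_q = queue.Queue()
    threads = [threading.Thread(target=reader, args=(source, line_q, n_parsers))]
    threads += [
        threading.Thread(target=parser, args=(line_q, obs_q))
        for _ in range(n_parsers)
    ]
    for t in threads:
        t.start()
    rows = aggregate(obs_q, n_parsers)
    for t in threads:
        t.join()
    return rows


# Usage: any iterable of lines works, e.g. an open file.
data = parse_concurrently(io.StringIO("id,x\n1,2.5\n2,3.5\n"), n_parsers=2)
```

The only sequential parts are the disk read and the final assembly;
everything in between scales with the number of parsers.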

If the disk reading part were the bottleneck, I wouldn't see 100%
CPU load on one core, and there would be other symptoms pointing to
that situation. At the moment it looks to me as though reading and
parsing are sequential and parsing is the bottleneck, which is a waste
of CPUs. I have 8 cores and want them all to be used. A reasonable
request, one would think.

Argyn Kuketayev