Re: st: compressed data and named pipes on linux

From   Daniel Feenberg <>
Subject   Re: st: compressed data and named pipes on linux
Date   Sat, 8 Sep 2012 09:42:29 -0400 (EDT)

On Fri, 7 Sep 2012, James Sams wrote:

On Sunday 19,  August  2012 17:24:42 you wrote:
If you do find out what is causing the segmentation fault, I hope you will
post the information or a workaround here.

After some extended back and forth with Stata's tech support, it turns out,
according to a senior programmer, that `use' has changed since prior documents
referencing the ability to use `use' with a pipe. It now does random seeks
through the file, which as Daniel mentioned, is not possible on a pipe. This is
of course highly disappointing. I tried with pretty simple files lacking value
labels and such hoping that maybe that was what the seeks were being used for.
However, this did not work. Apparently they have officially written off the
ability to do this. This is of course highly disappointing given how quickly
datasets grow now, but there it is.


I would be extremely disappointed if the Stata -use- statements could no
longer read from a pipe, as is reported above. However, this simple test
program does work in Stata-SE 12.1 running under Scientific Linux version 6,
so I wonder if the report is entirely accurate:

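  * remove leftovers from any earlier run (-f: no error if a file is absent)
  ! /bin/rm -f mypipe.dta pipe.dta pipe.dta.gz
  * build a 10-million-observation test dataset, save it, and compress it
  clear
  set obs 10000000
  gen a=_n
  save pipe
  ! gzip pipe.dta

  * create a named pipe and have zcat feed it in the background
  ! mknod mypipe.dta p
  ! zcat pipe.dta.gz >> mypipe.dta &
  * read the dataset through the pipe
  use mypipe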
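To check the pipe plumbing separately from Stata, here is a minimal shell sketch of the same mechanism. It assumes a Linux system with GNU coreutils and gzip. `fifo_roundtrip` is a hypothetical helper name. `cat` stands in for -use- as a purely sequential reader, `mkfifo` is the usual spelling of `mknod NAME p`, and `gzip -dc` does what `zcat` does on Linux. The helper prints `ok` when the bytes coming out of the pipe match the original file, so that if this passes while Stata fails, the plumbing itself can probably be ruled out.

```shell
#!/bin/sh
# Hypothetical helper: push a file through gzip and a named pipe (FIFO) and
# report whether the bytes read back match the original. cat plays the part
# of a strictly sequential reader such as Stata's -use- reading a pipe.
fifo_roundtrip() {
    src=$1
    dir=$(mktemp -d) || return 1

    gzip -c "$src" > "$dir/data.gz"
    mkfifo "$dir/mypipe"                  # equivalent to: mknod mypipe p

    # The writer blocks until a reader opens the FIFO, so run it in the
    # background, as the Stata test does with "zcat ... &".
    gzip -dc "$dir/data.gz" > "$dir/mypipe" &   # same as zcat on Linux
    writer=$!

    # A FIFO can only be read front to back, once; seeking is not possible.
    cat "$dir/mypipe" > "$dir/out"
    wait "$writer"

    if cmp -s "$src" "$dir/out"; then
        result=ok
    else
        result=mismatch
    fi
    rm -rf "$dir"
    echo "$result"
}

# Usage: build a sample file in a temporary location and test it.
sample=$(mktemp)
seq 1 200000 > "$sample"
fifo_roundtrip "$sample"      # prints: ok
rm -f "$sample"
```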

I have been planning for some time to move 20 TB of Medicare data from SAS
to .dta.gz files, and it would be a disappointment to learn that users could
not read from the compressed files. While reading from a compressed file is
slower than from the equivalent uncompressed file, it is much faster than
decompressing the file and reading the result, even if the disk space for
the decompressed file is available.
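The trade-off in the paragraph above can be sketched in shell as two ways of feeding the same compressed file to a reader. This assumes GNU gzip and coreutils. The function names are hypothetical, and `wc -c` is only a stand-in consumer. The streaming route never writes the uncompressed data to disk, but the reader gets a non-seekable stream. The temporary-file route pays for a full uncompressed write, which is the extra cost described above. In exchange it hands the reader an ordinary file it can seek in, and random seeks are what the report quoted earlier says -use- now performs.

```shell
#!/bin/sh
# Hypothetical sketch of the two ways to hand a .gz file to a reader.
# wc -c stands in for the real consumer; both functions print a byte count.

# Route 1: stream. Decompressed bytes go straight to the reader and nothing
# uncompressed touches the disk, but the reader sees a non-seekable pipe.
count_streamed() {
    gzip -dc "$1" | wc -c | tr -d ' '
}

# Route 2: decompress to a temporary regular file, then read that file.
# This costs a full uncompressed write plus the disk space for it, but the
# reader gets an ordinary file that it can seek in.
count_via_tempfile() {
    tmp=$(mktemp) || return 1
    gzip -dc "$1" > "$tmp"
    wc -c < "$tmp" | tr -d ' '
    rm -f "$tmp"
}

# Usage: both routes should agree on the uncompressed size.
gz=$(mktemp)
seq 1 100000 | gzip -c > "$gz"
count_streamed "$gz"          # prints: 588895
count_via_tempfile "$gz"      # prints: 588895
rm -f "$gz"
```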

Daniel Feenberg

© Copyright 1996–2016 StataCorp LP