Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Daniel Feenberg <feenberg@nber.org> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: compressed data and named pipes on linux |
Date | Sat, 8 Sep 2012 09:42:29 -0400 (EDT) |
On Fri, 7 Sep 2012, James Sams wrote:
> On Sunday 19, August 2012 17:24:42 you wrote:
>
>> If you do find out what is causing the segmentation fault, I hope you
>> will post the information or a workaround here.
>
> After some extended back and forth with Stata's tech support, it turns out, according to a senior programmer, that `use' has changed since prior documents referencing the ability to use `use' with a pipe. It now does random seeks through the file, which as Daniel mentioned, is not possible on a pipe. This is of course highly disappointing. I tried with pretty simple files lacking value labels and such hoping that maybe that was what the seeks were being used for. However, this did not work.
>
> Apparently they have officially written off the ability to do this. This is of course highly disappointing given how quickly datasets grow now, but there it is.
I would be extremely disappointed if the Stata -use- statement could no longer read from a pipe, as is reported above. However, this simple test program does work in Stata-SE 12.1 running under Scientific Linux version 6, so I wonder if the report is entirely accurate:

    * remove leftovers from any earlier run
    ! /bin/rm mypipe.dta pipe.dta pipe.dta.gz
    * build and save a test dataset of 10 million observations
    set obs 10000000
    gen a=_n
    save pipe
    * compress it, then create a named pipe (FIFO)
    ! gzip pipe.dta
    ! mknod mypipe.dta p
    * decompress into the pipe in the background; -use- reads the other end
    ! zcat pipe.dta.gz >> mypipe.dta &
    use mypipe
    summarize

I have been planning for some time to move 20 TB of Medicare data from SAS to .dta.gz files, and it would be a disappointment to learn that users could not read from the compressed files. While reading from a compressed file is slower than from the equivalent uncompressed file, it is much faster than decompressing the file and reading the result, even if the disk space for the decompressed file is available.

Daniel Feenberg
NBER

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
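
For readers who want to see the pipe behaviour without Stata, here is a minimal sketch in Python (not Stata) of the same pattern as the test above: a gzip file is decompressed into a named pipe by a background writer while a reader consumes the other end. It says nothing about what -use- does internally. It only illustrates the two properties under discussion: a FIFO can be read sequentially to end of file, and it cannot be seeked. The function name `read_gzip_through_fifo` and the payload are hypothetical; the file names just echo the ones in the post.

```python
import gzip
import os
import shutil
import tempfile
import threading


def read_gzip_through_fifo(payload: bytes):
    """Compress payload, stream it back through a FIFO, report what the reader saw.

    Returns (seekable, data): whether the reading end supports seeking, and
    the bytes obtained by reading the FIFO sequentially to end of file.
    """
    with tempfile.TemporaryDirectory() as workdir:
        # Hypothetical file names, echoing the ones used in the post.
        gz_path = os.path.join(workdir, "pipe.dta.gz")
        fifo_path = os.path.join(workdir, "mypipe.dta")

        with gzip.open(gz_path, "wb") as gz:
            gz.write(payload)

        # Creates the same kind of object as `mknod mypipe.dta p`.
        os.mkfifo(fifo_path)

        def writer():
            # Plays the role of `zcat pipe.dta.gz >> mypipe.dta &`.
            with gzip.open(gz_path, "rb") as src, open(fifo_path, "wb") as dst:
                shutil.copyfileobj(src, dst)

        thread = threading.Thread(target=writer)
        thread.start()

        with open(fifo_path, "rb") as reader:
            seekable = reader.seekable()  # a FIFO does not support seeking
            data = reader.read()          # a plain sequential read still works

        thread.join()
        return seekable, data


if __name__ == "__main__":
    seekable, data = read_gzip_through_fifo(b"0123456789" * 20000)
    print("seekable:", seekable)
    print("bytes read:", len(data))
```

A reader that only moves forward through the stream gets the full decompressed contents, while any reader that needs to jump backward or forward by offset would fail on the same pipe. This sketch assumes a POSIX system, since `os.mkfifo` is not available on Windows.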