


Re: st: compressed data and named pipes on linux

From: Daniel Feenberg <[email protected]>
To: [email protected]
Subject: Re: st: compressed data and named pipes on linux
Date: Sat, 8 Sep 2012 09:42:29 -0400 (EDT)

On Fri, 7 Sep 2012, James Sams wrote:

On Sunday, 19 August 2012 17:24:42 you wrote:

If you do find out what is causing the segmentation fault, I hope you will
post the information or a workaround here.


After some extended back and forth with Stata's tech support, it turns out,
according to a senior programmer, that `use' has changed since the earlier
documents describing its use with a pipe were written. It now performs random
seeks through the file, which, as Daniel mentioned, is not possible on a pipe.
I tried with very simple files lacking value labels and the like, hoping that
those might be what the seeks were used for, but this did not work either.
Apparently they have officially written off the ability to do this. This is of
course highly disappointing given how quickly datasets grow now, but there it
is.

--


I would be extremely disappointed if Stata's -use- command could no
longer read from a pipe, as reported above. However, this simple test
program does work in Stata/SE 12.1 running under Scientific Linux 6,
so I wonder whether the report is entirely accurate:

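  * Remove leftovers from any previous run (-f: no complaint if absent)
  ! /bin/rm -f mypipe.dta pipe.dta pipe.dta.gz
  * Build a 10-million-observation test dataset and compress it
  clear
  set obs 10000000
  gen a=_n
  save pipe
  ! gzip pipe.dta

  * Create a named pipe, decompress into it in the background, and read it
  ! mknod mypipe.dta p
  ! zcat pipe.dta.gz >> mypipe.dta &
  use mypipe
  summarize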

I have been planning for some time to move 20 TB of Medicare data from SAS
to .dta.gz files, and it would be a disappointment to learn that users could
not read from the compressed files. While reading from a compressed file is
slower than from the equivalent uncompressed file, it is much faster than
decompressing the file and reading the result, even if the disk space for
the decompressed file is available.
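
For anyone who wants to check the pipe mechanism itself, independent of Stata, the following is a minimal shell sketch of the same pattern. It assumes a Linux-like system with mktemp, seq, gzip and mkfifo available; the file names (sample.txt, sample.pipe) are made up for illustration, and wc stands in for any program that reads its input sequentially. The decompressed data passes through the pipe and is never written to disk.

```shell
# Work in a scratch directory so nothing else is touched.
workdir=$(mktemp -d)

# Build a small stand-in for a data file and record its size in bytes.
seq 1 100000 > "$workdir/sample.txt"
expected_bytes=$(wc -c < "$workdir/sample.txt" | tr -d ' ')

# Compress it; gzip replaces sample.txt with sample.txt.gz.
gzip "$workdir/sample.txt"

# Create the named pipe (mkfifo does the same job as "mknod NAME p").
mkfifo "$workdir/sample.pipe"

# Writer: decompress into the pipe in the background.
# It blocks until a reader opens the other end.
gzip -dc "$workdir/sample.txt.gz" > "$workdir/sample.pipe" &

# Reader: consume the pipe from start to finish, with no seeking.
streamed_bytes=$(wc -c < "$workdir/sample.pipe" | tr -d ' ')

# Reap the background writer before cleaning up.
wait

echo "expected=$expected_bytes streamed=$streamed_bytes"

# Remove the scratch directory, including the pipe.
rm -rf "$workdir"
```

If the two byte counts agree, the pipe delivered the full decompressed stream. A reader that tries to seek within its input, as `use' is reported above to do, would not be able to do so on such a pipe, because a pipe can only be read sequentially.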

Daniel Feenberg
NBER


