From: James Sams <sams.james@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: st: compressed data and named pipes on linux
Date: Sat, 18 Aug 2012 18:46:20 -0500

I wish to compress my stata datasets and use them via a named pipe. I was hoping that the gz* tools would do this, but it appears that they make use of a temporary file and write the entire compressed file out then read it in, slowing things down pretty significantly with the extra hard drive read/write cycle.

My first instinct was to use a named pipe, and it seems like this should work according to past messages to the stata list and a stata FAQ. However, with stata 12 mp, I am getting a core dump. Can someone tell me where I am going wrong?

unzip.sh:

    #!/bin/bash
    gzip_file="$1"
    pipe_name="$2"
    if [ -e "$pipe_name" ]; then
        rm "$pipe_name"
    fi
    mknod "$pipe_name" p
    zcat "$1" > "$pipe_name" &

usage in stata:

    local gzip_file 2667.dta.gz
    tempfile pipe_file
    shell ./unzip.sh `gzip_file' `pipe_file' >& /dev/null < /dev/null
    use `pipe_file'
    shell rm `pipe_file'

Doing this manually also results in a core dump:

    $ mkfifo tempfile
    $ zcat 2667.dta.gz > tempfile &
    $ stata -q use tempfile.
    Segmentation fault (core dumped)

FWIW, doing this works just fine:

    $ mkfifo tempfile
    $ zcat 2667.dta.gz > tempfile &
    $ cat tempfile > 2667.dta && stata -q use 2667.dta
    $ stata -q use 2667.dta

--
James Sams
sams.james@gmail.com
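
For comparison, here is a minimal sketch of one way to avoid the extra hard drive read/write cycle without a named pipe: decompress into a RAM-backed directory and point stata at the regular file that results. This is not a diagnosis of the core dump. The function name unzip_to_ram and the output name data.dta are hypothetical, and the sketch assumes /dev/shm is a tmpfs mount (common on Linux, but not guaranteed); if it is missing, the fallback directory may well be on disk.

```shell
#!/bin/bash
# Hypothetical helper, not part of the original post: decompress a .gz
# file into a RAM-backed directory and print the path of the result.
# Assumption: /dev/shm is a tmpfs mount. If it is not usable, fall back
# to $TMPDIR or /tmp, which may live on disk.
unzip_to_ram() {
    local gzip_file="$1"
    local ram_dir="${2:-/dev/shm}"
    if [ ! -d "$ram_dir" ] || [ ! -w "$ram_dir" ]; then
        ram_dir="${TMPDIR:-/tmp}"
    fi

    # A private directory lets the output keep a .dta extension.
    local tmp_dir
    tmp_dir="$(mktemp -d "$ram_dir/stata_XXXXXX")" || return 1
    local out_file="$tmp_dir/data.dta"

    # gzip -dc writes the decompressed stream to stdout, like zcat.
    if ! gzip -dc "$gzip_file" > "$out_file"; then
        rm -rf "$tmp_dir"
        return 1
    fi
    printf '%s\n' "$out_file"
}
```

With the function defined in the current shell, usage might look like this (the caller removes the directory afterwards):

    $ dta="$(unzip_to_ram 2667.dta.gz)"
    $ stata -q use "$dta"
    $ rm -r "$(dirname "$dta")"

The decompressed copy is an ordinary file, so this sidesteps the named pipe entirely, at the cost of holding the whole uncompressed dataset in memory-backed storage while it is in use.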