st: -use- from a compressed file
From: Daniel Feenberg <[email protected]>
To: [email protected]
Subject: st: -use- from a compressed file
Date: Thu, 18 Aug 2011 16:06:28 -0400 (EDT)
The Stata knowledge base includes a note on reading ASCII data from a 
pipe, which would allow one to read a file without storing the 
decompressed version on disk. We have never had success with the method 
shown there: I always get the error message "mypipe.pip: not found". We 
have terabytes of data that compress very well, so this was always a 
disappointment. We'd be interested in hearing whether it works for anyone else.
While investigating this we found a work-around that seems much better. 
Unlike the knowledge base suggestion, it works with .dta files as well as 
ASCII files, which is much more interesting to us. It relies on the 
ability of the -use- command to read from an http:// URL.
Our first try was to add the file test.cgi to our web server's cgi-bin 
directory:
   #!/bin/sh
   # A CGI response needs a blank line between the headers and the body.
   echo "Content-type: application/x-stata"
   echo
   /usr/bin/zcat /data/sample.dta.gz
and we find that
  use http://www.nber.org/test
works from Stata. However, this involved a lot of overhead, as the file 
whipped around the LAN several times, so we haven't pursued taking the 
file name from the URL or otherwise making this practical.
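For anyone who does want to pursue it, here is a minimal sketch of taking the file name from the URL. It assumes a hypothetical script dta.cgi that reads the standard CGI variable PATH_INFO, so that a request for .../cgi-bin/dta.cgi/sample.dta maps to /data/sample.dta.gz. The script name, the DATA_DIR variable and the URL layout are illustrative assumptions, and we have not tested whether -use- accepts a URL of this shape.

```shell
#!/bin/sh
# Hypothetical CGI sketch (not the script from the original setup).
# DATA_DIR and the PATH_INFO layout are assumptions.
DATA_DIR=${DATA_DIR:-/data}

# Map a PATH_INFO value such as "/sample.dta" to "$DATA_DIR/sample.dta.gz".
# Fails unless the name consists only of letters, digits, "_" and "-",
# which rejects "../" and other attempts to leave DATA_DIR.
dataset_path() {
    name=${1#/}          # drop the leading slash
    name=${name%.dta}    # drop a trailing .dta, if present
    case $name in
        ''|*[!A-Za-z0-9_-]*) return 1 ;;
    esac
    printf '%s/%s.dta.gz\n' "$DATA_DIR" "$name"
}

# Only answer a request when the web server has set GATEWAY_INTERFACE.
if [ -n "${GATEWAY_INTERFACE:-}" ]; then
    if file=$(dataset_path "${PATH_INFO:-}") && [ -r "$file" ]; then
        printf 'Content-type: application/x-stata\r\n\r\n'
        exec zcat "$file"
    else
        printf 'Status: 404 Not Found\r\nContent-type: text/plain\r\n\r\n'
        echo "no such dataset"
    fi
fi
```

Restricting the name to a small character set matters here, because the script would otherwise hand out any file the web server can read.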
We are developing an alternative that doesn't require an actual web server, 
or even root permissions. It uses the nc (netcat) command, which ships 
with most Linux distributions and is also available for Windows. At the 
Stata prompt run the compound command:
.! (echo -ne "HTTP/1.0 200 OK\r\n\r\n"; zcat /data/sample.dta.gz;) | nc -l 8080 &
This command sets up the computer to transmit a header and the 
decompressed file to the first process that connects to port 8080. Since 
8080 is above 1023, no root permission is required to listen on it. The 
nc process won't exit until the file has been read from that port, at 
which point it prints the exact request Stata sent. Because of the &, 
Stata continues while nc waits. Then
. use http://127.0.0.1:8080
Note that you can't use "localhost" instead of 127.0.0.1 because the -use- 
command won't accept one-part host names.
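The part of the command that builds the response can be pulled out into a small shell function, which makes it easier to see what is actually sent over the socket. This is a sketch only: the function name http_response is made up for illustration, printf stands in for "echo -ne" because the echo options behave differently across shells, and "gzip -dc" is used as the equivalent of zcat for .gz files.

```shell
#!/bin/sh
# Hypothetical helper: write a minimal HTTP/1.0 response to stdout.
# The status line is followed by an empty line (the end of the headers),
# and then the decompressed file as the body.
http_response() {
    printf 'HTTP/1.0 200 OK\r\n\r\n'
    gzip -dc "$1"
}

# Intended use, mirroring the command above (not run here):
#   http_response /data/sample.dta.gz | nc -l 8080 &
```

Nothing else is sent: there is no Content-Length header, so the reader sees the end of the file when nc closes the connection.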
If there is no nc on your machine, look for ncat, netcat or socat. Some 
versions will require a '-p' before the port number. You can install nc on 
a Windows machine and should be able to do the same thing, but we haven't 
tried it.
This could also be used for ASCII files, encrypted files, split files, and 
perhaps other types. If only Stat/Transfer would write to the standard 
output!
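For split files, for instance, only the command feeding the pipe changes. The sketch below assumes the pieces were made with split(1) and are named by appending a suffix to the original name (sample.dta.gz.aa, sample.dta.gz.ab, ...). The function name split_gz_body and that naming scheme are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical helper: reassemble the pieces of a split .gz file and
# decompress the result to stdout. "$1" is the name of the original file;
# the pieces are expected at "$1.aa", "$1.ab", and so on. The shell expands
# the glob in sorted order, which is the order split(1) wrote them in.
split_gz_body() {
    cat "$1".* | gzip -dc
}

# Intended use (not run here):
#   (printf 'HTTP/1.0 200 OK\r\n\r\n'; split_gz_body /data/sample.dta.gz) | nc -l 8080 &
```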
There is a security issue: while nc is waiting, any process that can 
connect to the port can read the data, so the read restrictions in the 
Unix permission bits no longer protect the file. It is also slower than 
reading the uncompressed file from disk, but still fast enough for us.
We have been trying to package this into an ado file, but without much 
success. A user-friendly ado program would need to find an available 
port by itself, which we haven't seen a good way to do yet, and to 
communicate it back to the -use- command, for which we are also at a 
loss. I was hoping someone on the list might be inspired to suggest a 
method, or that StataCorp might just incorporate decompression into the 
-use- command.
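One possible starting point for the port problem, offered as an untested sketch rather than a solution: probe a range of ports from the shell and print the first one that nothing is listening on. It requires bash, because /dev/tcp is a bash feature rather than a real device. The function names and the 8080-8180 range are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helpers; require bash for the /dev/tcp pseudo-device.

# Succeeds if something already accepts connections on 127.0.0.1:$1.
port_in_use() {
    (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Print the first port in [$1, $2] that nothing is listening on.
# Defaults to the range 8080-8180; fails if every port is taken.
find_free_port() {
    port=${1:-8080}
    last=${2:-8180}
    while [ "$port" -le "$last" ]; do
        if ! port_in_use "$port"; then
            echo "$port"
            return 0
        fi
        port=$((port + 1))
    done
    return 1
}
```

The check is inherently racy, since another process can take the port between the probe and the moment nc binds it, so a caller would still have to handle a failed bind. As for getting the number back to Stata, one option might be to have the shell command write it to a temporary file that the ado program then reads, for example with -file read-.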
Daniel Feenberg
[email protected]