



st: -use- from a compressed file


From   Daniel Feenberg <feenberg@nber.org>
To   statalist@hsphsun2.harvard.edu
Subject   st: -use- from a compressed file
Date   Thu, 18 Aug 2011 16:06:28 -0400 (EDT)

The Stata knowledge base includes a note on reading ASCII data from a pipe, which would allow one to read a file without storing the decompressed version on disk. We have never had success with the method shown there: I always get the error message "mypipe.pip: not found". We have terabytes of data that compress very well, so this was always a disappointment. We'd be interested in hearing if it works for anyone else.

While investigating this we found a work-around that seems much better. Unlike the knowledge base suggestion, it works with .dta files in addition to ASCII files, which matters much more to us. It relies on the ability of the -use- command to read a file from an http URL.

Our first try was to add the file test.cgi to our web server's cgi-bin directory:

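   #!/bin/sh
   # Emit the CGI header, then the blank line that ends the header block.
   echo Content-type: application/x-stata
   echo
   # Stream the decompressed dataset as the response body.
   /usr/bin/zcat /data/sample.dta.gz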

and we find that

  use http://www.nber.org/test

works from Stata. However, it involves considerable overhead because the file crosses the LAN several times, so we have not pursued taking the file name from the URL or otherwise making this approach practical.

We are developing an alternative that doesn't require an actual web server, or even root permissions. It uses the nc command, which ships with most Linux distributions and is also available for Windows. At the Stata prompt, run the compound command:

.! (echo -ne "HTTP/1.0 200 OK\r\n\r\n"; zcat /data/sample.dta.gz;) | nc -l 8080 &

This command sets up the computer to transmit a header and the decompressed file to the first process that connects to port 8080. Since 8080 is a high port (above 1023), no special permission is required to use it. The background job does not finish until the file has been read from that port, at which point nc shows the exact request Stata sent. Because of the trailing &, Stata continues while nc waits. Then

. use http://127.0.0.1:8080

Note that you can't use "localhost" instead of 127.0.0.1 because the -use- command won't accept one-part host names.
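
For anyone without a suitable nc, the one-shot server described above can be sketched in Python. This is only an illustration under stated assumptions, not part of our setup: the function names open_listener and serve_gzip_once are hypothetical, and the listener binds to 127.0.0.1 only, rather than to all interfaces.

```python
import gzip
import shutil
import socket


def open_listener(port, host="127.0.0.1"):
    """Open a TCP listener on host:port (hypothetical helper)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind((host, port))
    listener.listen(1)
    return listener


def serve_gzip_once(listener, gz_path):
    """Answer one connection with a bare HTTP header and the decompressed file.

    Returns the raw request bytes so the caller can inspect what was asked.
    """
    conn, _addr = listener.accept()
    with conn:
        # Read the client's request up to the blank line that ends its headers.
        request = b""
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:
                break
            request += chunk
        # Same minimal header as the echo in the nc pipeline.
        conn.sendall(b"HTTP/1.0 200 OK\r\n\r\n")
        # Stream the decompressed bytes; nothing is written to disk.
        with gzip.open(gz_path, "rb") as src, conn.makefile("wb") as dst:
            shutil.copyfileobj(src, dst)
    return request
```

A script that calls open_listener(8080) and then serve_gzip_once(listener, "/data/sample.dta.gz") could be started in the background with ! in place of the nc pipeline, followed by the same -use- command. Binding to the loopback address means only processes on the same machine can connect.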

If there is no nc on your machine, look for ncat, netcat or socat. Some versions will require a '-p' before the port number. You can install nc on a Windows machine and should be able to do the same thing, but we haven't tried it.

This could also be used for ASCII files, encrypted files, split files, and perhaps other types. If only Stat/Transfer would write to the standard output!

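There is a security issue: you give up the read restrictions in the Unix permission bits, because while nc is listening, whichever process connects to the port first receives the data, whatever the file's permissions. It is also slower than reading the uncompressed file from disk, but still fast enough for us.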

We have been trying to package this into an ado file, but without much success. A user-friendly ado program would need to find an available port by itself, which we haven't seen a good way to do yet, and to communicate that port back to the -use- command, for which we are also at a loss. I was hoping someone on the list might be inspired to suggest a method, or that StataCorp might just incorporate decompression into the -use- command.
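
On the question of finding an available port, one common approach is to bind to port 0 and let the operating system choose, then ask the socket which port it was given. The Python sketch below does that and writes the number to a small text file, which an ado program could presumably read back (for example with -file read-) to build the URL. The names listen_on_free_port and write_port_file, and the port-file idea itself, are assumptions for illustration; none of this has been tried from Stata.

```python
import socket


def listen_on_free_port(host="127.0.0.1"):
    """Bind to port 0 so the OS picks an unused port; return (listener, port)."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind((host, 0))
    listener.listen(1)
    # getsockname() reports the port the OS actually assigned.
    port = listener.getsockname()[1]
    return listener, port


def write_port_file(port, path):
    """Record the chosen port in a text file (hypothetical hand-off to Stata)."""
    with open(path, "w") as f:
        f.write(f"{port}\n")
```

Because the listener is already bound when the port number is written out, no other process can take that port between choosing it and serving the file.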

Daniel Feenberg
feenberg@nber.org

