The Stata listserver

st: New package for easy handling of compressed datasets

From   Henrik Stovring <[email protected]>
To   statalist <[email protected]>
Subject   st: New package for easy handling of compressed datasets
Date   Mon, 04 Oct 2004 10:57:18 +0200

Dear statalist,

Thanks to Kit Baum, a new package called 'gzsave' can now be downloaded from the SSC archives. To install it, just type:

ssc install gzsave

The package is aimed at Unix/Linux/MacOSX-users (Kit made me aware that gzip is installed by default on MacOSX, thanks!), and is described as follows in its abstract:

Use these routines to save and use datasets compressed by gzip on Unix/Linux/MacOSX. They share syntax with Stata's save and use commands.

And from the remarks in its helpfile:

These commands are useful for two purposes:

First, they reduce the disk space a dataset occupies, which may be important when storing very large datasets.

Second, they may help reduce network load when using a distributed disk system such as NFS. This is because only the compressed dataset is transferred over the network; the uncompressed dataset exists only as a temporary datafile, which typically resides on the local disk (where local is relative to the running instance of Stata).

The price paid for saving disk space (and network load) is the CPU time used by gzip. Please test for yourself whether compression is actually advantageous in your specific set-up.
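One rough way to run such a test outside Stata is sketched below in Python. This is not how gzsave is implemented; it only mimics the pattern described in the remarks (compress on save, decompress into a local temporary file on use) and reports file sizes and elapsed time. The function names gz_save, gz_use and compare are hypothetical, made up for this illustration.

```python
import gzip
import os
import shutil
import tempfile
import time


def gz_save(src_path, gz_path):
    """Compress src_path into gz_path; return elapsed wall-clock seconds."""
    start = time.perf_counter()
    with open(src_path, "rb") as src, gzip.open(gz_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return time.perf_counter() - start


def gz_use(gz_path, tmp_dir=None):
    """Decompress gz_path into a temporary file (in tmp_dir, or the system
    default temp directory); return (temporary path, elapsed seconds)."""
    start = time.perf_counter()
    fd, tmp_path = tempfile.mkstemp(suffix=".dta", dir=tmp_dir)
    with gzip.open(gz_path, "rb") as src, os.fdopen(fd, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return tmp_path, time.perf_counter() - start


def compare(src_path, gz_path):
    """Round-trip a file through gzip and summarise sizes and timings."""
    save_secs = gz_save(src_path, gz_path)
    tmp_path, use_secs = gz_use(gz_path)
    try:
        # Check that the decompressed copy matches the original byte for byte.
        with open(src_path, "rb") as a, open(tmp_path, "rb") as b:
            identical = a.read() == b.read()
        return {
            "raw_bytes": os.path.getsize(src_path),
            "gz_bytes": os.path.getsize(gz_path),
            "save_secs": save_secs,
            "use_secs": use_secs,
            "identical": identical,
        }
    finally:
        # The uncompressed copy is only temporary, so remove it afterwards.
        os.remove(tmp_path)
```

Calling compare("mydata.dta", "mydata.dta.gz"), where both file names are placeholders, returns the raw and compressed sizes next to the time spent compressing and decompressing. Wall-clock time is only a proxy for the CPU time gzip uses, and the result depends on the data and the disk, so it is worth repeating the run on a dataset of realistic size.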

Please let me know if you find bugs or have suggestions for improvements.



Henrik Støvring, PhD

Research Unit of General Practice
University of Southern Denmark
Winsløwparken 19, 3
DK-5000 Odense C
Phone: (+45) 6550 3692
Fax: (+45) 6591 8296
email: [email protected]