



Re: st: .dta storage, why is too big?


From   Daniel Feenberg <feenberg@nber.org>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: .dta storage, why is too big?
Date   Tue, 7 Jun 2011 14:08:45 -0400 (EDT)


On Tue, 7 Jun 2011, Daniel Marcelino wrote:

Hello all,

Today I compared the sizes of my old and new files, stored in R and Stata
format respectively. This got me thinking about why Stata compression is so
inefficient compared to R's. Even though I use variable attributes like
labels, R's compression is incredible. For example, a 530 MB Stata file
turns into a 9 MB R file and about 330 MB as a txt file. So, my question is:
do you know any trick to compress Stata files in addition to the
-compress- command?

The Stata -compress- command does not do any sort of Shannonesque compression. Rather, it converts each variable to the smallest storage type that will hold it without conversion error. So a float that held only small positive integers would be converted to a byte, but a double-precision variable with few distinct values (e.g. the CPI in a short panel) would stay a double and continue to take up 8 bytes per observation, no matter how many observations there are. Similarly, a dummy variable that was zero in all but one observation would still take up a byte per observation.
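To make the distinction concrete, here is a minimal Python sketch of the kind of type-narrowing described above. It is not Stata's actual implementation: the function names are hypothetical, missing values are ignored, and the integer ranges are the ones documented for Stata's byte, int and long types.

```python
import struct

# (name, bytes per observation, min, max) for the integer storage types.
# Ranges follow Stata's documented limits; treat them as an assumption here.
INTEGER_TYPES = [
    ("byte", 1, -127, 100),
    ("int", 2, -32767, 32740),
    ("long", 4, -2147483647, 2147483620),
]

def fits_float(x):
    """True if x survives a round trip through 4-byte single precision."""
    try:
        return struct.unpack("f", struct.pack("f", x))[0] == x
    except OverflowError:
        return False

def smallest_type(values):
    """Return (type name, bytes per observation) for the narrowest lossless type."""
    if all(float(v).is_integer() for v in values):
        for name, size, lo, hi in INTEGER_TYPES:
            if all(lo <= v <= hi for v in values):
                return name, size
    if all(fits_float(v) for v in values):
        return "float", 4
    return "double", 8

print(smallest_type([1.0, 2.0, 3.0]))      # small integers held as floats
print(smallest_type([218.056, 224.939]))   # CPI-like values need double
print(smallest_type([0] * 999 + [1]))      # dummy: still one byte per row
```

Note that the choice depends only on the range and precision of the values, never on how often they repeat, which is why a nearly constant variable gains nothing from it.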

If you are running under Unix, you might use one of the Unix compress commands on the dta file, and use the method described here:

  http://www.stata.com/support/faqs/unix/pipe.html

to read or write such files, achieve much better compression, and possibly higher speed.
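As a rough illustration of why a general-purpose compressor helps where -compress- cannot, the following Python sketch gzips a synthetic column layout. The data are made up for the example (a one-byte dummy and an 8-byte double cycling through three CPI-like values); this is not the .dta file format, and real files will compress by different amounts.

```python
import gzip
import struct

n = 100_000

# Dummy stored as one byte per observation: zero everywhere except one row.
dummy = bytearray(n)
dummy[0] = 1

# A double with only three distinct values, 8 bytes per observation.
cpi_values = [218.056, 224.939, 229.594]
cpi = b"".join(struct.pack("<d", cpi_values[i % 3]) for i in range(n))

raw = bytes(dummy) + cpi
packed = gzip.compress(raw)

print("uncompressed bytes:", len(raw))
print("gzip bytes:        ", len(packed))
```

The repetition that type-narrowing ignores is exactly what gzip exploits, so the compressed size is a small fraction of the original for data like these.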

Daniel Feenberg


Best,
Daniel
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/

