
Re: st: .dta storage, why is too big?

From   Daniel Feenberg <>
Subject   Re: st: .dta storage, why is too big?
Date   Tue, 7 Jun 2011 14:08:45 -0400 (EDT)

On Tue, 7 Jun 2011, Daniel Marcelino wrote:

Hello to all,

Today I compared the sizes of some old and new files of mine, stored in R and Stata respectively, and it got me wondering why Stata's compression is so inefficient compared to R's. Even though I use variable attributes like labels, R's compression is incredible. For example, a 530 MB Stata file turns into a 9 MB R file, and into about 330 MB as a txt file. So, my question is: do you know of any trick to compress Stata files in addition to the -compress- command?

The Stata -compress- command does not do any sort of Shannonesque compression. Rather, it converts each variable to the smallest storage type that will hold it without conversion error. So a float that held only small positive integers would be converted to a byte, but a double-precision variable with few distinct values (e.g. the CPI in a short panel) would stay a double and continue to take up 8 bytes per observation, no matter how many observations there are. Similarly, a dummy variable that was zero in all observations but one would still take up a byte per observation.
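
A minimal sketch of this in Stata (the variable names are made up for illustration):

    clear
    set obs 100000
    generate float smallint = mod(_n, 10)   // small positive integers held in a float
    generate double cpi = 218.056           // one repeated value, not exactly representable as float
    generate float dummy = 0                // zero everywhere...
    replace dummy = 1 in 1                  // ...except one observation
    compress
    describe                                // smallint and dummy are now byte; cpi is still double

Here -compress- demotes smallint and dummy to byte, one byte per observation, but leaves cpi as a double, since storing 218.056 as a float would introduce conversion error even though the variable takes on only one value.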

If you are running under Unix, you might run one of the Unix compression commands (such as gzip) on the .dta file, and use the method described here:

to read or write such files, achieving much better compression and possibly higher speed.
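
One way to do this is with a named pipe: the shell decompresses into the pipe while Stata reads from it, so the uncompressed .dta file never touches the disk. A sketch, assuming gzip is installed (the file and pipe names are hypothetical):

    * create a named pipe; the .dta extension matters, since -use- adds it to bare names
    ! mkfifo /tmp/pipe.dta
    * decompress into the pipe in the background
    ! gunzip -c mydata.dta.gz > /tmp/pipe.dta &
    * Stata reads the decompressed stream as an ordinary dataset
    use /tmp/pipe.dta, clear
    ! rm /tmp/pipe.dta

Writing works the same way in reverse, with gzip reading from the pipe in the background while -save- writes into it.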

Daniel Feenberg
