Stata 15 help for compress

[D] compress -- Compress data in memory


compress [varlist] [, nocoalesce]


Data > Data utilities > Optimize variable storage


compress attempts to reduce the amount of memory used by your data.


nocoalesce specifies that compress not try to find duplicate values within strL variables in an attempt to save memory. If nocoalesce is not specified, compress must sort the data by each strL variable, which can be time consuming in large datasets.


compress reduces the size of your dataset by considering two things. First, it considers demoting

    doubles to longs, ints, or bytes
    floats  to ints or bytes
    longs   to ints or bytes
    ints    to bytes
    str#s   to shorter str#s
    strLs   to str#s

See [D] data types for an explanation of these storage types.
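The demotions listed above can be illustrated with a small made-up dataset. This is a minimal sketch: the variable names x and s are hypothetical, and the storage types noted in the comments are what one would expect given that a byte holds integers from -127 to 100.

```stata
* Build a toy dataset whose variables use more storage than they need
clear
set obs 100
* x holds the whole numbers 1 through 100 but is stored as a double
generate double x = _n
* s holds a 3-character value but is stored as a str20
generate str20 s = "abc"
describe

* compress should be able to demote x to byte and s to str3
compress
describe
```

Comparing the two describe listings shows which storage types changed; compress also reports each change as it makes it.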

Second, it considers coalescing strLs within each strL variable. That is to say, if a strL variable takes on the same value in multiple observations, compress can link those values to a single memory location to save memory. To check for this, compress must sort the data on each strL variable. You can use the nocoalesce option to tell compress not to take the time to perform this check. If compress does check whether it can coalesce strL values, it will do whichever saves more memory -- coalescing strL values or demoting a strL to a str# -- or it will do nothing if it cannot save memory by changing a strL.
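As a rough sketch of the strL case, the commands below build a hypothetical strL variable named note that takes the same value in every observation, then run compress with the coalescing check turned off. The dataset and variable name are assumptions made for illustration only.

```stata
* A strL variable with the same value repeated in every observation
clear
set obs 1000
generate strL note = "repeated note text"

* Skip the sort needed to look for duplicate strL values
compress note, nocoalesce
```

Running compress note without the option lets compress sort on note and choose between coalescing and demotion, at the cost of the extra sorting time.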

compress leaves your data logically unchanged but (probably) appreciably smaller. compress never makes a mistake: it never causes a loss of precision and never truncates strings.
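One way to confirm for yourself that the values are unchanged is to save a copy before compressing and compare afterward with cf. This is only a sketch: it assumes the auto dataset shipped with Stata, and the temporary file name before is arbitrary.

```stata
* Keep an uncompressed copy of the data in a temporary file
sysuse auto, clear
tempfile before
save "`before'"

* Compress, then compare every variable against the saved copy
compress
cf _all using "`before'"
```

If any value differed, cf would stop with an error; finishing without one indicates that the compressed data match the saved copy.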


    . webuse compxmp2
    . compress

Video example

How to optimize the storage of variables

© Copyright 1996–2018 StataCorp LLC