David Kantor indirectly raised the question of how unused value labels
can be removed from a dataset to reduce its size. Here is a solution with
-labelsof- from SSC.

sysuse auto
encode make, gen(make2)
drop if _n>5
* collect every value defined in the value label make2
labelsof make2
local labels "`r(values)'"
foreach x of local labels {
    count if make2==`x'
    * an empty string with -modify- deletes the label of an unused value
    if r(N)==0 {
        lab def make2 `x' "", modify
    }
}
lab list make2

Friedrich

On Fri, May 2, 2008 at 10:46 AM, David Kantor <kantor.d@att.net> wrote:
> Hello all,
>
> I just want to add some observations about encoding.
>
> When you encode a string variable, the file contains a copy of every
> distinct value. Consequently, it usually provides a space advantage only if
> many of the values are repeated. If all or most observations are distinct,
> then encoding will not gain a space advantage. (But you may have other
> reasons for encoding.)
>
> But even when encoding is advantageous in terms of space, there is one
> situation when it can backfire; I had not thought of this until it happened
> to me. I had a large file with a string variable with many distinct values
> -- though many were often repeated. I encoded it, and gained significant
> space savings.
>
> Later, I created a multitude of smaller subsets of this file. Each one had
> far fewer distinct values of the encoded variable. But each file retained
> the full encoding table -- more than it needed. (Each file replicated the
> encoding table.) The result was that each of the small files was much
> bigger than it really needed to be. (And the total size may have been much
> more than the original, even if there had been no overlap of observations.)
> Subsequently, I decoded the variable, and the files shrank significantly.
>
> I thought this is something to be aware of.
> (It makes a potential case for having coding tables in a separate file. But
> there are plenty of reasons not to have it that way.)
>
> --David

