st: Store datafile at minimum possible file size

Subject   st: Store datafile at minimum possible file size
Date   Fri, 16 Apr 2010 14:49:03 +0200

-zipfile- has already been mentioned. 

Inside Stata you can use -encode- to convert a string variable to a numeric variable with value labels.
If the data contain many repeated strings, this can shrink the file to a small fraction of its original size.
With -decode- you can always go back.
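
A minimal sketch of this round trip, on toy data. The variable name "city", the new variable "city_code" and the label name "city_lbl" are made up for illustration; -compress- is added here because -encode- creates a long variable, which can usually be stored in a smaller type.

```stata
* Toy data: one string variable with many repetitions (hypothetical names)
clear
set obs 1000
generate str20 city = cond(mod(_n, 2) == 1, "Copenhagen", "Stockholm")

* Replace the string by a labelled numeric variable
encode city, generate(city_code) label(city_lbl)
drop city

* Demote city_code to the smallest storage type that holds its values
compress
describe

* Going back: recreate the string variable from the value labels
decode city_code, generate(city)
```

How much is saved depends on how long the strings are and how often they repeat, so it is worth comparing the file sizes before and after on your own data.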


You can even export the encoded file to ASCII and restore the value labels in other software with a script or a dictionary file, if the small file size is worth the extra effort.
A few times I have used Stata to create such a dictionary or script (e.g. in SQL).
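
As a rough sketch of the export step, assuming an encoded variable "city_code" with value label "city_lbl" (both names and the file names are hypothetical): -outsheet- with option "nolabel" writes the numeric codes instead of the label text, and -label save- keeps the mapping as a do-file in case you want to restore it in Stata later.

```stata
* Toy data with an encoded variable (hypothetical names)
clear
set obs 6
generate str20 city = "Copenhagen"
replace city = "Stockholm" if mod(_n, 3) == 1
replace city = "Oslo" if mod(_n, 3) == 2
encode city, generate(city_code) label(city_lbl)
drop city

* Write the numeric codes only, comma-separated
outsheet city_code using cities.csv, comma nolabel replace

* Keep the code-to-text mapping as a do-file of -label define- commands
label save city_lbl using city_labels.do, replace
```

For other software, the mapping has to be translated into that software's own syntax.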

If all commands have the same structure (as is often the case with SQL UPDATE or INSERT scripts),
you can use Stata's data window to "write" the script. Some hints on how to do this:

You must do this separately for every variable you want to process this way:

First, -levelsof- stores the levels of the variable in a local. Run a -foreach- loop over this local.
Inside the loop, the extended macro function -label- stores the value labels created by -encode- in locals.
The local names should contain the level number (like "loc123") so you can refer to them later.
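
A short sketch of this step, again on toy data with the made-up names "city_code" and "city_lbl". The locals are named loc1, loc2, ... after the level they belong to.

```stata
* Toy data with an encoded variable (hypothetical names)
clear
set obs 6
generate str20 city = "Copenhagen"
replace city = "Stockholm" if mod(_n, 3) == 1
replace city = "Oslo" if mod(_n, 3) == 2
encode city, generate(city_code) label(city_lbl)

* Collect the distinct numeric levels in a local
levelsof city_code, local(levels)

* Store the label text of each level in a local named after the level
foreach l of local levels {
    local loc`l' : label city_lbl `l'
    display "level `l' has label: `loc`l''"
}
```

Locals only live for the duration of the do-file or interactive session they were defined in, so the later steps have to run in the same place.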

Now you can use -duplicates- with the subcommand "drop" to keep one observation per level of this variable.
Delete all other variables and write the fixed parts of the commands as constant string variables.
Loop over the levels to fill in the matching local values (the value label strings) next to the numeric values.
Use -order- to put all parts of the commands in the right place.

Copy the contents of the data editor and paste them into a text editor, and you have a script.
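
Putting the hints together, one possible sketch for a single variable looks like this. The SQL table "city_lookup", its columns, and all Stata names are invented for the example. The value label is detached at the end on the assumption that the data editor would otherwise show the label text instead of the numeric codes.

```stata
* Toy data with an encoded variable (hypothetical names)
clear
set obs 6
generate str20 city = "Copenhagen"
replace city = "Stockholm" if mod(_n, 3) == 1
replace city = "Oslo" if mod(_n, 3) == 2
encode city, generate(city_code) label(city_lbl)

* Store the label text of every level in locals loc1, loc2, ...
levelsof city_code, local(levels)
foreach l of local levels {
    local loc`l' : label city_lbl `l'
}

* Keep one observation per level and nothing but the coded variable
duplicates drop city_code, force
keep city_code

* Fixed parts of the SQL command as constant string variables
generate part1 = "INSERT INTO city_lookup (code, name) VALUES ("
generate part2 = ","
generate part3 = ");"

* Fill in the quoted label text next to each numeric code
generate str80 name = ""
foreach l of local levels {
    replace name = "'" + "`loc`l''" + "'" if city_code == `l'
}

* Detach the value label so the codes appear as plain numbers
label values city_code .

* Put the parts in the order of the SQL statement
order part1 city_code part2 name part3
list, noobs clean
```

Label text that itself contains quotation marks would need escaping before it goes into the SQL string; this sketch does not handle that.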

