RE: st: Store datafile at minimum possible file size


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: Store datafile at minimum possible file size
Date   Fri, 16 Apr 2010 19:17:58 +0200

Henrik,

how does your package compare to the now official -zipfile- command?


HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Henrik Stovring
Sent: Friday, 16 April 2010 19:15
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Store datafile at minimum possible file size

Please excuse me for advertising packages written by myself, but you may
find the -zipsave- package useful, as it includes a -zipuse- and a
-zipmerge- command that make the zip files more readily accessible.

Best,

Henrik

Michael Boehm wrote:
> Thanks again, both of these suggestions sound like I could make
> profitable use of them :)
> 
> Michael
> 
> On Fri, Apr 16, 2010 at 2:55 PM, Pavlos C. Symeou <p.symeou@lmu.de> wrote:
>> Well, from my experience, I just had to try this to surprise myself. I had
>> an enormous dataset 14.5G consisting of 600 string variables and more than
>> 35000 observations. Exporting the dataset to tab-separated format resulted
>> in a file of about 800M. Compressing it to the Zip format resulted in a file
>> a bit less than 18M. That is an amazing difference. However, the problem
>> always remains, at least in my case, when the time for analysis comes. I
>> will still have to convert the compressed file back to the .dta format, and
>> then get back to the 14.5G. At least I can save all my files on a single
>> memory stick:)
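
For readers following along, here is a minimal sketch of the round trip described above, using the official -zipfile- and -unzipfile- commands. The auto dataset and the file names are placeholders, not taken from the thread. The -compress- step is an addition here; it only changes storage types where no information is lost.

```stata
* Placeholder data; substitute your own dataset
sysuse auto, clear

* Demote variables to the smallest storage type that loses no information
compress
save auto_small.dta, replace

* Archive the .dta file, then remove the uncompressed copy from disk
zipfile auto_small.dta, saving(auto_small.zip, replace)
erase auto_small.dta

* Later: restore the .dta file and load it for analysis
unzipfile auto_small.zip, replace
use auto_small.dta, clear
```

The zip archive only saves space on disk. Once the file is unzipped and loaded, the data take up their full size again, which is the limitation raised above.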
>>
>> Cheers,
>>
>> Pavlos
>>
>> On 16/04/2010 14:49, Stefan.Gawrich@hlpug.hessen.de wrote:
>>> -zipfile- has already been mentioned.
>>>
>>> Inside Stata you can use -encode- to change a string var to numeric with
>>> value labels.
>>> In case you have a lot of string repetitions in the data this can shrink
>>> the file size to a small fraction.
>>> With -decode- you can always go back.
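
A minimal sketch of the -encode- / -decode- round trip, assuming the auto dataset shipped with Stata as stand-in data. The variable name make_id is made up for illustration.

```stata
* Placeholder data; make is a string variable
sysuse auto, clear
describe make

* Map each distinct string to an integer code, with the strings kept as value labels
encode make, generate(make_id)
drop make

* Recover the original strings from the value labels
decode make_id, generate(make)
```

In this example every value of make is unique, so nothing is saved. The gain comes when the same strings repeat across many observations, because each string is then stored once in the value label instead of once per row.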
>>>
>>> ***
>>>
>>> You can even output the encoded file to ASCII and restore the value labels
>>> in other software by a script or a dictionary file if the small filesize is
>>> worth the extra effort.
>>> A few times I used Stata to create such a dictionary or script (e.g. in
>>> SQL).
>>>
>>>
>>> If all commands have the same structure (as is often the case with SQL
>>> -update- or -insert- scripts),
>>> you can use Stata's data window to "write" them. Some hints on how to do this:
>>>
>>> You must do this separately for every var you want to process in this way:
>>>
>>> First -levelsof- hands the levels to a local. Do a -foreach- loop over
>>> this local.
>>> Extended macro function -label- stores the value labels created by
>>> -encode- in locals.
>>> The local names should contain the level number (like "loc123") so you can
>>> refer to it later.
>>>
>>> Now you can use -duplicates drop- to keep the unique levels of
>>> this var.
>>> Delete all other vars and write commands as constant string vars.
>>> Loop over levels to insert the fitting local values (value label strings)
>>> to the numeric values.
>>> Use -order- to put all parts of the commands into the right place.
>>>
>>> Copy and paste the data editor to a text editor and you have a script.
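
The -levelsof- / -foreach- / extended macro function steps above could look roughly like the sketch below. It is not the data-editor procedure described in the message: it swaps in -file write- to produce the script directly. The dataset, the output file name, and the SQL table and column names (make_labels, id, label) are all hypothetical.

```stata
* Placeholder data; encode a string variable first
sysuse auto, clear
encode make, generate(make_id)

* Hand the distinct numeric levels to a local
levelsof make_id, local(levels)

* Write one SQL INSERT per level (hypothetical table and column names)
tempname fh
file open `fh' using "make_labels.sql", write text replace
foreach l of local levels {
    * Extended macro function: fetch the value label attached to level `l'
    local lbl : label (make_id) `l'
    file write `fh' `"INSERT INTO make_labels (id, label) VALUES (`l', '`lbl'');"' _n
}
file close `fh'
```

Labels that contain a single quote would need escaping before being written into SQL. The sketch does not handle that case.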
>>>
>>> Stefan
>>>
>>>
>>> *
>>> *   For searches and help try:
>>> *   http://www.stata.com/help.cgi?search
>>> *   http://www.stata.com/support/statalist/faq
>>> *   http://www.ats.ucla.edu/stat/stata/

-- 
Henrik Støvring			Department of Biostatistics
Associate professor            	University of Aarhus
stovring@biostat.au.dk     	Bartholins Allé 2, Bldg 1261, 217
Phone +45 8942 6131            	8000 Aarhus
Fax +45 8942 6140              	Denmark