Statalist



RE: st: File sizes in Stata & SPSS (was Weights )


From   "Paul Seed" <paul.seed@kcl.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   RE: st: File sizes in Stata & SPSS (was Weights )
Date   Fri, 2 May 2008 15:03:30 +0100

Dear Statalist, 

Martin Weiss <martin.weiss@uni-tuebingen.de> has been asking for help in
handling an extremely long file that seems to gain size when converted from
CSV to Stata, but not to SPSS.  For reasons of confidentiality, he cannot
tell us what is in it; but some comments suggest that the problem might
relate to variable length strings.  For instance, there might be a comment
field that is generally blank, but in a few cases contains a long &
extremely detailed response.  (A hymn of praise or a bitter complaint,
perhaps).

As Stata allocates each string variable a fixed length in every record, a
field like this will contain a lot of unused space. As SPSS can store strings
of variable length, it avoids most of this waste.
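
The fixed-width behaviour can be seen directly in Stata. What follows is a
minimal sketch, not taken from Martin's data: the variable name comment and
the sample text are made up for illustration. The storage type that describe
reports is the width reserved in every record, including the blank ones.

```stata
clear
set obs 3
* Hypothetical comment field: blank everywhere to begin with
gen str1 comment = ""
* One long response; -replace- promotes the storage type to fit it
replace comment = "A long and detailed response" in 1
* The str# type shown here is the width reserved for every record
describe comment
```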

To check this out, I wrote a script that produces 3 files: example1 contains
a string of 30 characters that is always full; example2 contains a similar
string that is blank except in the first record (similar to Martin's file as
imagined); example3 encodes the string in example2. After saving the files,
I copied them to SPSS using Stat/Transfer, and then checked the file sizes.

In examples 1 & 3, Stata gives smaller files.  Only in example 2 does SPSS
"win".  

In this case, there is no loss of information due to encoding, as the
maximum length of the string is less than 244 characters. If Martin Weiss
has strings longer than this, and cares about the details contained beyond
character 244, he is perhaps involved in qualitative analysis for which
neither SPSS nor Stata is very useful. 
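
Whether any string comes near that limit can be checked by measuring the
lengths. This is only a sketch on made-up data: the variable is called string
to match the example files, and len is a helper name introduced here for
illustration.

```stata
clear
set obs 5
* Mostly blank text field with one filled record (made-up data)
gen string = ""
replace string = "123456789012345678901234567890" in 1
* len holds the number of characters in each record
gen len = length(string)
summarize len
* Records at or beyond the 244-character limit discussed above
count if len >= 244
```

If the count is zero and the maximum reported by summarize is well below 244,
nothing is being truncated.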



The code is below.

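* Example 1: 30,000 records, string always full (30 characters)
clear
set obs 30000
gen n = _n
gen string = "123456789012345678901234567890" 
compress 
memory
save example1, replace

* Example 2: same string, blank except in the first record
replace string = "" if _n > 1
compress 
memory
save example2, replace

* Example 3: string replaced by a labelled numeric code
encode string, gen(string_)
drop string

compress
memory
save example3, replace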

* Copy files to SPSS before continuing
pause on
pause

dir example*.*
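
As a rough cross-check on the Stata side, the size of the data area can be
estimated as observations times bytes per observation. This sketch assumes
that example1 to example3 have already been saved by the code above; it uses
the results that describe leaves in r(), and it ignores file overhead such as
headers and value labels.

```stata
foreach f in example1 example2 example3 {
    use `f', clear
    quietly describe
    * r(N) = number of observations, r(width) = bytes per observation
    display "`f': " r(N) * r(width) " bytes of data (excluding file overhead)"
}
```

The figures should be slightly smaller than the file sizes listed by dir.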

Paul

Paul Seed, Senior Lecturer in Medical Statistics
KCL School of Medicine, Division of Reproduction and Endocrinology
tel  (+44) (0) 20 7188 3642




