Dear Statalist,
Martin Weiss has been asking for help in
handling an extremely long file that seems to gain size when converted from
CSV to Stata, but not to SPSS. For reasons of confidentiality, he cannot
tell us what is in it; but some comments suggest that the problem might
relate to variable-length strings. For instance, there might be a comment
field that is generally blank, but in a few cases contains a long and
extremely detailed response (a hymn of praise or a bitter complaint,
perhaps).
As Stata allocates each string variable a fixed length, every observation
takes up the full width of the longest value, so a mostly blank field leaves
a lot of unused space. As SPSS can store strings of variable length, it does
not waste this space.
To check this out, I wrote a script that produces 3 files: example1 contains
a string of 30 characters that is always full; example2 contains a similar
string that is blank except in the first record (similar to Martin's file as
imagined); example3 encodes the string in example2. After saving the files,
I copied them to SPSS using Stat/Transfer, and then checked the file sizes.
In examples 1 & 3, Stata gives smaller files. Only in example 2 does SPSS
"win".
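As a rough check on why examples 1 and 2 come out the same size in Stata, here is a minimal sketch of the arithmetic. It assumes that, after -compress-, n is stored as an int (2 bytes), string as str30 (30 bytes) and the encoded string_ as a byte (1 byte); -describe- will confirm the actual storage types. The local macro names are only for illustration, and the saved files will be slightly larger than these figures because of the file header and any value labels.

```stata
* Sketch: bytes of data = observations x bytes per observation.
* Storage types below are assumptions; confirm them with -describe-.
local nobs    = 30000
local size12  = `nobs' * (2 + 30)   // int n + str30 string, blank or not
local size3   = `nobs' * (2 + 1)    // int n + byte string_
display "example1 and example2: `size12' bytes"
display "example3:              `size3' bytes"
```

Under these assumptions the blank strings in example2 cost exactly as much as the full ones in example1, which is where a format with variable-length strings would gain.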
In this case, there is no loss of information due to encoding, as the
maximum length of the string is less than 244 characters. If Martin Weiss
has strings longer than this, and cares about the details contained beyond
character 244, he is perhaps involved in qualitative analysis, for which
neither SPSS nor Stata is very useful.
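To see whether any string comes near that limit before encoding, something like the following sketch could be run with the data in memory. It is only an illustration: it loops over whatever string variables -ds- finds and reports the longest value in each, and the macro and temporary variable names are my own.

```stata
* Sketch: report the longest value held in each string variable
ds, has(type string)
local strvars `r(varlist)'
foreach v of local strvars {
    tempvar len
    quietly generate long `len' = length(`v')   // characters in each value
    quietly summarize `len'
    display "`v': longest value has " r(max) " characters"
    drop `len'
}
```

If every reported maximum is below 244, encoding should not lose anything.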
The code is below.
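* example1: a 30-character string that is always full
clear
set obs 30000
gen n = _n
gen string = "123456789012345678901234567890"
compress
memory
save example1, replace
* example2: the same string, blank except in the first record
replace string = "" if _n > 1
compress
memory
save example2, replace
* example3: the string in example2, encoded as a labelled numeric variable
encode string, gen(string_)
drop string
compress
memory
save example3, replace
* Copy files to SPSS before continuing
pause on
pause
* Compare the file sizes
dir example*.*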
Paul
Paul Seed, Senior Lecturer in Medical Statistics
KCL School of Medicine, Division of Reproduction and Endocrinology
tel: (+44) (0) 20 7188 3642