From: Arnold Kester <firstname.lastname@example.org>
Subject: Re: st: Stata corrupting data?
Date: Fri, 13 Feb 2004 09:59:30 +0100
I had this last year (using Stata 7) with a much smaller dataset, about 800 KB. I did a series of merge operations that required saving temporary data files on a server disk over the network, and that sometimes resulted in corruption. I put several -assert- commands in my .do files to stop processing when corruption occurred.

Deborah Garvey wrote:

Hi, all.
I use Intercooled Stata 7.0 Win 98/95/NT version (born 6/11/02) on a
Dell OptiPlex GX110 with 512 MB RAM running Win 98. My computer has
about 7 GB disk space available for virtual memory. (Obviously not the
fastest machine on earth.)
A worrisome event occurred yesterday while reading in a 1990 Census 5%
PUMS CA sample abstracted from the IPUMS web site. The data set is 312 MB
in size, with N = 1.46 million observations and 84 variables.
I set memory to 375 MB. After I read the data into Stata, ran
descriptive stats, and saved the data, I did a quick cross-tab of two
variables that should have yielded a 2x2 table. Instead, values had
changed for a couple of observations, and I ended up with a 3x3 table.
I not so calmly exited Stata, restarted my computer, and checked the
data. They seem to be fine.
The data were seriously corrupted when I initially attempted to read in
1990 and 2000 5% PUMS abstracts for CA simultaneously. I verified with
IPUMS that the problem was on my end, and not with their source data.
Has this happened to anyone else before? Is it just a matter of being
more generous in allocating virtual memory?
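For reference, the kind of -assert- guard described in this thread can be sketched as below. This is only an illustration: the file names (master.dta, detail.dta), the key variable -id-, and the 0/1 variable -flag- are hypothetical stand-ins, not anything from the datasets discussed above.

```stata
* Guard a merge pipeline against silent data corruption.
set memory 375m                  // allocate memory before loading any data
use master, clear
sort id
merge id using detail            // Stata 7 match-merge; detail.dta must also be sorted by id
assert _merge == 3               // abort the do-file unless every observation matched
drop _merge
assert flag == 0 | flag == 1     // abort if a variable known to be 0/1 holds anything else
save checked, replace
```

Because -assert- aborts a .do file with an error as soon as its condition fails for any observation, corruption introduced by an intermediate save is caught at that step rather than surfacing later as an inexplicable 3x3 table.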