Statalist



Re: st: Weights


From   Alan Riley <[email protected]>
To   [email protected]
Subject   Re: st: Weights
Date   Wed, 30 Apr 2008 11:13:54 -0500

Martin Weiss has a dataset which started as a 2.4 GB csv file and has
been converted to a 5.5 GB Stata .dta file.  He has a 64-bit computer
with 4 GB of RAM, which isn't quite enough to read in this dataset as
a whole:
> if only I could open the file and compress it... I have the latest gear in
> terms of hard- and software (MP/2 10.0 64 bit, 4GB RAM, Vista Business 64
> bit, ...) but it is next to impossible to open the 5.5 GB file. Virtual mem
> makes things so slow it takes all the fun out of it... So I am stuck in a
> bit of a quandary.

He wishes he could read it in just once and use Stata's -compress- command
on it to store the variables more efficiently.  My guess is that all
of the variables are stored as -float- or -double- when many could
probably be stored as smaller types such as -byte- or -int-.
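
As a small illustration of what -compress- does, here is a sketch on a
toy dataset.  The file name 'toy_types' and the variable 'x' are made up
for the example and are not part of Martin's data.  Integer values are
stored wastefully as -double-, and -compress- demotes them to the
smallest type that holds them without loss.  -describe using- reports
the storage types of a saved file without loading the data, so it may
be a cheap way to check the guess above on the real file.

```stata
clear
set obs 1000
* integers 0..99 stored wastefully as 8-byte doubles
generate double x = mod(_n, 100)
save toy_types, replace
* inspect the storage types on disk without loading the data
describe using toy_types
* demote x to the smallest type that holds its values without loss
compress
describe
```

Here 'x' should end up as a 1-byte -byte- rather than an 8-byte
-double-, since all of its values are integers between 0 and 99.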

Austin Nichols made a couple of suggestions:
> Can you put a 8GB memory stick on the computer--can't Vista treat
> those as RAM?  How did you turn your 2.4 GB .csv file into a 5.5GB
> Stata file, anyway?  Can you specify a different variable type in that
> process, or save different sets of variables to different files (with
> an identifier for later merging)? 

Austin's suggestion about saving different sets of variables to
different files is exactly what I think Martin should do.

First, let me say that an 8 GB memory stick would not really help.
Although this is "memory", it is flash storage, not the kind of memory
that a computer system uses as RAM.  These sticks are not much faster
than hard drives at transferring large amounts of data, although they
can locate files stored on them more quickly.

If Martin has a dataset named 'master.dta' with 10 variables named
'a b c d e f g h i j', he could execute the following in Stata to
compress and recombine the entire file:

   . use a b c d e using master
   . compress
   . save part1
   . use f g h i j using master
   . compress
   . save part2
   . use part1
   . merge using part2
   . drop _merge
   . save newmaster

Martin might need to do this in 3 or 4 parts, but hopefully after
doing the above, he will be left with a new dataset which will
fit entirely in the RAM on his computer.
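
If more than two parts are needed, the same steps could be looped.  The
following is only a sketch under stated assumptions: it builds a toy
stand-in for the real file, and the names 'toy_master', 'toy_part1',
'toy_newmaster' and the grouping of variables into three sets are all
hypothetical.  It relies on every part keeping the observations in the
same order, since the parts are recombined row by row.

```stata
* Toy stand-in for master.dta (hypothetical file name toy_master)
clear
set obs 5
foreach v in a b c d e f g h i j {
    generate double `v' = _n
}
save toy_master, replace

* Hypothetical split into three groups of variables
local g1 a b c
local g2 d e f
local g3 g h i j

* Read each group on its own, compress it, and save it as a part
forvalues k = 1/3 {
    use `g`k'' using toy_master, clear
    compress
    save toy_part`k', replace
}

* Recombine observation by observation; the parts share the same row order
use toy_part1, clear
forvalues k = 2/3 {
    merge 1:1 _n using toy_part`k', nogenerate
}
save toy_newmaster, replace
```

The -merge 1:1 _n- syntax requires Stata 11 or later.  In Stata 10, the
version discussed in this thread, the equivalent is -merge using-
followed by -drop _merge-, as in the commands above.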


--Alan Riley
([email protected])