
Re: st: Weights

From   Alan Riley <>
Subject   Re: st: Weights
Date   Wed, 30 Apr 2008 11:21:41 -0500

A slight correction to my previous post (included in its entirety
below my signature):  The second -use- command I showed should be

   . use f g h i j using master

rather than

   . use d e f g h using master

--Alan Riley

Alan Riley wrote:
> Martin Weiss has a dataset which started as a 2.4 GB csv file and has
> been converted to a 5.5 GB Stata .dta file.  He has a 64-bit computer
> with 4 GB of RAM, which isn't quite enough to read in this dataset as
> a whole:
>
> > if only I could open the file and compress it... I have the latest gear in
> > terms of hard- and software (MP/2 10.0 64 bit, 4GB RAM, Vista Business 64
> > bit, ...) but it is next to impossible to open the 5.5 GB file. Virtual mem
> > makes things so slow it takes all the fun out of it... So I am stuck in a
> > bit of a quandary.
>
> He wishes he could read it in just once and use Stata's -compress- command
> on it to store the variables more efficiently.  My guess is that all
> of the variables are stored as -float- or -double- when many could
> probably be stored as smaller types such as -byte- or -int-.
>
> Austin Nichols made a couple of suggestions:
>
> > Can you put an 8GB memory stick on the computer--can't Vista treat
> > those as RAM?  How did you turn your 2.4 GB .csv file into a 5.5GB
> > Stata file, anyway?  Can you specify a different variable type in that
> > process, or save different sets of variables to different files (with
> > an identifier for later merging)?
>
> Austin's suggestion about saving different sets of variables to
> different files is exactly what I think Martin should do.
>
> First, let me say that an 8 GB memory stick would not really help.
> Although this is "memory", it is not the same kind of memory that
> is used as RAM by a computer system.  These sticks are not much
> faster than hard drives at transferring large amounts of data,
> although they can locate files stored on them more quickly.
>
> If Martin has a dataset named 'master.dta' with 10 variables named
> 'a b c d e f g h i j', he could execute the following in Stata to
> compress and recombine the entire file:
>
>    . use a b c d e using master
>    . compress
>    . save part1
>    . use d e f g h using master
>    . compress
>    . save part2
>    . use part1
>    . merge using part2
>    . drop _merge
>    . save newmaster
>
> Martin might need to do this in 3 or 4 parts, but hopefully after
> doing the above, he will be left with a new dataset which will
> fit entirely in the RAM on his computer.
>
> --Alan Riley
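[Editor's note: if three parts are needed, the same pattern extends naturally.
The sketch below is illustrative, not from the original post; the variable
groupings are placeholders, and it relies on the fact that the old -merge-
syntax (Stata 10 and earlier) accepts several using files and joins them by
observation order when no key varlist is given.]

   . use a b c using master
   . compress
   . save part1
   . use d e f using master
   . compress
   . save part2
   . use g h i j using master
   . compress
   . save part3
   . use part1
   . merge using part2 part3
   . drop _merge
   . save newmaster

Because the parts are written from the same master file without sorting, the
observations stay in the same order, which is what makes the merge by
observation number safe here.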
