
From: "Martin Weiss" <martin.weiss@uni-tuebingen.de>
To: <statalist@hsphsun2.harvard.edu>
Subject: RE: st: Weights
Date: Wed, 30 Apr 2008 20:51:59 +0200

Alan, thanks for the reply.

Someone asked how the dataset was constructed: I read it into SPSS to have the data labeled there, and then had a colleague of mine use Stat/Transfer (which claims to optimize before handing out the file) to convert it to Stata.

As for partitioning the file and compressing, I should have mentioned that I had Stata do exactly that overnight. My -forvalues- loop cut the file every 100,000th row, compressed each piece, and saved it to the hard disk. This resulted in 29 files of 205 MB plus one smaller one, as the dataset has 2,911,000-odd rows. -compress- changed the type of roughly half of the variables, yet the file size decreased by only 15% on average, with very little variation in the decrease. I appended the resulting files and, as you can imagine, the result did not fit into my 4 GB (!) of memory either. So the difference from the SPSS file cannot be explained away by unwieldy organization of the data. Nor do I think that cutting the file into pieces along its columns would change this result. I find appending easier to grasp than merging, but the outcome should be equivalent.

I am beginning to wonder why Stata insists on holding the entire dataset in memory. A USB stick does not change the equation, as it is almost as slow to respond to requests as the hard drive itself...
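For concreteness, the row-wise split-and-compress loop described above could look roughly like the following sketch. The file name -master.dta-, the chunk size, and the part-file names are assumptions for illustration, not Martin's actual code:

```stata
* Sketch of a row-wise split: read 100,000-row slices of an assumed
* master.dta, compress each slice, and save it as a numbered part file.
local chunk = 100000
local total = 2911000
local i = 1
forvalues start = 1(`chunk')`total' {
    local stop = min(`start' + `chunk' - 1, `total')
    use in `start'/`stop' using master, clear
    compress
    save part`i', replace
    local ++i
}
```

Because -use in ... using- reads only the requested observations from disk, each slice fits in memory even when the full file does not.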
Martin Weiss
_________________________________________________________________
Diplom-Kaufmann Martin Weiss
Mohlstrasse 36, Room 415
72074 Tuebingen, Germany
Fon: 0049-7071-2978184
Home: http://www.wiwi.uni-tuebingen.de/cms/index.php?id=1130
Publications: http://www.wiwi.uni-tuebingen.de/cms/index.php?id=1131
SSRN: http://papers.ssrn.com/sol3/cf_dev/AbsByAuth.cfm?per_id=669945

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Alan Riley
Sent: Wednesday, April 30, 2008 6:22 PM
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Weights

A slight correction to my previous post (included in its entirety below my signature): the second -use- command I showed should be

. use f g h i j using master

rather than

. use d e f g h using master

--Alan Riley
(ariley@stata.com)

Alan Riley wrote:
> Martin Weiss has a dataset which started as a 2.4 GB csv file and has
> been converted to a 5.5 GB Stata .dta file. He has a 64-bit computer
> with 4 GB of RAM, which isn't quite enough to read in this dataset as
> a whole:
>
> > if only I could open the file and compress it... I have the latest
> > gear in terms of hard- and software (MP/2 10.0 64 bit, 4GB RAM,
> > Vista Business 64 bit, ...) but it is next to impossible to open the
> > 5.5 GB file. Virtual mem makes things so slow it takes all the fun
> > out of it... So I am stuck in a bit of a quandary.
>
> He wishes he could read it in just once and use Stata's -compress-
> command on it to store the variables more efficiently. My guess is
> that all of the variables are stored as -float- or -double- when many
> could probably be stored as smaller types such as -byte- or -int-.
>
> Austin Nichols made a couple of suggestions:
>
> > Can you put an 8GB memory stick on the computer--can't Vista treat
> > those as RAM? How did you turn your 2.4 GB .csv file into a 5.5GB
> > Stata file, anyway? Can you specify a different variable type in
> > that process, or save different sets of variables to different files
> > (with an identifier for later merging)?
>
> Austin's suggestion about saving different sets of variables to
> different files is exactly what I think Martin should do.
>
> First, let me say that an 8 GB memory stick would not really help.
> Although this is "memory", it is not the same kind of memory that is
> used as RAM by a computer system. These sticks are not much faster
> than hard drives when it comes to transferring large amounts of data,
> although they can find files stored on them faster.
>
> If Martin has a dataset named 'master.dta' with 10 variables named
> 'a b c d e f g h i j', he could execute the following in Stata to
> compress and recombine the entire file:
>
> . use a b c d e using master
> . compress
> . save part1
> . use d e f g h using master
> . compress
> . save part2
> . use part1
> . merge using part2
> . drop _merge
> . save newmaster
>
> Martin might need to do this in 3 or 4 parts, but hopefully after
> doing the above, he will be left with a new dataset which will fit
> entirely in the RAM on his computer.
>
> --Alan Riley
> (ariley@stata.com)

*
* For searches and help try:
* http://www.stata.com/support/faqs/res/findit.html
* http://www.stata.com/support/statalist/faq
* http://www.ats.ucla.edu/stat/stata/
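Austin's aside about "an identifier for later merging" can be made explicit: the plain -merge using- in Alan's example matches observations by their order in the files, so generating a row identifier first and merging on it is a safer variant. A hedged sketch, with all variable and file names assumed for illustration:

```stata
* Column-wise split with an explicit row identifier, so the later
* merge matches on id rather than relying on observation order.
* (Old-style Stata 10 merge syntax; names are illustrative only.)
use master, clear
generate long id = _n        // row identifier carried into every part
preserve
keep id a b c d e
compress
save part1, replace
restore
keep id f g h i j
compress
save part2, replace

use part1, clear
merge id using part2, sort   // match on id; -sort- orders both datasets
drop _merge
save newmaster, replace
```

Each part file holds only half the columns (plus the id), so -compress- runs within memory, and the final merged file carries the compressed storage types.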


