Statalist The Stata Listserver



Re: st: Large dataset . Slow Stata


From   wgould@stata.com (William Gould, Stata)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Large dataset . Slow Stata
Date   Thu, 10 May 2007 09:01:40 -0500

Tobias Pfaff <tobiaspfaff@gmx.net> writes, 

> I am using Intercooled Stata 9.2 on a laptop with an AMD 1.79 GHz processor
> and 512 MB ram. So far, Stata worked well with all datasets.
>
> Now, we are analyzing a larger dataset with 70 variables and 180,000
> observations. The dta-file has 224 MB. It takes alone two minutes to open
> the file, not to mention the processing time of simple operations like
> -drop- or -replace-. Is that normal? I have tried -compress-, which does not
> have any major impact on the file size.
>
> What is a PROFESSIONAL WAY to handle such a dataset?

I take it by "PROFESSIONAL" Tobias means spending more money.  That always
makes the solution easier.

Tobias has already gotten good advice.  Let's put aside spending more money
for a moment, however.  One piece of advice Tobias received was, "I suggest
that at the beginning you set the memory at least at 500m and that is far
enough for any operation".  Tobias does not want to do that, given that his
computer has only 512 MB RAM.

What is happening is that Tobias's computer is "going virtual", that is,
using virtual memory.  That works, but it is slow.  To achieve good
performance, Tobias needs to -set memory- large enough to hold and work with
the dataset, but not so large as to exceed the physical amount of memory.
500MB is too close to 512MB given that the computer needs to use memory for
the operating system and other processes.  Windows uses a lot of memory.  On
his current computer, I would recommend -set memory 384m- or, pushing it,
maybe -set memory 400m-.
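
As a rough sketch of what that looks like at the start of a session (the
filename mydata is a placeholder, not Tobias's actual file, and the exact
output will vary by Stata version):

```stata
* -set memory- can be changed only when no data are in memory
clear

* Leave headroom below the 512 MB of physical RAM for Windows
set memory 384m

* Check the file's observations, variables, and size without loading it
describe using mydata, short

* Load the data, then see how much of the allocation is in use
use mydata
memory
```

If -memory- reports that the data nearly fill the allocation, there is little
room left for new variables, and that is the point at which more physical
memory becomes the better answer.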

That's the first, cheap answer.

The second, more expensive answer, is to get more memory and do -set memory-
to 500m or even larger.

If Tobias is going to work with these large datasets, he may want to consider
getting a faster computer, too.  His computer right now is a 1.79 GHz laptop.
Desktops are typically 3.2GHz these days, and some are even 4GHz.

Laptops are slower than desktops.  Laptops are on average around 2.4GHz.
Modern laptops, however, are available in dual-core configurations, and
Stata/MP can use that pretty efficiently.  

Load times are always going to be slower on a laptop than on a desktop.
Laptop drives tend to spin at 4200 RPM whereas desktops mostly use 7200 RPM
drives.  Don't make too much of that, however, because I suspect the 2-minute
load time Tobias reported is due more to the use of virtual memory than to
the slower laptop drive.  30 seconds sounds about right for a laptop.  Just
remember, desktops will usually load in half the time of laptops.
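
One way to check whether a change to -set memory- actually helps is to time
the load before and after.  A minimal sketch, again assuming a placeholder
file named mydata:

```stata
* Have Stata report the elapsed time after each command
set rmsg on

* Time the load; compare the reported seconds across memory settings
use mydata, clear

* Turn the timing messages off again
set rmsg off
```

If the reported time stays near two minutes after lowering -set memory-, the
drive rather than virtual memory is the more likely bottleneck.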

-- Bill
wgould@stata.com


