Statalist The Stata Listserver



Re: st: Large dataset . Slow Stata


From   "Tobias Pfaff" <[email protected]>
To   <[email protected]>
Subject   Re: st: Large dataset . Slow Stata
Date   Fri, 11 May 2007 10:04:58 +0200

Thanks for all your answers!
I will try out your hints and suggestions and I will then post what worked
best.

Cheers,
Tobias




Tobias Pfaff <[email protected]> writes, 

> I am using Intercooled Stata 9.2 on a laptop with an AMD 1.79 GHz processor
> and 512 MB of RAM. So far, Stata has worked well with all datasets.
>
> Now we are analyzing a larger dataset with 70 variables and 180,000
> observations. The .dta file is 224 MB. It takes two minutes just to open
> the file, not to mention the processing time of simple operations like
> -drop- or -replace-. Is that normal? I have tried -compress-, which does
> not have any major impact on the file size.
>
> What is a PROFESSIONAL WAY to handle such a dataset?

I take it by "PROFESSIONAL" Tobias means spending more money.  That always
makes the solution easier.

Tobias has already gotten good advice.  Let's put aside spending more money
for a moment, however.  One piece of advice he received, "I suggest that at
the beginning you set the memory at least at 500m and that is far enough for
any operation", Tobias does not want to follow, given that his computer has
only 512 MB of RAM.

What is happening is that Tobias's computer is "going virtual", i.e., using
virtual memory.  That works, but it is slow.  To achieve good performance,
Tobias needs to -set memory- large enough to hold and work with the dataset,
but not so large as to exceed the amount of physical memory.  500 MB is too
close to 512 MB, given that the computer also needs memory for the operating
system and other processes, and Windows uses a lot of memory.  On his current
computer, I would recommend -set memory 384m- or, pushing it, maybe
-set memory 400m-.
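As a sketch of that first, cheap answer (the dataset filename here is
hypothetical; 384m is the figure suggested above), a session on the 512 MB
laptop might begin:

```stata
* Cap Stata's workspace below physical RAM so the OS does not page to disk
set memory 384m

* Load the 224 MB dataset; it should now fit entirely in physical memory
use bigdataset.dta, clear

* Report how much of the allocation the data actually use
memory
```

Note that in Stata 9, -set memory- can only be changed when no dataset is in
memory, so issue it (or -clear- first) before -use-.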

That's the first, cheap answer.

The second, more expensive answer is to buy more memory and then
-set memory- to 500m or even larger.

If Tobias is going to work with these large datasets, he may want to consider
getting a faster computer, too.  His computer right now is a 1.79 GHz laptop.
Desktops are typically 3.2 GHz these days, and some are even 4 GHz.

Laptops are slower than desktops.  Laptops are on average around 2.4GHz.
Modern laptops, however, are available in dual-core configurations, and
Stata/MP can use that pretty efficiently.  

Load times are always going to be slower on a laptop than on a desktop.
Laptop drives tend to spin at 4200 RPM, whereas desktops mostly use 7200 RPM
drives.  Don't make too much of that, however, because I suspect the 2-minute
load time Tobias reported is due more to the use of virtual memory than to
the slower laptop drive.  30 seconds sounds about right for a laptop.  Just
remember, desktops will usually load in half the time of laptops.

-- Bill
[email protected]
*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/




