Statalist The Stata Listserver

st: hardware + OS for large datasets

From   Jeph Herrin <>
To   statalist <>
Subject   st: hardware + OS for large datasets
Date   Fri, 20 Oct 2006 14:05:23 -0400

I searched the archive on this but didn't find anything
(or rather, found too much, because the key words are
rather generic). Apologies if it has come up recently,
and thanks for any links to previous discussions.

I am about to start two projects that involve very
large datasets, and I need to make some decisions about
whether the largest chunk I can handle in Stata will be
adequate. The budget for the hardware is very generous
and as a former Unix administrator I'd welcome a chance to
have a Linux box here again (though I suppose win64 is
also an option), so I'm thinking a new Linux box with
gobs of RAM.

However, other than the theoretical limit of the 64-bit
address space, I wonder what it is like in practice to
load and save (say) 20GB datasets using Stata/MP (or SE).
Does the Stata memory model (such a huge boon for smaller
datasets) have practical limitations? How about 64GB datasets?
I'm concerned about spending a fortune on RAM and then finding
it's not practical to work with.

This is particularly an issue because the investment will be
funded by a group that maintains the database in S#S, and
they would rather just buy me a S#S license; if I go wrong,
it won't be easy to go back for another kick. So I'd very
much like to hear about any experiences, good or bad, of
those working with very large datasets, and what their
insight into OS and number of processors (or cores) might be.

Jeph Herrin
