Statalist The Stata Listserver

Re: Wishlist st: Re: Large data sets

From   "Ben Jann"
Subject   Re: Wishlist st: Re: Large data sets
Date   Sat, 23 Jun 2007 10:30:34 +0200


. set virtual on

(Or do I misunderstand your query?)
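
For context, here is a minimal sketch of how that setting might be used in a session of that era (Stata 9 or 10), where memory is allocated by hand with -set memory-. The 6g allocation and the file name bigdata.dta are hypothetical. The sketch also assumes that the operating system is configured with enough swap space to back an allocation larger than physical RAM. The setting does not page data to disk by itself: the operating system does the paging, and -set virtual on- asks Stata to arrange its memory so that paging costs less.

```stata
* Sketch only: the 6g allocation and bigdata.dta are hypothetical.
* -set memory- requires that no data be in memory, so clear first.
clear
* Ask Stata to organize its memory to reduce paging.
set virtual on
* Request an allocation that may exceed physical RAM (the OS must permit it).
set memory 6g
use bigdata.dta
* Report how the allocated memory is being used.
memory
```

Whether the -set memory- request succeeds depends on the operating system and on whether a 64-bit Stata is in use, so this should be read as an illustration rather than a guaranteed recipe.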

On 6/23/07, SamL wrote:
This brings up a general problem that I wonder if Stata can fix, or has
fixed.  I routinely use large datasets--by large I mean 4-10 GB.  I use a
Unix system managed by a computer center.  Sometimes my data becomes so
large that no available machine has enough memory to start Stata and hold
all the data in memory.

I should also say that my models routinely take weeks or even months to
run, so I am resigned to waiting a long time for results.  Speed is not my

So, I was wondering whether Stata has made it possible, or could make it
possible, to tell Stata to use a disk as virtual memory.  I know this will
slow down run times substantially, and my 4-week job could easily become a
6-month job.
But, at present, my 4 week job cannot be run without more resources, and
that means waiting a year to get more money to buy more memory.  So, even
at 6 months, that would be more than twice as fast as now.

I know this is not the common desire--I chuckle when someone complains
that it took 57 seconds to run something rather than the 35 seconds they
had expected, and wonder whether Stata is efficient enough.  Yes,
efficiency there would help me, too.  But what would help far more is the
ability to just opt out of holding all the data in memory when I need to
do so.

FYI--Yes, I've done all the data reduction one might suggest (e.g., using
frequency weights for patterns in the data, and so forth) and no, for my
problem sampling will not be helpful because the models I am estimating
need all the data to address issues of sparseness in some parts of the
data, and sampling will prevent that gain in information.

So, how 'bout it, Stata--have you made it possible for one to use a disk as
virtual memory with a few commands and, if not, will you please do so?

Respectfully yours,

On Fri, 22 Jun 2007, Michael Blasnik wrote:

> ....
> I don't know about a Spanish language version of the patch, but even without the
> patch you should be able to allocate about 900m to Stata under XP.  How much
> physical memory do you have?  Do you have lots of other programs running at the
> same time?  You may want to reduce the number of programs that start up
> automatically and launch Stata with nothing else running to see if you can get
> to something close to 900m.
> There's still a good chance that you will actually need more than 900m if you
> really need to use a 700m+ dataset.  But you may be able to make the dataset
> smaller:  Can you drop some variables? Could you compress some variables using
> value labels or smaller storage types or even using abbreviations in text
> strings?  Can you drop some observations for some analyses?  All of these
> options are worth exploring.  Otherwise, you could switch to an OS that can
> allocate more memory than XP -- Mac, Linux, even Windows Vista can allocate
> more.
> Michael Blasnik
> ----- Original Message -----
> From: "Carmen Ponce" <>
> To: <>
> Sent: Friday, June 22, 2007 5:56 PM
> Subject: st: Large data sets
> > Hi,
> > I need to work with large datasets (>700M, <2g), but Stata 9.2 does not
> > allow me to set memory above 500M (my computer should allow for up to 2g
> > memory setting).
> > I checked on Stata's FAQs website
> > ( and found out about
> > the hotfix patch 894472 that helps overcome this problem. I tried to execute
> > the hotfix patch from this website but got this message "Setup cannot update
> > Windows XP files because the language installed on your system is different
> > from the update language".
> > I tried to find a version in Spanish (my system is set to Spanish) but have
> > not succeeded.
> > Does anyone know of a Spanish version of this patch?
> >
> > Thank you in advance,
> > Carmen
© Copyright 1996–2017 StataCorp LLC