
How do I load large datasets (>1 GB) under 32-bit Windows? I receive an error r(909) saying “op. sys. refuses to provide memory”.

Title   Large datasets under Windows
Author Kevin S. Turner, StataCorp
Date October 2001; updated July 2007; minor revisions October 2007

First, make sure you have installed enough memory or allowed for enough virtual memory. If you have and are still getting this error, continue reading.

Under all current 32-bit Windows operating systems (Windows Vista, XP, 2000, NT, ME, 98, 95), the total available address space for any application is 2.1 GB. If you have a dataset larger than 2.1 GB, you will not be able to load it into Stata for Windows. This is simply a limitation of the operating system.

Unfortunately, even if your dataset is under the 2.1-GB limit, you may run into difficulty when loading it into Stata. The fault again lies with how Windows manages the 2.1-GB address space. When a typical application loads, several libraries (DLLs) are usually loaded as well. These libraries are usually loaded at the upper end of the 2.1-GB space, but not in any deterministic order. Microsoft has assured us that there is no way to prevent these libraries from loading at arbitrary addresses, which fragments the available space. When Stata tries to load a dataset, it requests from Windows the largest contiguous space in the 2.1-GB range. Depending on where Windows loaded the initial libraries, this may be 1.8 GB, 1.3 GB, or even less. You may be surprised to find that a 1.4-GB dataset loaded fine one time but failed to load later. This is simply an unfortunate side effect of Windows memory management.
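
The difference between total free space and the largest contiguous block can be illustrated with a small do-file. This is only a sketch: the block positions and sizes below are invented for illustration and do not describe any real Windows memory layout, and the variable names are hypothetical.

```stata
* Hypothetical layout of a 2,100 MB address space (all numbers invented).
* Each row is a block already occupied by the program or a library.
clear
input start_mb size_mb
   0   50
 900   20
1950  150
end
sort start_mb

* Free gap between the end of each block and the start of the next one.
* The last block is compared against the top of the address space.
generate end_mb  = start_mb + size_mb
generate next_mb = cond(_n < _N, start_mb[_n+1], 2100)
generate gap_mb  = next_mb - end_mb

* Total free space is the address space minus everything occupied.
summarize size_mb, meanonly
local free = 2100 - r(sum)

* The largest single gap is what one contiguous request can obtain.
summarize gap_mb, meanonly
local largest = r(max)

display "free in total: `free' MB; largest contiguous block: `largest' MB"
```

In this made-up layout, 1,880 MB are free in total, but no single request larger than 1,030 MB could succeed, because one small library sits in the middle of the space.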

WINDOWS XP SP2 NOTE:   There is an issue in Windows XP, service pack 2, that fragments the memory available to Stata 10, as noted in the Microsoft Knowledge Base article at the URL below.

http://support.microsoft.com/?kbid=894472

If you have service pack 2, Stata 10, and often need memory near or above 1 GB, you should consider installing a hotfix patch from Microsoft that corrects the problem.

You can test whether you are affected in two ways. The first method is to use a version of Stata before version 10. Record the maximum amount of memory you can allocate and compare that to the maximum amount you can allocate under Stata 10. If there is a large difference (>50 MB), the issue is probably present. The second way to test is by using the System Restore functionality of Windows XP to revert to Service Pack 1. If you can allocate significantly more under Service Pack 1 than under Service Pack 2, you are most likely experiencing the problem.
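
One way to find the maximum allocation for such a comparison is to step the request down until it succeeds. The do-file below is a minimal sketch, not an official procedure. It assumes a Stata version in which set memory controls the allocation (as in Stata 10 and earlier) and in which a refused request returns a nonzero return code, such as the r(909) error described above. The 10 MB step size is an arbitrary choice.

```stata
* Run this in a fresh session of each Stata version and compare results.
* set memory requires that no data be in memory.
clear

* Start at the 32-bit address-space ceiling and step down 10 MB at a time.
local mb = 2100
local found = 0
while `mb' >= 10 & !`found' {
    * capture suppresses the error so the loop can continue.
    capture set memory `mb'm
    if _rc == 0 {
        local found = 1
    }
    else {
        local mb = `mb' - 10
    }
}

display "largest allocation granted: `mb' MB"
```

Because the result depends on where Windows happened to load the libraries, it may vary from one session to the next. Repeating the probe a few times in each version gives a fairer comparison. In Stata versions that manage memory automatically, this probe is not meaningful.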

A fix from Microsoft known as hotfix 894472 is available. Microsoft has informed us that this hotfix will become part of Windows XP service pack 3 (SP3).

NOTE: Vista does not need the hotfix.

Until SP3 is released, we have received permission from Microsoft to make hotfix 894472 available to affected users. It is available for download here. To install the hotfix, download it to your hard drive, double-click on it, and follow the instructions.

This issue can affect applications other than Stata 10. However, because Stata 10 uses MFC (an internal set of Microsoft libraries) and needs contiguous memory space, it exhibits the problem more readily. Earlier versions of Stata did not use MFC, which is why they were not affected. Note that the hotfix does not guarantee that your operating system will allocate close to the maximum memory limit of 2.1 GB.

NOTE: Windows 2003 server with service pack 1 has the same problem as described above, but the bug was fixed in service pack 2.

Why Stata 10 memory allocation differs from Stata 9 on Windows XP and Windows 2003 server

Stata 10’s new Graph Editor uses gdiplus.dll from Microsoft. Because of the base address Microsoft chooses for this DLL, the memory space that is available to Stata 10 is fragmented. As a result, the largest contiguous memory block that you can allocate to Stata 10 is about 200 MB smaller than under Stata 9.

We have contacted Microsoft to see if they can fix this problem.

By now, you may be wondering what your alternatives are. As of July 2007, several operating system alternatives with 64-bit support are becoming available. See www.stata.com/products/opsys.html for a list of operating systems compatible with Stata. The 64-bit platform will enable you to work with large datasets. Depending on your operating system, you should be able to allocate as much memory as you have on the machine, minus the system requirements. To take advantage of this technology, you will need 64-bit–compatible hardware, a 64-bit operating system, and, of course, a 64-bit version of Stata.

As a last resort, you may consider trimming any unnecessary data from your dataset or dividing the dataset into two files. You may want to use the second syntax of the use command to read in just the observations/variables you want. For example:

. describe using auto.dta

Contains data                                 1978 Automobile Data
  obs:            74                          26 Mar 2007 09:52
 vars:            12                          
 size:         3,478                          
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
headroom        float  %6.1f                  Headroom (in.)
trunk           int    %8.0g                  Trunk space (cu. ft.)
weight          int    %8.0gc                 Weight (lbs.)
length          int    %8.0g                  Length (in.)
turn            int    %8.0g                  Turn Circle (ft.) 
displacement    int    %8.0g                  Displacement (cu. in.)
gear_ratio      float  %6.2f                  Gear Ratio
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign  

. use mpg price for using auto.dta in 1/50, clear
(1978 Automobile Data)

. describe

Contains data from auto.dta
  obs:            50                          1978 Automobile Data
 vars:             3                          26 Mar 2007 09:52
 size:           450 (99.9% of memory free)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign
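
The size figures in the two listings can be checked by hand, which is useful when estimating how much a subset will save before loading it. The sketch below uses the storage widths of Stata's types (byte 1, int 2, float 4, str18 18 bytes). The extra 4 bytes per observation is an inference from the figures shown above, which it reproduces exactly. It is not something this FAQ documents.

```stata
* Bytes per observation for the full file:
* one str18, eight ints, two floats, one byte.
local full_width = 18 + 8*2 + 2*4 + 1

* Bytes per observation for the subset: price (int), mpg (int), foreign (byte).
local sub_width = 2 + 2 + 1

* Assumed overhead of 4 bytes per observation (inferred from the listings).
local overhead = 4

local full_size = 74 * (`full_width' + `overhead')
local sub_size  = 50 * (`sub_width' + `overhead')

display "full file: `full_size' bytes; subset: `sub_size' bytes"
```

The results, 3,478 and 450 bytes, agree with the size lines reported by describe above.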

Depending on your data and analysis, trimming or dividing the dataset may not be feasible; it is offered only as a suggestion.
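
Dividing a dataset into two files can be done with the same form of the use command, so that the full file is never loaded at once. The following is a sketch that assumes auto.dta is in the current directory, as in the example above. The output file names auto_domestic and auto_foreign are hypothetical.

```stata
* Load only the domestic cars and save them to their own file.
use if foreign == 0 using auto.dta, clear
save auto_domestic, replace

* Load only the foreign cars and save them to a second file.
use if foreign == 1 using auto.dta, clear
save auto_foreign, replace
```

Splitting on a variable that matches the planned analysis (here, foreign) means each piece can be analyzed on its own. Splitting by observation range with in works the same way.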

© Copyright 1996–2008 StataCorp LP