
How do I load large datasets (>1 GB) under 32-bit Windows? I receive an error r(909) saying “op. sys. refuses to provide memory”.

Title   Large datasets under Windows
Author Kevin S. Turner, StataCorp

First, make sure you have installed enough memory or allowed for enough virtual memory. If you have and are still getting this error, continue reading.

Under all current 32-bit Windows operating systems (Windows 8, 7, Vista, XP, 2000, NT, ME, 98, 95), the total address space available to any application is 2.1 GB. If you have a dataset larger than 2.1 GB, you will not be able to load it into Stata for Windows. This is simply a limitation of the operating system.

Unfortunately, even if your dataset is under the 2.1-GB limit, you may run into difficulty when loading it into Stata. The fault again lies with how Windows manages the 2.1-GB address space. When a typical application loads, several libraries (DLLs) are usually loaded as well. These libraries are usually loaded toward the upper end of the 2.1-GB space but not in any deterministic order. Microsoft has assured us that there is no way to prevent these libraries from loading at arbitrary addresses, which fragments the available space. When Stata tries to load a dataset, it requests from Windows the largest contiguous space in the 2.1-GB range. Depending on where Windows loaded the initial libraries, this may be 1.8 GB, 1.3 GB, or even less. You may be surprised to find that a 1.4-GB dataset loaded fine one time but failed to load later. This is simply an unfortunate side effect of Windows memory management.
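
To make the fragmentation effect concrete, here is a small sketch that computes the largest contiguous free block for a made-up layout. The module addresses and sizes are hypothetical and chosen only for illustration; they are not measurements of any real Windows process. The range is taken as 2,048 MB (2^31 bytes, the 2.1 GB cited above).

```stata
* Hypothetical address-space layout, in MB: each row is one loaded module.
* The last row is a zero-size marker for the top of the 2,048-MB range.
clear
input double(start size)
0 40
700 10
2000 48
2048 0
end

sort start
generate double stop = start + size
* Free space between the end of the previous module and the start of this one
generate double gap = start - stop[_n-1]
list start stop gap, noobs

* The largest gap is the biggest single block that could be handed to Stata
summarize gap, meanonly
local largest = r(max)
display "Largest contiguous free block: `largest' MB"
```

In this invented layout, 1,950 MB is free in total, but the largest single block is only 1,290 MB (about 1.3 GB) because one small library sits in the middle of the range.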

As of Stata 11.1, some of the dependencies on external DLLs were removed, reducing memory fragmentation and increasing the amount of memory available to Stata. If you are using 32-bit Windows XP and you are still having trouble allocating memory, you should read “Memory allocation in Windows XP”.

By now, you may be wondering what your alternatives are. Since July 2007, several operating system alternatives with 64-bit support have become available. See our list of operating systems compatible with Stata. The 64-bit platform will enable you to work with large datasets. Depending on your operating system, you should be able to allocate as much memory as you have on the machine, minus the system requirements. To take advantage of this technology, you will need 64-bit–compatible hardware, a 64-bit operating system, and, of course, a 64-bit version of Stata.
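
If you are not sure which version of Stata you are running, the c-class system values offer a quick check. The following is a minimal sketch; it only reports what Stata itself knows about the executable and operating system, and the wording of the labels is ours.

```stata
* Report whether the running Stata executable is 32-bit or 64-bit,
* along with the operating system and machine type Stata detected.
display "Stata executable: " c(bit) "-bit"
display "Operating system: " c(os) " " c(osdtl)
display "Machine type:     " c(machine_type)
```

A result of 32 on the first line means that the 2.1-GB limit described above applies, even if the hardware and operating system are 64-bit.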

As a last resort, you may consider trimming any unnecessary data from your dataset or dividing the dataset into two files. You may want to use the second syntax of the use command to read in just the observations and variables you want. In the example below, for is an abbreviation of the variable name foreign:

. describe using auto.dta

Contains data                                 1978 Automobile Data
  obs:            74                          26 Mar 2007 09:52
 vars:            12                          
 size:         3,478                          
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
make            str18  %-18s                  Make and Model
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
rep78           int    %8.0g                  Repair Record 1978
headroom        float  %6.1f                  Headroom (in.)
trunk           int    %8.0g                  Trunk space (cu. ft.)
weight          int    %8.0gc                 Weight (lbs.)
length          int    %8.0g                  Length (in.)
turn            int    %8.0g                  Turn Circle (ft.) 
displacement    int    %8.0g                  Displacement (cu. in.)
gear_ratio      float  %6.2f                  Gear Ratio
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign  

. use mpg price for using auto.dta in 1/50, clear
(1978 Automobile Data)

. describe

Contains data from auto.dta
  obs:            50                          1978 Automobile Data
 vars:             3                          24 June 2013 15:56
 size:           250                          (_dta has notes)
-------------------------------------------------------------------------------
              storage  display     value
variable name   type   format      label      variable label
-------------------------------------------------------------------------------
price           int    %8.0gc                 Price
mpg             int    %8.0g                  Mileage (mpg)
foreign         byte   %8.0g       origin     Car type
-------------------------------------------------------------------------------
Sorted by:  foreign
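
The other suggestion, dividing the dataset into two files, can be sketched with the same syntax. This is only an illustration under stated assumptions: the shipped auto dataset stands in for your large file, and temporary files stand in for the filenames you would actually use. It relies on describe using storing the number of observations in r(N) without loading the data.

```stata
* Stand-in for a large file: copy the shipped auto dataset to a temporary file.
* In practice, replace `big', `part1', and `part2' with your own filenames.
sysuse auto, clear
tempfile big part1 part2
save "`big'"
clear

* Read the observation count from the file header without loading the data
describe using "`big'"
local N    = r(N)
local half = floor(`N'/2)
local next = `half' + 1

* First half of the observations
use in 1/`half' using "`big'", clear
save "`part1'"

* Second half of the observations
use in `next'/`N' using "`big'", clear
save "`part2'"
```

Each piece can then be analyzed separately, or reduced to the variables you need and recombined with append.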

Depending on your data and analysis, this may not be feasible and is offered only as a suggestion.