Advice on 2+ billion observations

Stata/MP allows more than 2 billion observations. How many observations you can process depends solely on the amount of memory on your computer. Stata will not limit you; it can count up to 1 trillion observations.

We have advice on using this feature. Setting min_memory and segmentsize will dramatically improve performance with large numbers of observations.

First, let's address how many observations you will likely be able to process:

                          Billions of observations
    Computer's   Memory           scenario
        memory     used      (1)      (2)      (3)
    -------------------------------------------
         128GB    112GB      1.8      1.4      1.0
         256GB    240GB      3.8      2.9      2.1
         512GB    496GB      7.9      6.1      4.4
        1024GB   1008GB     16.2     12.3      9.8
        1536GB   1520GB     24.4     18.5     13.6
    -------------------------------------------
    Notes: Memory used is total used for storing data.
           We left 16GB free for Stata and other processes.
           We assume that Stata consumes nearly all the computer's
           resources (single user).

The observation counts leave extra room for adding three doubles (24 bytes per observation) because Stata commands often add working variables. The widths used in the three scenarios are for your data, exclusive of working variables.

    Scenario 1: width = 43 bytes (same as auto.dta)
    Scenario 2: width = 64 bytes
    Scenario 3: width = 96 bytes
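To see which scenario is closest to your own data, you can check the width of the dataset in memory. The following is a minimal sketch, assuming that auto.dta is installed with your copy of Stata and that `c(width)` reports the width of one observation in bytes:

```stata
* Load the example dataset shipped with Stata
sysuse auto, clear

* Width of one observation, in bytes (the text above gives 43 for auto.dta)
display c(width)

* Number of observations currently in memory
display c(N)
```

Replace `sysuse auto` with a `use` of a sample of your own data to read off the width that applies to you.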


           memory_used     1024³
    obs = ------------- × -------
           width + 24      1000³

where memory_used is in gigabytes and obs is in billions.
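The formula can be evaluated directly in Stata. This is an illustrative sketch only; the scalar names `mem_gb`, `row_bytes`, and `obs_bn` are made up for this example, and the inputs are taken from the 256GB row of the table under scenario 2:

```stata
* Illustrative inputs: substitute your own values
* mem_gb    = memory used for data, in gigabytes
* row_bytes = width of your data, in bytes, excluding working variables
scalar mem_gb    = 240
scalar row_bytes = 64

* Apply the formula; the extra 24 bytes allow for three working doubles
scalar obs_bn = mem_gb / (row_bytes + 24) * 1024^3 / 1000^3

* Result is in billions of observations
display %4.1f obs_bn
```

With these inputs the result should display as 2.9, agreeing with the table entry for a 256GB computer under scenario 2.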

Stata will run faster with large numbers of observations if you change two memory settings, segmentsize and min_memory. Set segmentsize to 2g (the default is 32m):

. set segmentsize 2g

Set min_memory to the amount of memory you want Stata to use. This should be the Memory used value for your computer's memory size in the table above, or a smaller value:

. set min_memory 240g /* or smaller value on a 256g computer */

. set min_memory 496g /* or smaller value on a 512g computer */

. set min_memory 1008g /* or smaller value on a 1TB computer */

. set min_memory 1520g /* or smaller value on a 1.5TB computer */
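The two settings can also be applied together from a do-file, deriving min_memory from the computer's memory as in the table notes. This is a sketch under stated assumptions: the local macro names are hypothetical, the 256GB figure is just an example, and it assumes a fresh Stata session with no data in memory.

```stata
* Assumed machine: 256 GB in total, leaving 16 GB free for Stata
* and other processes, as in the table notes
local total_gb   256
local reserve_gb 16
local use_gb = `total_gb' - `reserve_gb'

* Apply the two settings recommended above
set segmentsize 2g
set min_memory `use_gb'g

* List the memory settings currently in effect
query memory
```

Here `use_gb` evaluates to 240, so the second `set` command is equivalent to typing `set min_memory 240g`.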

If you use a multiuser computer, be aware that setting min_memory causes Stata to allocate and reserve that memory for you, which leaves less memory available to other users.

When you are done using large numbers of observations, return the values to their defaults (or just exit Stata).

. set min_memory 0

. set segmentsize 32m

