__Using Stata/MP__

There are three flavors of Stata:

Flavor            Description
-----------------------------------------------
   **Stata/IC**   standard version
   **Stata/SE**   Stata/IC + large datasets
-> **Stata/MP**   Stata/SE + parallel processing
-----------------------------------------------
See **[U] 5 Flavors of Stata** for descriptions

To determine which flavor of Stata you are running, type

**. about**

If you are using a different flavor of Stata, click on the appropriate
link:

-----------------------------------------------
**Stata/IC** Using Stata/IC
**Stata/SE** Using Stata/SE
-----------------------------------------------

For information on upgrading to Stata/MP, point your browser to
http://www.stata.com.

__Contents__

1. Starting Stata/MP

2. Setting Stata/MP's limits
2.1 Advice on setting processors
2.2 Advice on setting maxvar
2.3 Advice on setting matsize

3. Sharing .dta datasets with non-MP users

4. Querying memory usage

5. Advice to programmers
5.1 Determining flavor
5.2 Avoid macro shift in program loops

__1. Starting Stata/MP__

You start Stata/MP in much the same way as you start Stata/IC or
Stata/SE:

Windows:
Select **Start > All Programs > Stata 15.1 > StataMP 15.1**

Mac:
Double-click the file **Stata.do** from the **data** folder, or
double-click the **StataMP** icon from the **Stata** folder.

Unix:
At the Unix command prompt, type **xstata-mp** to invoke the GUI
version of Stata/MP, or type **stata-mp** to invoke the console
version.

__2. Setting Stata/MP's limits__

The three limits for Stata/MP are as follows:

1. **processors**
The maximum number of processors or cores to be used. This
limit is initially set to (1) the number of cores on your
computer or (2) the number of cores allowed by your
license, whichever is less. You reset the limit
if you want to use fewer processors than that, say because
you want to leave processors free for some other, non-Stata
task.

2. **maxvar**
The maximum number of variables allowed in a dataset. This
limit is initially set to 5,000; you can increase it up to
120,000.

3. **matsize**
The maximum size of matrices, or said differently, the
maximum number of independent variables allowed in the
models that you fit. This limit is initially set to 400,
and you can increase it up to 11,000.

You reset the limits by using the

**set processors** *#*
**set maxvar** *#* [**,** __perm__**anently**]
**set matsize** *#* [**,** __perm__**anently**]

commands. For instance, you might type

**. set processors 4**
**. set maxvar 6000**
**. set matsize 900**

The order in which you set the limits does not matter. If you specify
the **permanently** option for **maxvar** or **matsize**, in addition to making the
change for the present session, Stata/MP will remember the new limit and
use it in the future when you invoke Stata/MP:

**. set maxvar 6000, permanently**
**. set matsize 900, permanently**

You can reset the present or permanent limits whenever and as often as
you wish. Option **permanently** may not be specified with **set** **processors**.
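
To confirm which limits are in effect, you can display the corresponding c-class values. The following is a minimal sketch; the label text is illustrative only, and **c(processors_max)** is taken to be the most processors that your computer and license allow.

```stata
* Display the limits in effect for the current session
display "processors: " c(processors) " of " c(processors_max) " available"
display "maxvar:     " c(maxvar)
display "matsize:    " c(matsize)
```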

__2.1 Advice on setting processors__

**set processors** *#*
You may set the number of processors to be used to any number up to the
lesser of (1) the number of cores on your computer and (2) the number of
cores licensed. You may even set **processors** to 1, and then Stata/MP is
effectively identical to Stata/SE.

In general, you will get the best performance by using all processors
available, leaving **processors** set to the default. If you are running a
large Stata job in the background, however, you may want to reduce the
maximum number that Stata/MP will use to have better performance in your
foreground tasks. If you are running two large Stata jobs in the
background, you may get slightly better performance if you restrict each
to using half the number of processors.
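
As a sketch of the two-job case above, each job could restrict itself to half of the available processors. The local macro name **half** is hypothetical; the code assumes that **c(processors_max)** reports the maximum that **set processors** will accept.

```stata
* Use half of the available processors, but never fewer than 1
local half = max(1, floor(c(processors_max)/2))
set processors `half'
display "now using " c(processors) " of " c(processors_max) " processors"
```

Because **permanently** is not allowed with **set processors**, the setting lasts only for the current session.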

__2.2 Advice on setting maxvar__

**set maxvar** *#* [**,** __perm__**anently**] 2,048 <= *#* <= 120,000

Why is there a limit on **maxvar**? Why not just set **maxvar** to 120,000 and
be done with it? Because simply allowing room for variables, even if
they do not exist, consumes memory, and if you will be using only
datasets with a lot fewer variables, you will be wasting memory.

For instance, if you set **maxvar** to 20,000, you would consume
approximately 14 more megabytes than if you left **maxvar** at the default.
If you set **maxvar** to 120,000, you would consume a bit over 100 more
megabytes than if you left **maxvar** at the default.

**Recommendation**: Think about datasets with the most variables that
you typically use. Set **maxvar** to a few hundred or even 1,000 above
that. (The memory cost of an extra 1,000 variables is about 1 MB.)

**Remember**, you can always reset **maxvar** temporarily by typing **set**
**maxvar** *#*.
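
The recommendation can be sketched as a small calculation. The figure of 7,300 variables is a hypothetical example, as are the macro names; the bounds 2,048 and 120,000 are the limits given above.

```stata
* Hypothetical: the widest dataset you typically use has 7,300 variables
local widest   = 7300
local headroom = 1000

* Keep the suggestion within the allowed range for maxvar
local target = max(2048, min(120000, `widest' + `headroom'))
display "suggested maxvar: `target'"

* To apply the suggestion, run the next line. This sketch assumes that
* Stata may refuse to change maxvar while a dataset is in memory.
* set maxvar `target'
```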

__2.3 Advice on setting matsize__

**set matsize** *#* [**,** __perm__**anently**] 10 <= *#* <= 11,000

The name **matsize** is unfortunate because it suggests something that is
only partially true. It suggests that the maximum size of matrices is
**matsize** *x* **matsize**. **matsize**, however, is irrelevant for the size of
matrices in Mata, Stata's modern matrix-programming language. Regardless
of the value of **matsize**, Mata matrices may be larger or smaller than that.

**matsize** specifies the maximum size of matrices in Stata's old matrix
language -- and that is not of great importance -- and it specifies the
maximum number of variables that may appear in Stata's estimation
commands -- and that is important. A better name for **matsize** would be
**modelsize**.

With that introduction, let us begin.

Although **matsize** can theoretically be set up to 11,000, on all but the
largest 64-bit computers you will be unable to do that, and even if you
succeeded, Stata/MP would probably run out of memory. The value of
**matsize** has a dramatic effect on memory usage, the formula being

Number of megabytes = (8 * **matsize**^2 + 88 * **matsize**) / (1024^2)

For instance,

+-------------+------------+
| **matsize** | Memory use |
|-------------+------------|
|         400 |     1.254M |
|         800 |     4.950M |
|       1,600 |    19.666M |
|       3,200 |    78.394M |
|       6,400 |   313.037M |
|      11,000 |   924.080M |
+-------------+------------+
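
The figures in the table follow directly from the formula. A short sketch that evaluates it for the same values of **matsize** (the macro names **m** and **mb** are illustrative):

```stata
* Evaluate (8*matsize^2 + 88*matsize)/(1024^2) for each tabulated value
foreach m of numlist 400 800 1600 3200 6400 11000 {
    local mb = (8*`m'^2 + 88*`m')/(1024^2)
    display %6.0fc `m' "   " %8.3f `mb' "M"
}
```

The first term, 8 * **matsize**^2, is the storage for one matrix and grows with the square of **matsize**, which is why doubling **matsize** roughly quadruples memory use.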

The formula, in fact, understates the amount of memory certain Stata
commands use and understates what you will use yourself if you use
Stata's old matrix language matrices directly. The formula gives the
amount of memory required for one matrix and 11 vectors. If two matrices
are required, the numbers above are nearly doubled. When you **set**
**matsize**, Stata will refuse if you specify too large a value, but remember
that even if Stata does not complain, you still may run into problems
later. Stata might be running some statistical command and then
complain, "op. sys. refuses to provide memory; r(909)".

For **matsize**=11,000, nearly 1 GB of memory is required, and doubling that
would require nearly 2 GB of memory. On most 32-bit computers, 2 GB is
the most memory that the operating system will allocate to one task, so
nearly nothing would be left for the rest of Stata.

Why, then, is **matsize** allowed to be set so large? Because on 64-bit
computers, such large amounts cause no difficulty.

For reasonable values of **matsize** (say, up to 3,200), memory consumption
is not too great. Choose a reasonable value given the kinds of models
you fit, and remember that you can always reset the value.

__3. Sharing .dta datasets with non-MP users__

You may share datasets with Stata/SE and Stata/IC users as long as your
dataset does not have more variables than are allowed in those flavors of
Stata; see **help limits**.
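
Before sharing a dataset, you can compare its number of variables, **c(k)**, with the limits of the other flavors. The sketch below assumes limits of 2,048 variables for Stata/IC and 32,767 for Stata/SE; these figures are not stated in this file, so confirm them against **help limits** for your release.

```stata
* Assumed variable limits; confirm them with -help limits-
local ic_max = 2048
local se_max = 32767

* c(k) is the number of variables in the dataset in memory
if c(k) <= `ic_max' {
    display "dataset can be used in Stata/IC, Stata/SE, and Stata/MP"
}
else if c(k) <= `se_max' {
    display "dataset can be used in Stata/SE and Stata/MP, but not Stata/IC"
}
else {
    display "dataset can be used only in Stata/MP"
}
```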

__4. Querying memory usage__

The command

**. memory**

will display the current memory report and the command

**. query memory**

will display the current memory settings. See **help memory**.

__5. Advice to programmers__

__5.1 Determining flavor__

Programmers can determine which flavor of Stata is running by examining
the creturn values

creturn values
Flavor      | **c(flavor)**   **c(SE)**   **c(MP)**
------------+------------------------------
Stata/IC    |   "**IC**"         0         0
Stata/SE    |   "**IC**"         1         0
Stata/MP    |   "**IC**"         1         1
-------------------------------------------
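
Because **c(flavor)** is "IC" for all three flavors listed, a program must test **c(MP)** and **c(SE)** to tell them apart. A minimal sketch, with illustrative message text:

```stata
* Test c(MP) first: it is 1 only in Stata/MP, whereas c(SE) is 1 in both
* Stata/SE and Stata/MP
if c(MP) {
    display "running Stata/MP"
}
else if c(SE) {
    display "running Stata/SE"
}
else {
    display "running Stata/IC"
}
```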

__5.2 Avoid macro shift in program loops__

**macro shift** has negative performance implications when used with variable
lists containing 20,000 or more variables. We recommend avoiding the use
of **macro shift** in loops and instead using either **foreach** or "double
indirection". Double indirection means referring to **``i''** when **`i'**
contains a number 1, 2, ....
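
The two alternatives can be sketched as follows. The program names **list_foreach** and **list_indirect** are hypothetical and exist only for this illustration.

```stata
* Alternative 1: foreach walks the list without shifting it
capture program drop list_foreach
program define list_foreach
    syntax varlist
    foreach v of local varlist {
        display "`v'"
    }
end

* Alternative 2: double indirection
* `i' holds 1, 2, ...; ``i'' expands to the ith argument
capture program drop list_indirect
program define list_indirect
    local i = 1
    while "``i''" != "" {
        display "``i''"
        local i = `i' + 1
    }
end
```

With a dataset in memory, for example after **sysuse auto, clear**, typing **list_foreach price mpg weight** or **list_indirect price mpg weight** displays each name in turn. Neither loop modifies the argument list, which is what makes both cheaper than repeated use of **macro shift**.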