
Stream random-number generators

Highlights

• Use bootstrap in parallel on multiple computers
• Use simulate and run Monte Carlo simulations in parallel on multiple computers
• Simultaneously draw random numbers from up to 32,768 separate instances of Stata

Bootstraps and Monte Carlo simulations use random numbers to perform the same calculations over and over again. So do some other statistical procedures. With a little organization of your work, you can perform these kinds of calculations simultaneously on different computers. The problem is generating the random numbers. If you are running on 12 different computers, are you going to use 12 different seeds? Even that will not guarantee that the random-number sequences drawn on the different computers do not overlap.

Stream random-number generators solve this problem. You set one seed and specify stream 1 on the first computer, stream 2 on the second, and so on.

Stata provides a stream version of the 64-bit Mersenne Twister, Stata's default pseudorandom-number generator.

What's the problem?

Computer random numbers are elements in a sequence of deterministic numbers that only appear to be random. A seed specifies an entry point into this sequence. See figure 1. Each tick is an element in the sequence—a "random" number. Setting the seed to 12345 means that the tick identified by the arrow will be the next "random" number drawn.

Figure 1. Seed specifies first number in random sequence

When using ordinary (serial) random-number generators, there is no way to choose different seeds that guarantees the corresponding random samples drawn from the sequence do not overlap. You therefore cannot simply split bootstrap or Monte Carlo draws across different computers using serial random-number generators.

Stream random-number generators solve this problem by partitioning the sequence into nonoverlapping subsequences known as streams, as shown in figure 2.

Figure 2. A stream version of the generator in figure 1
This figure is simplified: it shows 4 streams, whereas Stata's implementation creates 32,768.

Setting the seed to 12345 for the stream random-number generator enters the sequence at the same place as before. The stream random-number generator, however, also partitions the sequence into 32,768 subsequences.

When you use Stata's stream random-number generator, you specify a seed and a stream number.

Let's see it work

To draw numbers from the stream Mersenne Twister random-number generator, set the stream and set the seed:

. set rngstream 10

. set seed 123456


After that, use Stata's runiform() function—or any of its other random-number functions—just as you ordinarily would:

. generate u = runiform()
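
As a minimal sketch of what the stream setting does, the do-file fragment below draws five uniforms from stream 1 and five from stream 2 using the same seed. The variable names u1 and u2 are illustrative. The explicit `set rng mt64s` line is an assumption added here to name the stream generator directly; the commands above do not include it, and it may be redundant. If streams behave as described, the two columns should differ even though the seed is identical.

```stata
* Sketch: same seed, two different streams (run as a do-file)
clear
set obs 5

set rng mt64s            // assumption: select the stream generator explicitly
set rngstream 1
set seed 12345
generate u1 = runiform() // draws from stream 1

set rngstream 2
set seed 12345
generate u2 = runiform() // draws from stream 2, same seed

list u1 u2
```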


Or use Stata's bootstrap or simulate commands, which automate obtaining bootstrap standard errors and running Monte Carlo simulations.

We created two do-files:


------------------------------------- file1.do ---
set rngstream 1
set seed 12345
sysuse auto
bootstrap, reps(100) saving(machine1, replace): ///
    regress mpg weight gear foreign
--------------------------------------------------



------------------------------------- file2.do ---
set rngstream 2
set seed 12345
sysuse auto
bootstrap, reps(100) saving(machine2, replace): ///
    regress mpg weight gear foreign
--------------------------------------------------



The do-files are nearly identical. One says stream 1 and machine1 and the other says stream 2 and machine2. Using Stata's programming features, we could have written just one do-file.
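
One way such a single do-file might look is sketched below. It takes the stream number as an argument and reuses it in the name of the saved dataset. The filename bsrun.do and the macro name stream are hypothetical, and the `set rng mt64s` line is an assumption added for explicitness rather than something taken from the two do-files above.

```stata
* bsrun.do (hypothetical name)
* Usage:  do bsrun 1   on computer 1
*         do bsrun 2   on computer 2
args stream                  // first do-file argument: the stream number

set rng mt64s                // assumption: select the stream generator explicitly
set rngstream `stream'
set seed 12345

sysuse auto, clear
bootstrap, reps(100) saving(machine`stream', replace): ///
    regress mpg weight gear foreign
```

Running `do bsrun 1` on the first computer and `do bsrun 2` on the second would produce machine1.dta and machine2.dta, as with the two separate do-files.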

We ran file1.do on computer 1 and file2.do on computer 2.

We copied the resulting dataset, machine2.dta, from computer 2 to computer 1, on which we already had machine1.dta.

And now, we obtain our combined results:

. clear all

. use machine1
(bootstrap: regress)

. append using machine2

. bstat

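Bootstrap results                               Number of obs     =         74
                                                Replications      =        200

      command:  regress mpg weight gear foreign

------------------------------------------------------------------------------
             |   Observed   Bootstrap                         Normal-based
             |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      weight |   -.006139   .0005678   -10.81   0.000    -.0072519   -.0050262
  gear_ratio |   1.457113   1.266586     1.15   0.250     -1.02535    3.939577
     foreign |  -2.221682   1.187847    -1.87   0.061    -4.549819    .1064562
       _cons |   36.10135   4.562644     7.91   0.000     27.15873    45.04397
------------------------------------------------------------------------------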


For computationally intensive problems, the two-machine time will be about one-half the one-machine time. Using distinct streams on different computers can dramatically reduce the time required for computationally intensive problems.
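
The same pattern should carry over to simulate. The sketch below is an illustration under stated assumptions, not a recommended design: the program name onesim, the filename simrun.do, the sample size of 50, and the saved-file prefix sim are all hypothetical, and `set rng mt64s` is again added only to select the stream generator explicitly.

```stata
* simrun.do (hypothetical name); run as:  do simrun 1,  do simrun 2,  ...
args stream                       // stream number for this computer

capture program drop onesim
program define onesim, rclass
    // one Monte Carlo replication: mean of 50 standard normal draws
    drop _all
    set obs 50
    generate x = rnormal()
    summarize x
    return scalar mean = r(mean)
end

set rng mt64s                     // assumption: select the stream generator explicitly
set rngstream `stream'
set seed 12345

simulate mean = r(mean), reps(500) saving(sim`stream', replace): onesim
```

The per-computer results (sim1.dta, sim2.dta, and so on) could then be combined with use and append, just as machine1.dta and machine2.dta were combined above.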