Statalist



Re: st: Random seed revisited - how to get random seeds more efficiently?


From   wgould@stata.com (William Gould, StataCorp LP)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Random seed revisited - how to get random seeds more efficiently?
Date   Wed, 01 Oct 2008 10:28:58 -0500

Tiago V. Pereira <tiago.pereira@incor.usp.br> wrote, 

> I have used the following random seed generator:
> 
>         */---------------START--------------------
>         tokenize "`c(current_date)'" ,parse(" ")
>         local seed_1 "`1'"
>         tokenize "`c(current_time)'" ,parse(":")
>         local seed_2 "`1'`3'`5'"
>         local seed_final "`seed_1'`seed_2'"
>         set seed `seed_final'
>         */--------------END-----------------------
> 
> I suspect that for loops that take less than 1 second, I am using actually
> the same seed for 2-5 loops (it is just a guess). Do you know more
> effective (pseudo)random seed generators?

I'm confused by Tiago's question and I'm worried that it's not me that's
confused, but Tiago.  It sounds almost as if Tiago is resetting the seed
before generating each random number.  If so, Tiago does not want to do that.
Stata's random number generator is very good, but it is only very good if you
set the seed once and then draw many random numbers.  Let me explain.

Let's consider a simulation that we are about to perform 1,000 times, 
and let's consider two ways to do that:  (1) set the seed once, and 
then just use uniform() 1,000 times, and (2) set the seed 1,000 times, 
and after each setting, get one random number using uniform(). 
Both methods produce 1,000 random numbers; they just go about it 
differently.

Method (1) will produce random numbers with lots of good properties. 

Method (2) will produce lousy random numbers.  

I exaggerate the problems with (2) because Stata goes to extra work after you
set the seed to make method (2) work better, but there are no guarantees and,
in 1,000s of resettings, all bets are off.  Anyway, just to simplify the
conversation, let's ignore the extra work Stata does after you set the seed.
I'll come back to that later.
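
As a rough sketch of the two methods described above, the following do-file
fragment draws 1,000 numbers each way.  The variable names m1 and m2, the
scalar name draw, and the seed values are arbitrary choices made for
illustration; they are not from Tiago's code.  uniform() is used to match the
text; newer Stata releases call the same function runiform().

```stata
* Method (1): set the seed once, then draw 1,000 numbers.
clear
set obs 1000
set seed 12345                      // arbitrary value, for illustration only
generate double m1 = uniform()

* Method (2): reset the seed before every draw (the approach to avoid).
generate double m2 = .
forvalues i = 1/1000 {
    set seed `i'                    // 1,000 distinct but nearly equal seeds
    scalar draw = uniform()         // first number after each resetting
    quietly replace m2 = scalar(draw) in `i'
}
```

Both variables end up holding 1,000 values between 0 and 1; the difference
lies in how much one can trust their statistical properties.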

Tiago worried that he is actually using the same seed for 2-5 loops.  Right.
That is a real, and obvious, problem.  But even if that were not the case,
there are problems awaiting us.  We have to worry that, in two sequential
resettings, given seeds that are nearly equal, the first random number
drawn will be correlated with the previous iteration's first random number.

I am not criticising Tiago's method for setting the seed. That there is a
pattern in the seeds produced by Tiago's method is a property of all
computer methods of seed generation.  That is, it is a property unless the
computer has a source of true random numbers, such as a circuit that produces
noise that is connected to an analog-to-digital converter.  

All computer-based methods for generating seeds are lousy, and Tiago's is no
lousier than most.  If one could easily find true random numbers on the
computer, no one would bother developing mathematical pseudo-random number
generators.  It is fine to use any of these methods, including Tiago's, AS
LONG AS ONE DOES NOT USE IT TOO FREQUENTLY.

We might use Tiago's seed generator before running our 1,000 simulations. 
Now, however, let's pretend that we want to evaluate the 1,000-simulation
method, so we decide to run 10,000 simulations of the 1,000-simulations.  
Using Tiago's seed generator inside the 1,000-simulation would now be
inappropriate.  We would be worried that the sequences generated in each 
run of the 1,000-simulation would be correlated.

We can use Tiago's seed generator once, however.  We could use it at the top
of the 10,000-simulations.  The right way to use any automated seed generator
is to set the seed once and then just use the software's random-number
generator.  If the authors of the software knew what they were doing, they
made sure that the random-number generator has a long period.  

Never use automated seed generators inside loops.
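
A minimal sketch of that layout, assuming a do-file and reusing Tiago's
clock-based seed from the quoted message: the seed is set once at the top,
and no -set seed- appears inside either loop.  The 10 outer replications
stand in for the 10,000 in the text, and the scalar name draw is a
placeholder for whatever the simulation actually computes.

```stata
* Set the seed ONCE, before any looping (clock-based, as in Tiago's code).
tokenize "`c(current_date)'", parse(" ")
local seed_1 "`1'"
tokenize "`c(current_time)'", parse(":")
local seed_2 "`1'`3'`5'"
set seed `seed_1'`seed_2'

* Outer loop: 10 replications stand in for the 10,000 in the text.
forvalues rep = 1/10 {
    * Inner loop: 1,000 draws, with no resetting of the seed.
    forvalues i = 1/1000 {
        scalar draw = uniform()
        * ... use scalar(draw) in the simulation ...
    }
}
```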

But what about reproducibility?  What if I am doing the 10,000-simulation of
the 1,000-simulations, and I need not only to be able to repeat the
10,000-simulation, but might later want to repeat any of the 1,000-simulations
in isolation?  Answer:

        . display "`c(seed)'"

The command displays something like "X075bcd151f123bb5159a55e50022865746ad",
which doesn't look like a number, but the string contains all the information
necessary to reset the random-number generator to its current state after
burn-in.  

When you set a NUMERIC seed, the value you provide is used to set various
constants inside the random-number generator and then the random-number
generator is run a large number of times, say M.  What you consider the first
random number after setting a seed is actually the (M+1)st random number.  One
of the reasons for burn-in is to mitigate the effect of serial correlation and 
other patterns in the seeds you specify. 

Anyway, "X075bcd151f123bb5159a55e50022865746ad" contains all the information
and, should you ever need to repeat one of the simulations, you can type

       . set seed X075bcd151f123bb5159a55e50022865746ad

and Stata's random-number generator will return to the exact state it 
was in previously.
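
One way to put this to work, sketched under the assumption that you are
running a Stata release in which c(seed) holds the state string as described
above: record c(seed) in a local macro just before each replication, and
later pass the recorded string back to -set seed-.  The macro names state1,
state2, ..., the scalar names, the numeric seed, and the choice of
replication 7 are all hypothetical, and 10 replications again stand in for
10,000.

```stata
set seed 12345                          // arbitrary; set once at the top
forvalues rep = 1/10 {
    local state`rep' "`c(seed)'"        // state just before this replication
    forvalues i = 1/1000 {
        scalar draw = uniform()
    }
    if `rep' == 7 {
        scalar last7 = scalar(draw)     // keep one result for comparison
    }
}

* Later: repeat replication 7 in isolation from its recorded state.
set seed `state7'
forvalues i = 1/1000 {
    scalar draw = uniform()
}
assert scalar(draw) == scalar(last7)    // same state, same sequence
```

In a real run you would probably write the recorded strings to a log or a
dataset rather than keep them in local macros, which vanish when the do-file
ends.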

-- Bill
wgould@stata.com