
From: wgould@stata.com (William Gould, StataCorp LP)
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Random seed revisited - how to get random seeds more efficiently?
Date: Wed, 01 Oct 2008 10:28:58 -0500

Tiago V. Pereira <tiago.pereira@incor.usp.br> wrote,

> I have used the following random seed generator:
>
> */---------------START--------------------
> tokenize "`c(current_date)'" ,parse(" ")
> local seed_1 "`1'"
> tokenize "`c(current_time)'" ,parse(":")
> local seed_2 "`1'`3'`5'"
> local seed_final "`seed_1'`seed_2'"
> set seed `seed_final'
> */--------------END-----------------------
>
> I suspect that for loops that take less than 1 second, I am using actually
> the same seed for 2-5 loops (it is just a guess). Do you know more
> effective (pseudo)random seed generators?

I'm confused by Tiago's question and I'm worried that it's not me that's confused, but Tiago. It sounds almost as if Tiago is resetting the seed before generating each random number. If so, Tiago does not want to do that. Stata's random-number generator is very good, but it is only very good if you set the seed once and then draw many random numbers.

Let me explain. Let's consider a simulation that we are about to perform 1,000 times, and let's consider two ways to do that: (1) set the seed once, and then just use uniform() 1,000 times, and (2) set the seed 1,000 times, and after each setting, get one random number using uniform().

Both methods produce 1,000 random numbers; they just go about it differently. Method (1) will produce random numbers with lots of good properties. Method (2) will produce lousy random numbers.

I exaggerate the problems with (2) because Stata goes to extra work after you set the seed to make method (2) work better, but there are no guarantees and, in 1000s of resettings, all bets are off. Anyway, just to simplify the conversation, let's ignore the extra work Stata goes to after you set the seed. I'll come back to that later.

Tiago worried that he is actually using the same seed for 2-5 loops. Right. That is a real, and obvious, problem. But even if that were not the case, there are problems awaiting us.
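
To make the contrast between methods (1) and (2) concrete, here is a minimal sketch, not taken from the original message. The seed values and variable names (u1, u2) are hypothetical, consecutive integers stand in for the nearly equal seeds a clock-based generator would produce, and runiform() is used as the current name of the function the message calls uniform().

```stata
* Method (1): set the seed once, then draw 1,000 numbers.
clear
set obs 1000
set seed 12345                  // arbitrary illustrative seed, set ONCE
generate double u1 = runiform()

* Method (2): reset the seed before every single draw (not recommended).
generate double u2 = .
forvalues i = 1/1000 {
    set seed `i'                // nearly equal seeds, one per draw
    quietly replace u2 = runiform() in `i'
}

* Inspect both sets of draws side by side.
summarize u1 u2
```

Only u1 comes from one long stream of the generator; each value of u2 is the first draw after a fresh seed, which is the situation the message warns about.
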
We have to worry that, in two sequential resettings, given seeds that are nearly equal, the first random number drawn will be correlated with the previous iteration's first random number.

I am not criticising Tiago's method for setting the seed. That there is a pattern in the seeds produced by Tiago's method is a property of all computer methods of seed generation. That is, it is a property unless the computer has a source of true random numbers, such as a circuit that produces noise and is connected to an analog-to-digital converter.

All computer-based methods for generating seeds are lousy, and Tiago's is no lousier than most. If one could easily find true random numbers on the computer, no one would bother developing mathematical pseudo-random number generators.

It is fine to use any of these methods, including Tiago's, AS LONG AS ONE DOES NOT USE IT TOO FREQUENTLY.

We might use Tiago's seed generator before running our 1,000 simulations. Now, however, let's pretend that we want to evaluate the 1,000-simulation method, so we decide to run 10,000 simulations of the 1,000-simulations. Using Tiago's seed generator in the 1,000-simulation would now be inappropriate. We would be worried that the sequences generated in each of the 1,000-simulations would be correlated.

We can use Tiago's seed generator once, however. We could use it at the top of the 10,000-simulations.

The right way to use any automated seed generator is to set the seed once and then just use the software's random-number generator. If the authors of the software knew what they were doing, they made sure that the random-number generator has a long period.

Never use automated seed generators inside loops.

But what about reproducibility? What if I am doing the 10,000-simulation of the 1,000-simulations, and I need not only to be able to repeat the 10,000-simulation, but might later want to repeat any of the 1,000-simulations in isolation?

Answer:

        . display "`c(seed)'"

The command displays something like "X075bcd151f123bb5159a55e50022865746ad", which doesn't look like a number, but the string contains all the information necessary to reset the random-number generator to its current state after burn-in.

When you set a NUMERIC seed, the value you provide is used to set various constants inside the random-number generator, and then the random-number generator is run a large number of times, say M. What you consider the first random number after setting a seed is actually the (M+1)st random number. One of the reasons for burn-in is to mitigate the effect of serial correlation and other patterns in the seeds you specify.

Anyway, "X075bcd151f123bb5159a55e50022865746ad" contains all the information and, should you ever need to repeat one of the simulations, you can type

        . set seed X075bcd151f123bb5159a55e50022865746ad

and Stata's random-number generator will return to the exact state it was in previously.

-- Bill
wgould@stata.com
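
As an illustration of the reproducibility scheme described in the message, here is a minimal sketch that is not part of the original post. It sets the seed once, records `c(seed)` before each inner simulation, and later replays one of them. The macro names (state7, mean7), the seed value, the loop count of 10 (standing in for 10,000), and the toy inner simulation are all hypothetical. It follows the syntax shown in the message; newer Stata releases also document `c(rngstate)` and `set rngstate` for saving and restoring the generator's state, so treat the exact commands as an assumption about your Stata version.

```stata
set seed 20081001                  // set ONCE, at the top of the outer loop

forvalues rep = 1/10 {             // 10 here instead of 10,000, for brevity
    local state`rep' "`c(seed)'"   // generator state before this replication

    * Toy stand-in for the inner 1,000-simulation:
    quietly drop _all
    quietly set obs 1000
    generate double u = runiform()
    quietly summarize u
    local mean`rep' = r(mean)      // keep a result to compare against later
}

* Later: repeat replication 7 in isolation from its recorded state.
set seed `state7'
quietly drop _all
quietly set obs 1000
generate double u = runiform()
quietly summarize u
display "original mean: `mean7'   replayed mean: " r(mean)
```

If the state is restored as the message describes, the replayed result should match the one recorded during the original run, without rerunning replications 1 through 6.
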

