Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Sergiy Radyakin <serjradyakin@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Default Seed of Stata 12 |
Date | Fri, 4 Oct 2013 14:42:56 -0400 |
Dear Statalist and StataCorp,

In this question I would like to resurrect the discussion of the inner workings of Stata's random number generator and the correspondence between the 'seed' and the 'state'. The previous discussion ended with a comprehensive description from Bill Gould, in which he explained that while the RND generator can be initialized with 32-bit values, doing so reaches only a small number of states (compared to all the states the generator might have). Setting the state directly is possible, and the values used to set the state are the same as those reported by c(seed). Since the message was posted a while ago, here is a link to the archived version:

http://www.stata.com/statalist/archive/2012-10/msg01129.html

The original poster who asked the question was curious about the workings of Stata v12.1 specifically. However, I am more curious about all Stata versions. My findings are that while the reported values of 'state' are correct for Stata 12.1, they are different in e.g. 12.0 and other versions. Moreover, they seem to vary both between different actual versions of Stata (installation of 12.0 vs. installation of 12.1) and between virtual versions of Stata (installation of 12.1 with version set to 12.1 vs. installation of 12.1 with version set to 12.0), and across the two (installation of 12.0 vs. installation of 12.1 with version set to 12.0).

Here is the output from actual Stata 12.1 with different version settings after setting the RND to the default number 123456789:

http://radyakin.org/statalist/2013100401/states.txt
http://radyakin.org/statalist/2013100401/statesfull.png (with colors for differing values)

do http://radyakin.org/statalist/2013100401/seedstates.do

Interestingly, despite the different 'state', the actual random number returned by uniform() after initialization is identical to the value provided by Bill Gould: 0.1369.. (see the do file). The subsequent states also differ, and yet the next random numbers also appear to be the same.
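To make the seed/state bookkeeping concrete, here is a minimal sketch in Python of the scheme described in the quoted message further down (a state update f, an output function g, and a seeding function h with a burn-in). It is a toy model only: the class name ToyRNG, the multiplier, and the particular f, g and h are assumptions made up for illustration and are not Stata's KISS generator. It shows only that saving the full state and restoring it later replays the same stream within one implementation.

```python
# Toy model only: f, g and h below are invented for illustration and are
# NOT Stata's KISS generator. Only the seed/state bookkeeping mirrors the thread.

MASK128 = (1 << 128) - 1
MULT = 0x2360ED051FC65DA44385DF649FCCF645  # arbitrary odd 128-bit multiplier (assumption)


class ToyRNG:
    def __init__(self, seed32):
        self.set_seed(seed32)

    def set_seed(self, seed32):
        """h(): expand a 32-bit seed to a 128-bit state, then burn in 100 draws."""
        s = seed32 & 0xFFFFFFFF
        self.state = s | (s << 32) | (s << 64) | (s << 96)
        for _ in range(100):
            self.draw()

    def set_state(self, state128):
        """Jump straight to a full 128-bit state (the 'set seed -state-' analogue)."""
        self.state = state128 & MASK128

    def draw(self):
        """new_state = f(current_state); return g(new_state) as a float in [0, 1)."""
        self.state = (self.state * MULT + 1) & MASK128
        return (self.state >> 96) / 2**32


rng = ToyRNG(123456789)
saved = rng.state                      # analogue of storing c(seed)
first = [rng.draw() for _ in range(5)]

rng.set_state(saved)                   # analogue of 'set seed -state-'
replay = [rng.draw() for _ in range(5)]

print(first == replay)                 # True: same state, same stream
```

The sketch says nothing about whether two Stata versions encode or interpret a state string in the same way, which is exactly the open question here.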
Suppose now that my program, rather than using 'set seed -number-', uses the alternative syntax 'set seed -state-'. Moreover, for various reasons it is restricted to this syntax with the state, rather than the number. I am concerned about the reproducibility of the results obtained with such a program executed in different versions of Stata. Specifically, the program does a bootstrap, and while the random numbers returned by -uniform()- seem to be identical, I can't be totally sure. In the case of the bootstrap, if it randomizes the samples differently, the differences could escalate to much more notable figures. My first attempts to diagnose this indicate that the results of the bootstrap appear to be stable, but I haven't run it across the real installations yet, only across virtual versions.

The specific questions that I need to resolve are the following:

1) Suppose I store the RND state with every iteration (whether it is an iteration of the bootstrap or just a call to uniform()), and reproduce it later in a different version of Stata. Would that guarantee that I would get exactly the same results?

2) Are all the states obtained in an arbitrary version X of Stata always acceptable in version Y of Stata (X!=Y)?

Thank you,
Sergiy Radyakin

On Wed, Oct 24, 2012 at 12:13 PM, William Gould, StataCorp LP <wgould@stata.com> wrote:
> Rasool Bux <rasool.bux@aku.edu> asked,
>
>> Can anybody tell me the default system values i.e. seed etc.
>> of Stata 12.1
>
> The random-number seed is set to 123456789 each time Stata is launched.
> As Maarten Buis <maartenlbuis@gmail.com> noted, the value changes during
> the Stata session as you use the random-number generators.
>
>
> More information
> ----------------
>
> I wrote this response mainly so I could say, "123456789", but
> Maarten also wrote,
>
>> The default can change during a Stata session.
>>
>> You can see the current value of the seed by typing di c(seed).
>> See -help creturn- for this and other system values.
>> Also see -help set seed- for an explanation what that weird string
>> returned by -c(seed)- actually is.
>
> and now I feel obligated to provide more details than you will find in
> the manuals. So for those who are curious:
>
> The random-number generator has something called a state. When you
> -set seed-, you are specifying the state. Each time you ask for
> a random number, say by using the -runiform()- function, the
> state is recursively updated -- new_state = f(current_state) -- and
> then a random number is produced based on the value of new_state.
> The code works like this:
>
>     random_number:
>         new_state = f(current_state)
>         random_number = g(new_state)
>         current_state = new_state
>         return(random_number)
>
> Now here's what's interesting: The state has more bits than the
> random number. In the case of the KISS random number generator, the
> random numbers produced are 32 bit values, and the state is a 128 bit
> value! Having more bits for the state than the random number is a
> general property of random-number generators and not just a property
> of KISS.
>
> When you set the seed, say by typing
>
>     . set seed 123456789
>
> you are setting the value of current_state. A number like 123456789
> is a 32-bit value. Somehow, that 32-bit value is converted to
> a 128-bit value and, no matter how we do it, obviously state can
> take on only one of 2^32 values.
>
> The setting of the seed works like this:
>
>     set_seed_32_bit_value:
>         current_state = h(32_bit_value)
>         burn in current_state by repeating 100 times {
>             produce random number (and throw it away)
>         }
>
> Maarten mentioned -c(seed)- and a second syntax of seed which allows
> you to specify the full state. Let me explain.
>
> First off, -c(seed)- is a misleading name because it is not the seed,
> it is the state, which is related to the seed. -c(seed)- after setting
> the seed to the 32-bit value 123456789 looks like this,
>
>     . set seed 123456789
>
>     . display c(seed)
>     X075bcd151f123bb5159a55e50022865700043e55
>
> The strange looking X075bcd151f123bb5159a55e50022865700043e55 is one
> way of writing the full 128-bit value. X075b...55 is the result
> of running set_seed_32_bit_value on the 32-bit number 123456789.
>
> Remember that the state is updated each time a random number is
> generated. Let's look at the state value after generating a random
> number:
>
>     . * we have already set seed 123456789
>
>     . display runiform()
>     .13698408
>
>     . display c(seed)
>     X5b15215854f24767556efaba82801d9b0004330a
>
> Think of the random-number generator as producing an infinitely long
> sequence of states:
>
>
> -------------------------------------------------------------------------
>   state0 -> state1 -> state2 -> ... -> state{2^124} -> state0 -> state1 ...
>
>   where,
>
>       state0 = X075bcd151f123bb5159a55e50022865700043e55,
>
>       state1 = X5b15215854f24767556efaba82801d9b0004330a,
>
>       and so on,
>
>   and where the i-th pseudo random number is given by g(state{i}).
> -------------------------------------------------------------------------
>
> The sequence may be infinitely long, but it repeats. The period is
> approximately 2^124 in the case of KISS.
>
>
> The easy-to-type 32-bit seed provides 2^32 entry points into this sequence
>
> ---------------------------------------------------------------------
>   state0 -> state1 -> ... -> state{2^96} -> ... -> state{2^124} -> ...
>     |                             |                     |
> 123456789                     ????????               ??????
> ---------------------------------------------------------------------
>
> I put ?????? in the above because I didn't bother to work out
> the 32-bit numeric values corresponding to the particular states.
> What's important is the function state = h(32_bit_seed) is
> designed to space the entry points approximately equally.
> Also important to understand is that, because the sequence is
> infinitely long, my numbering of the states is arbitrary.
> I could have picked any one of the 2^124+1 states and labeled it 0.
>
> What's important is that the 32-bit seed provides an entry point
> into this sequence. In the last experiment we tried,
>
>     . set seed 123456789
>
>     . display runiform()
>     .13698408
>
>     . display c(seed)
>     X5b15215854f24767556efaba82801d9b0004330a
>
> There is no 32-bit seed that you could set that corresponds to that
> state.
>
> And that is why the value of -c(seed)- looks so strange: It provides
> every possible entry point into the sequence, whereas -set seed #-
> provides merely a subset.
>
> Do I have to say it? If this kind of thing interests you, consider a
> career at StataCorp.
>
> -- Bill
> wgould@stata.com
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
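
The point in the quoted message that a short seed reaches only a subset of the states, and that a state reached after a draw need not correspond to any seed, can be checked exhaustively in a scaled-down toy. The sketch below uses 8-bit seeds and 16-bit states so that every seed can be enumerated. The functions f and h are assumptions invented for illustration and are not Stata's KISS generator. In this toy the two sets are disjoint by construction of the update step; it illustrates the idea and proves nothing about KISS itself.

```python
# Toy model only (8-bit seeds, 16-bit states); the functions are invented
# for illustration and are not Stata's KISS generator.

STATE_BITS = 16
MASK = (1 << STATE_BITS) - 1


def f(state):
    """State update: a small linear congruential step (assumption)."""
    return (5 * state + 1) & MASK


def h(seed8):
    """Map an 8-bit seed to a 16-bit state, then burn in 100 updates."""
    state = (seed8 & 0xFF) << 8
    for _ in range(100):
        state = f(state)
    return state


entry_points = {h(seed) for seed in range(256)}        # states reachable via a seed
after_one_draw = {f(state) for state in entry_points}  # the same states one draw later

print(len(entry_points), "entry points out of", MASK + 1, "states")
print(entry_points.isdisjoint(after_one_draw))         # True in this toy
```

Here 256 seeds give 256 entry points among 65536 states, and none of the states reached after one draw is itself an entry point, which is why the full state string, rather than a short seed, is needed to resume mid-sequence.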