Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Henrik Støvring <STOVRING@biostat.au.dk> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Default Seed of Stata 12 |
Date | Thu, 25 Oct 2012 09:20:08 +0000 |
Thanks for this clear, fascinating and very detailed presentation of the inner workings of Stata! I am not about to apply for a position at Stata, as you suggest interested readers should consider doing, but I must admit that every time you do one of these valuable pieces on the inner mechanics of Stata I feel a little more in doubt about whether I should reconsider. :-)

Best,

Henrik

On 10/24/2012 06:13 PM, William Gould, StataCorp LP wrote:
> Rasool Bux <rasool.bux@aku.edu> asked,
>
>> Can anybody tell me the default system values i.e. seed etc.
>> of Stata 12.1
> The random-number seed is set to 123456789 each time Stata is launched.
> As Maarten Buis <maartenlbuis@gmail.com> noted, the value changes during
> the Stata session as you use the random-number generators.
>
>
> More information
> ----------------
>
> I wrote this response mainly so I could say, "123456789", but
> Maarten also wrote,
>
>> The default can change during a Stata session.
>>
>> You can see the current value of the seed by typing di c(seed).
>> See -help creturn- for this and other system values.
>> Also see -help set seed- for an explanation of what that weird string
>> returned by -c(seed)- actually is.
> and now I feel obligated to provide more details than you will find in
> the manuals.  So for those who are curious:
>
> The random-number generator has something called a state.  When you
> -set seed-, you are specifying the state.  Each time you ask for
> a random number, say by using the -runiform()- function, the
> state is recursively updated -- new_state = f(current_state) -- and
> then a random number is produced based on the value of new_state.
> The code works like this:
>
>     random_number:
>         new_state     = f(current_state)
>         random_number = g(new_state)
>         current_state = new_state
>         return(random_number)
>
> Now here's what's interesting:  The state has more bits than the
> random number.
> In the case of the KISS random-number generator, the
> random numbers produced are 32-bit values, and the state is a 128-bit
> value!  Having more bits for the state than the random number is a
> general property of random-number generators and not just a property
> of KISS.
>
> When you set the seed, say by typing
>
>     . set seed 123456789
>
> you are setting the value of current_state.  A number like 123456789
> is a 32-bit value.  Somehow, that 32-bit value is converted to
> a 128-bit value and, no matter how we do it, obviously state can
> take on only one of 2^32 values.
>
> The setting of the seed works like this:
>
>     set_seed_32_bit_value:
>         current_state = h(32_bit_value)
>         burn in current_state by repeating 100 times {
>             produce random number (and throw it away)
>         }
>
> Maarten mentioned -c(seed)- and a second syntax of seed which allows
> you to specify the full state.  Let me explain.
>
> First off, -c(seed)- is a misleading name because it is not the seed,
> it is the state, which is related to the seed.  -c(seed)- after setting
> the seed to the 32-bit value 123456789 looks like this,
>
>     . set seed 123456789
>
>     . display c(seed)
>     X075bcd151f123bb5159a55e50022865700043e55
>
> The strange looking X075bcd151f123bb5159a55e50022865700043e55 is one
> way of writing the full 128-bit value.  X075b...55 is the result
> of running set_seed_32_bit_value on the 32-bit number 123456789.
>
> Remember that the state is updated each time a random number is
> generated.  Let's look at the state value after generating a random
> number:
>
>     . * we have already set seed 123456789
>
>     . display runiform()
>     .13698408
>
>     . display c(seed)
>     X5b15215854f24767556efaba82801d9b0004330a
>
> Think of the random-number generator as producing an infinitely long
> sequence of states:
>
>
> -------------------------------------------------------------------------
> state0 -> state1 -> state2 -> ... -> state{2^124} -> state0 -> state1 ...
>
> where,
>
>     state0 = X075bcd151f123bb5159a55e50022865700043e55,
>
>     state1 = X5b15215854f24767556efaba82801d9b0004330a,
>
>     and so on,
>
> and where the i-th pseudo random number is given by g(state{i}).
> -------------------------------------------------------------------------
>
> The sequence may be infinitely long, but it repeats.  The period is
> approximately 2^124 in the case of KISS.
>
>
> The easy-to-type 32-bit seed provides 2^32 entry points into this sequence
>
> ---------------------------------------------------------------------
> state0 -> state1 -> ... -> state{2^96} -> ... -> state{2^124} -> ...
>   |                             |                     |
> 123456789                   ????????               ??????
> ---------------------------------------------------------------------
>
> I put ?????? in the above because I didn't bother to work out
> the 32-bit numeric values corresponding to the particular states.
> What's important is the function state = h(32_bit_seed) is
> designed to space the entry points approximately equally.
> Also important to understand is that, because the sequence is
> infinitely long, my numbering of the states is arbitrary.
> I could have picked any one of the 2^124+1 states and labeled it 0.
>
> What's important is that the 32-bit seed provides an entry point
> into this sequence.  In the last experiment we tried,
>
>     . set seed 123456789
>
>     . display runiform()
>     .13698408
>
>     . display c(seed)
>     X5b15215854f24767556efaba82801d9b0004330a
>
> There is no 32-bit seed that you could set that corresponds to that
> state.
>
> And that is why the value of -c(seed)- looks so strange:  It provides
> every possible entry point into the sequence, whereas -set seed #-
> provides merely a subset.
>
> Do I have to say it?  If this kind of thing interests you, consider a
> career at StataCorp.
>
> -- Bill
> wgould@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>

--
*Henrik Støvring, PhD*
Associate professor
stovring@biostat.au.dk
Phone +45 8716 7991
Fax +45 8716 7305
Web: au.dk/en/stovring@biostat <http://au.dk/en/stovring@biostat>

Department of Public Health
Biostatistics
University of Aarhus
Bartholins Allé 2, Bldg 1261, 217
DK-8000 Aarhus C
Denmark

Department of Public Health, Aarhus University
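
For readers who want to experiment with the state-versus-seed distinction described in the quoted message, here is a minimal sketch in Python. It is a toy model and not Stata's KISS generator: the class name ToyRNG, the 64-bit state size, and the particular choices of f (a linear congruential step), g (keep the top 32 bits) and h (multiply the seed by an odd constant) are all assumptions made for illustration. Only the overall shape, a state wider than the output, a seed expanded by h and then burned in 100 times, and a state that can be saved and restored, follows the pseudocode in the message.

```python
MASK64 = (1 << 64) - 1  # toy state is 64 bits wide (Stata's KISS state is larger)


class ToyRNG:
    """Toy generator with a 64-bit state and 32-bit output. NOT Stata's KISS."""

    def __init__(self, seed32=123456789):
        self.set_seed(seed32)

    @staticmethod
    def _f(state):
        # Hypothetical state update: one 64-bit linear congruential step.
        return (state * 6364136223846793005 + 1442695040888963407) & MASK64

    @staticmethod
    def _g(state):
        # Hypothetical output function: keep only the top 32 bits of the state.
        return state >> 32

    @staticmethod
    def _h(seed32):
        # Hypothetical seed expansion: spread a 32-bit seed over 64 bits.
        return (seed32 * 0x9E3779B97F4A7C15) & MASK64

    def next_u32(self):
        # Mirrors the "random_number" pseudocode in the quoted message.
        new_state = self._f(self.state)
        number = self._g(new_state)
        self.state = new_state
        return number

    def runiform(self):
        # Uniform value on [0, 1) built from one 32-bit draw.
        return self.next_u32() / 2**32

    def set_seed(self, seed32):
        # Mirrors "set_seed_32_bit_value": expand the seed, then burn in.
        if not 0 <= seed32 < 2**32:
            raise ValueError("seed must fit in 32 bits")
        self.state = self._h(seed32)
        for _ in range(100):
            self.next_u32()  # produce a number and throw it away

    def get_state(self):
        # Full state as a hex string, loosely analogous to c(seed).
        return format(self.state, "016x")

    def set_state(self, hex_state):
        # Restore a state previously returned by get_state().
        self.state = int(hex_state, 16) & MASK64


if __name__ == "__main__":
    rng = ToyRNG(123456789)
    saved = rng.get_state()      # state right after seeding
    first = rng.runiform()
    print(saved, first, rng.get_state())
    rng.set_state(saved)         # re-enter the sequence at the saved point
    print(rng.runiform() == first)
```

In this toy there are 2^64 possible states but only 2^32 of them can be reached directly through set_seed(), which is the same counting argument the message makes for the 32-bit seed and the full state. Saving get_state() and later passing it to set_state() resumes the sequence at an arbitrary point, which a 32-bit seed generally cannot do.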