Re: st: Default Seed of Stata 12


From   "William Gould, StataCorp LP" <[email protected]>
To   [email protected]
Subject   Re: st: Default Seed of Stata 12
Date   Wed, 24 Oct 2012 11:13:38 -0500

Rasool Bux <[email protected]> asked, 

> Can anybody tell me the default system values i.e. seed etc. 
> of Stata 12.1

The random-number seed is set to 123456789 each time Stata is launched. 
As Maarten Buis <[email protected]> noted, the value changes during
the Stata session as you use the random-number generators. 


More information
----------------

I wrote this response mainly so I could say, "123456789", but 
Maarten also wrote, 

> The default can change during a Stata session. 
>
> You can see the current value of the seed by typing di c(seed). 
> See -help creturn- for this and other system values. 
> Also see -help set seed- for an explanation what that weird string
> returned by -c(seed)- actually is.

and now I feel obligated to provide more details than you will find in
the manuals.  So for those who are curious:

The random-number generator has something called a state.  When you 
-set seed-, you are specifying the state.  Each time you ask for 
a random number, say by using the -runiform()- function, the 
state is recursively updated -- new_state = f(current_state) -- and 
then a random number is produced based on the value of new_state. 
The code works like this:

       random_number: 
              new_state     = f(current_state)
              random_number = g(new_state)
              current_state = new_state
              return(random_number)
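
For readers who prefer something runnable, here is a minimal Python sketch of
the pseudocode above.  The update rules and constants inside f() and g() are
illustrative assumptions in the style of the KISS family; the post does not
say what Stata's f and g actually are, so do not expect Stata's numbers from
this.  The starting state is likewise made up.

```python
MASK32 = 0xFFFFFFFF

def f(state):
    """Advance a 128-bit state held as four 32-bit words (toy rules)."""
    x, y, z, c = state
    x = (69069 * x + 12345) & MASK32      # linear congruential word
    y ^= (y << 13) & MASK32               # xorshift word
    y ^= y >> 17
    y ^= (y << 5) & MASK32
    t = 698769069 * z + c                 # multiply-with-carry words
    z, c = t & MASK32, t >> 32
    return (x, y, z, c)

def g(state):
    """Reduce the 128-bit state to a single 32-bit output."""
    x, y, z, _ = state
    return (x + y + z) & MASK32

class Generator:
    def __init__(self, state):
        self.current_state = state

    def random_number(self):
        # Same four steps as the pseudocode: update, derive, store, return.
        new_state = f(self.current_state)
        random_number = g(new_state)
        self.current_state = new_state
        return random_number

# Hypothetical starting state, chosen only for the demonstration.
gen = Generator((123456789, 362436000, 521288629, 7654321))
draw = gen.random_number()
print(f"32-bit draw:   {draw:#010x}")
print("128-bit state:", "".join(f"{w:08x}" for w in gen.current_state))
```

The point to take away is only the shape: the state is four times as wide as
the number handed back, and every draw replaces the state.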

Now here's what's interesting:  The state has more bits than the
random number.  In the case of the KISS random number generator, the
random numbers produced are 32 bit values, and the state is a 128 bit
value!  Having more bits for the state than the random number is a 
general property of random-number generators and not just a property 
of KISS.

When you set the seed, say by typing 

       . set seed 123456789

you are setting the value of current_state.  A number like 123456789
is a 32-bit value.  Somehow, that 32-bit value must be converted to 
a 128-bit value and, no matter how we do it, the state immediately 
after -set seed- can obviously take on only one of 2^32 values.

The setting of the seed works like this:

       set_seed_32_bit_value:
              current_state = h(32_bit_value)
              burn in current_state by repeating 100 times {
                     produce random number (and throw it away)
              }
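
The seed-setting step can be sketched in Python as well.  Everything specific
here is an assumption made for illustration: h() simply copies the seed into
four 32-bit words, and f() is a plain 128-bit linear congruential step with
made-up constants.  Only the outline (map the seed to a state, then discard
100 draws) comes from the description above.

```python
MASK128 = (1 << 128) - 1
A = 6364136223846793005   # illustrative multiplier (assumption)
C = 1442695040888963407   # illustrative odd increment (assumption)

def f(state):
    """Toy state update on a 128-bit integer."""
    return (A * state + C) & MASK128

def g(state):
    """Toy output: the top 32 bits of the 128-bit state."""
    return state >> 96

def h(seed32):
    """Hypothetical seed-to-state map: repeat the seed in four 32-bit words."""
    s = seed32 & 0xFFFFFFFF
    return s | (s << 32) | (s << 64) | (s << 96)

class Generator:
    def __init__(self):
        self.current_state = 0

    def random_number(self):
        self.current_state = f(self.current_state)
        return g(self.current_state)

    def set_seed(self, seed32, burn_in=100):
        self.current_state = h(seed32)
        for _ in range(burn_in):
            self.random_number()      # produce a number and throw it away

gen = Generator()
gen.set_seed(123456789)
print(f"state after set_seed: {gen.current_state:032x}")
```

Because the seed has 32 bits, set_seed() can leave the generator in at most
2^32 different states, however wide the state itself is.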

Maarten mentioned -c(seed)- and a second syntax of seed which allows 
you to specify the full state.  Let me explain. 

First off, -c(seed)- is a misleading name because it is not the seed,
it is the state, which is related to the seed.  -c(seed)- after setting
the seed to the 32-bit value 123456789 looks like this,

       . set seed 123456789

       . display c(seed)
       X075bcd151f123bb5159a55e50022865700043e55

The strange-looking X075bcd151f123bb5159a55e50022865700043e55 is one
way of writing the full 128-bit value.  X075b...55 is the result
of running set_seed_32_bit_value on the 32-bit number 123456789.

Remember that the state is updated each time a random number is 
generated.  Let's look at the state value after generating a random 
number:

       . * we have already set seed 123456789

       . display runiform()
       .13698408

       . display c(seed)
       X5b15215854f24767556efaba82801d9b0004330a
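
The same save-the-state idea can be tried outside Stata.  As a hedged
analogy only, Python's standard random module uses a different generator
(the Mersenne Twister, not KISS), but its getstate() plays roughly the role
of -c(seed)- and its setstate() roughly the role of the second -set seed-
syntax:

```python
import random

rng = random.Random()
rng.seed(123456789)          # short, easy-to-type seed
rng.random()                 # one draw, which changes the internal state

saved = rng.getstate()       # the full internal state, far larger than the seed
next_draws = [rng.random() for _ in range(3)]

rng.setstate(saved)          # re-enter the sequence exactly where we saved it
replayed = [rng.random() for _ in range(3)]

print(next_draws == replayed)    # prints True
```

Restoring the full state resumes the sequence mid-stream, which is something
re-typing the original short seed cannot do.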

Think of the random-number generator as producing an infinitely long
sequence of states:


    -------------------------------------------------------------------------
    state0 -> state1 -> state2 -> ... -> state{2^124} -> state0 -> state1 ...

    where, 

       state0 = X075bcd151f123bb5159a55e50022865700043e55, 

       state1 = X5b15215854f24767556efaba82801d9b0004330a, 

    and so on, 

    and where the i-th pseudo random number is given by g(state{i}). 
    -------------------------------------------------------------------------

The sequence may be infinitely long, but it repeats.  The period is
approximately 2^124 in the case of KISS.


The easy-to-type 32-bit seed provides 2^32 entry points into this sequence

   ---------------------------------------------------------------------
    state0 -> state1 -> ... -> state{2^96} -> ... -> state{2^124} -> ...
      |                             |                     |
  123456789                     ????????               ??????
   ---------------------------------------------------------------------

I put ?????? in the above because I didn't bother to work out 
the 32-bit numeric values corresponding to the particular states.
What's important is that the function state = h(32_bit_seed) is 
designed to space the entry points approximately equally. 
Also important to understand is that, because the sequence is 
infinitely long, my numbering of the states is arbitrary.
I could have picked any one of the 2^124+1 states and labeled it 0. 
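
A back-of-the-envelope check on that spacing, assuming the entry points are
spread evenly over a period of about 2^124 as stated above.  The draw rate
of one billion numbers per second is purely an assumption for scale:

```python
period = 2**124              # approximate period quoted for KISS
entry_points = 2**32         # one entry point per 32-bit seed

spacing = period // entry_points
print("spacing is 2^92:", spacing == 2**92)

draws_per_second = 1e9       # assumed rate, for scale only
seconds_per_year = 365.25 * 24 * 3600
years = spacing / draws_per_second / seconds_per_year
print(f"years of drawing to reach the next entry point: {years:.2e}")
```

Under those assumptions, two different seeds start about 2^92 states apart,
and at the assumed rate it would take on the order of 10^11 years of drawing
to run from one entry point into the next.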

What's important is that the 32-bit seed provides an entry point 
into this sequence.  In the last experiment we tried,

       . set seed 123456789

       . display runiform() 
       .13698408

       . display c(seed)
       X5b15215854f24767556efaba82801d9b0004330a

There is no 32-bit seed that you could set that corresponds to that
state.

And that is why the value of -c(seed)- looks so strange:  It provides
every possible entry point into the sequence, whereas -set seed #-
provides merely a subset.
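
The claim that no seed corresponds to the state after a draw can be checked
by brute force on a scaled-down toy model.  Everything here is an assumption
made so the enumeration is feasible: a 16-bit state stands in for the 128-bit
one, a 4-bit seed for the 32-bit one, and f() and h() are simple stand-ins,
not Stata's functions.

```python
STATE_BITS = 16          # toy stand-in for the 128-bit state
SEED_BITS = 4            # toy stand-in for the 32-bit seed
MASK = (1 << STATE_BITS) - 1

def f(state):
    """Toy state update: a 16-bit linear congruential step."""
    return (25173 * state + 13849) & MASK

def h(seed):
    """Toy seed-to-state map: place the seed in the top bits of the state."""
    return (seed << (STATE_BITS - SEED_BITS)) & MASK

def set_seed(seed, burn_in=100):
    """Map the seed to a state, then discard burn_in draws."""
    state = h(seed)
    for _ in range(burn_in):
        state = f(state)
    return state

# Every state that some seed can put the generator into.
seeded_states = {set_seed(s) for s in range(1 << SEED_BITS)}

# The state after seeding with 9 and drawing one number.
state_after_draw = f(set_seed(9))

print(len(seeded_states), "of", 1 << STATE_BITS, "states are reachable by seed")
print("state after one draw reachable by a seed?",
      state_after_draw in seeded_states)
```

In this toy model only 16 of the 65536 states can be reached by setting a
seed, and the state after a single draw is not one of them; specifying the
full state is the only way back to it.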

Do I have to say it?  If this kind of thing interests you, consider a 
career at StataCorp.

-- Bill
[email protected]