Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: Default Seed of Stata 12

From: "William Gould, StataCorp LP"
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Default Seed of Stata 12
Date: Wed, 24 Oct 2012 11:13:38 -0500

```Rasool Bux <rasool.bux@aku.edu> asked,

> Can anybody tell me the default system values i.e. seed etc.
> of Stata 12.1

The random-number seed is set to 123456789 each time Stata is launched.
As Maarten Buis <maartenlbuis@gmail.com> noted, the value changes during
the Stata session as you use the random-number generators.

----------------

I wrote this response mainly so I could say, "123456789", but
Maarten also wrote,

> The default can change during a Stata session.
>
> You can see the current value of the seed by typing di c(seed).
> See -help creturn- for this and other system values.
> Also see -help set seed- for an explanation what that weird string
> returned by -c(seed)- actually is.

and now I feel obligated to provide more details than you will find in
the manuals.  So for those who are curious:

The random-number generator has something called a state.  When you
-set seed-, you are specifying the state.  Each time you ask for
a random number, say by using the -runiform()- function, the
state is recursively updated -- new_state = f(current_state) -- and
then a random number is produced based on the value of new_state.
The code works like this:

random_number:
    new_state     = f(current_state)
    random_number = g(new_state)
    current_state = new_state
    return(random_number)

Now here's what's interesting:  The state has more bits than the
random number.  In the case of the KISS random number generator, the
random numbers produced are 32-bit values, and the state is a 128-bit
value!  Having more bits for the state than the random number is a
general property of random-number generators and not just a property
of KISS.

When you set the seed, say by typing

. set seed 123456789

you are setting the value of current_state.  A number like 123456789
is a 32-bit value.  Somehow, that 32-bit value is converted to
a 128-bit value and, no matter how we do it, obviously the state can
take on only one of 2^32 values immediately after -set seed #-.

The setting of the seed works like this:

set_seed_32_bit_value:
    current_state = h(32_bit_value)
    burn in current_state by repeating 100 times {
        produce random number (and throw it away)
    }

Maarten mentioned -c(seed)- and a second syntax of -set seed- that allows
you to specify the full state.  Let me explain.

First off, -c(seed)- is a misleading name because it is not the seed,
it is the state, which is related to the seed.  -c(seed)- after setting
the seed to the 32-bit value 123456789 looks like this,

. set seed 123456789

. display c(seed)
X075bcd151f123bb5159a55e50022865700043e55

The strange looking X075bcd151f123bb5159a55e50022865700043e55 is one
way of writing the full 128-bit value.  X075b...55 is the result
of running set_seed_32_bit_value on the 32-bit number 123456789.

Remember that the state is updated each time a random number is
generated.  Let's look at the state value after generating a random
number:

. * we have already set seed 123456789

. display runiform()
.13698408

. display c(seed)
X5b15215854f24767556efaba82801d9b0004330a

Think of the random-number generator as producing an infinitely long
sequence of states:

-------------------------------------------------------------------------
state0 -> state1 -> state2 -> ... -> state{2^124} -> state0 -> state1 ...

where,

state0 = X075bcd151f123bb5159a55e50022865700043e55,

state1 = X5b15215854f24767556efaba82801d9b0004330a,

and so on,

and where the i-th pseudo random number is given by g(state{i}).
-------------------------------------------------------------------------

The sequence may be infinitely long, but it repeats.  The period is
approximately 2^124 in the case of KISS.

The easy-to-type 32-bit seed provides 2^32 entry points into this sequence

---------------------------------------------------------------------
state0 -> state1 -> ... -> state{2^96} -> ... -> state{2^124} -> ...
|                             |                     |
123456789                     ????????               ??????
---------------------------------------------------------------------

I put ?????? in the above because I didn't bother to work out
the 32-bit numeric values corresponding to the particular states.
What's important is the function state = h(32_bit_seed) is
designed to space the entry points approximately equally.
Also important to understand is that, because the sequence is
infinitely long, my numbering of the states is arbitrary.
I could have picked any one of the 2^124+1 states and labeled it 0.

What's important is that the 32-bit seed provides an entry point
into this sequence.  In the last experiment we tried,

. set seed 123456789

. display runiform()
.13698408

. display c(seed)
X5b15215854f24767556efaba82801d9b0004330a

There is no 32-bit seed that you could set that corresponds to that
state.

And that is why the value of -c(seed)- looks so strange:  It provides
every possible entry point into the sequence, whereas -set seed #-
provides merely a subset.

Do I have to say it?  If this kind of thing interests you, consider a
career at StataCorp.

-- Bill
wgould@stata.com
```
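
For readers who want to experiment with the seed-versus-state distinction described in the post, here is a minimal Python sketch. It is a toy and makes no claim to match Stata: it is not KISS, and the class name `ToyRNG`, the 64-bit state size, the linear congruential update used for `f`, the top-32-bits output used for `g`, and the multiplicative expansion used for `h` are all assumptions chosen for brevity. Only the overall structure follows the pseudocode in the post: `set_seed` expands a 32-bit value with `h` and burns in 100 draws, and each draw computes `new_state = f(current_state)` and returns `g(new_state)`.

```python
MASK64 = (1 << 64) - 1


class ToyRNG:
    """Toy generator: 64-bit state, 32-bit outputs. Illustrative only, not KISS."""

    def __init__(self, seed32=123456789):
        self.set_seed(seed32)

    @staticmethod
    def _f(state):
        # Assumed state update: one 64-bit linear congruential step.
        return (state * 6364136223846793005 + 1442695040888963407) & MASK64

    @staticmethod
    def _g(state):
        # Assumed output function: the top 32 bits of the state.
        return state >> 32

    @staticmethod
    def _h(seed32):
        # Hypothetical expansion of a 32-bit seed into a full-width state.
        return ((seed32 & 0xFFFFFFFF) * 0x9E3779B97F4A7C15) & MASK64

    def set_seed(self, seed32):
        # Analogue of -set seed #-: expand the seed, then burn in 100 draws.
        self.state = self._h(seed32)
        for _ in range(100):
            self.next_u32()

    def set_state(self, state):
        # Analogue of -set seed X...-: specify the full state directly.
        self.state = state & MASK64

    def next_u32(self):
        new_state = self._f(self.state)
        number = self._g(new_state)
        self.state = new_state
        return number

    def runiform(self):
        # Map the 32-bit integer to a float in [0, 1).
        return self.next_u32() / 2**32


# The same 32-bit seed always gives the same entry point and the same draws.
a = ToyRNG(123456789)
b = ToyRNG(123456789)
assert a.state == b.state
assert [a.next_u32() for _ in range(3)] == [b.next_u32() for _ in range(3)]

# Saving the full state mid-stream and restoring it resumes the sequence,
# which is the role -c(seed)- plays in the post.
saved = a.state
first = [a.runiform() for _ in range(3)]
a.set_state(saved)
assert first == [a.runiform() for _ in range(3)]
```

In this toy, the short seed can select at most 2^32 of the 2^64 possible states, while `set_state` can select any of them. That mirrors, on a smaller scale, the point that `-set seed #-` provides only a subset of the entry points that `-c(seed)-` can describe.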