Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Default Seed of Stata 12


From   Sergiy Radyakin <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Default Seed of Stata 12
Date   Fri, 4 Oct 2013 14:42:56 -0400

Dear Statalist and StataCorp,

With this question I would like to revive the discussion of the inner
workings of Stata's random-number generator and the correspondence
between the 'seed' and the 'state'. The previous discussion ended with
a comprehensive description from Bill Gould, who explained that
although the generator can be initialized with a 32-bit value, doing
so reaches only a small number of states (compared with all the states
the generator can have). Setting the state directly is also possible,
and the values used to set the state are the same as those reported by
c(seed). Since that message was posted a while ago, here is a link to
the archived version:
http://www.stata.com/statalist/archive/2012-10/msg01129.html

The original poster was asking about Stata 12.1 specifically. I am
more interested in how this works across Stata versions. My findings
are that the reported values of 'state' hold for Stata 12.1 but are
different in, e.g., 12.0 and other versions. Moreover, they seem to
differ between actual versions of Stata (an installation of 12.0 vs.
an installation of 12.1), between virtual versions of Stata (an
installation of 12.1 with version set to 12.1 vs. an installation of
12.1 with version set to 12.0), and across the two (an installation of
12.0 vs. an installation of 12.1 with version set to 12.0).

Here is the output from an actual Stata 12.1 installation under
different version settings, after setting the seed to the default
value 123456789:

http://radyakin.org/statalist/2013100401/states.txt
http://radyakin.org/statalist/2013100401/statesfull.png
    (with colors for differing values)

do http://radyakin.org/statalist/2013100401/seedstates.do
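
For readers who cannot fetch the do-file, a comparison of this kind
might be sketched as below. This is not the linked seedstates.do; the
loop and the choice of versions are assumptions for illustration, and
the sketch presumes a Stata 12.1 installation in which -version #:-
can emulate 12.0.

```stata
* Hypothetical sketch, not the linked do-file.
* Assumes Stata 12.1, where c(seed) reports the generator state.
foreach v in 12.0 12.1 {
    * initialize the generator under the requested version
    version `v': set seed 123456789
    * state reported before any draw
    version `v': display "version `v' state:      " c(seed)
    * first pseudo-random number after initialization
    version `v': display "version `v' first draw: " runiform()
    * state reported after one draw
    version `v': display "version `v' next state: " c(seed)
}
```

Comparing the displayed lines side by side shows whether the states,
the draws, or both differ between the version settings.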

Interestingly, despite the different 'state', the first random number
returned by uniform() after initialization is identical to the value
reported by Bill Gould: 0.1369.. (see the do-file). The subsequent
states also differ, yet the subsequent random numbers again appear to
be the same.

Suppose now that my program uses the alternative syntax
'set seed -state-' rather than 'set seed -number-', and that for
various reasons it is restricted to the syntax with the state. I am
concerned about the reproducibility of the results obtained when such
a program is run in different versions of Stata. Specifically, the
program performs a bootstrap, and while the random numbers returned by
-uniform()- seem to be identical, I can't be totally sure. If the
bootstrap draws its samples differently, the differences could grow
into much more noticeable discrepancies in the results. My first
attempts to diagnose this indicate that the bootstrap results are
stable, but I have not yet run it across real installations, only
across virtual versions.
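
A check of this kind across virtual versions might look like the
sketch below. It is an illustration only: the dataset (auto), the
statistic, and the number of replications are assumptions, and it
initializes the generator with the numeric seed rather than a state
string. It presumes a Stata 12.1 installation.

```stata
* Hypothetical check; assumes Stata 12.1 and the shipped auto dataset.
sysuse auto, clear

* bootstrap variance of the mean under version 12.0
version 12.0
set seed 123456789
bootstrap r(mean), reps(50) nodots: summarize mpg
matrix V0 = e(V)

* the same run under version 12.1
version 12.1
set seed 123456789
bootstrap r(mean), reps(50) nodots: summarize mpg
matrix V1 = e(V)

* 0 indicates identical bootstrap variance estimates in the two runs
display mreldif(V0, V1)
```

A result of zero is consistent with identical resampling but does not
prove it for other statistics or other versions.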

The specific questions that I need to resolve are the following:

1) Suppose I store the RND state with every iteration (whether it is
iteration of bootstrap or just a call to uniform()), and reproduce it
later in a different version of Stata. Would that guarantee that I
would get exactly the same results?

2) Are all the states obtained in arbitrary version X of Stata always
acceptable in version Y of Stata (X!=Y)?
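
To make question 1 concrete, storing and replaying the state within a
single version can be sketched as follows. The sketch assumes Stata
12.x, where c(seed) holds the state and -set seed- accepts it back;
the macro and scalar names are hypothetical. Whether the same replay
works when the second half is run in a different version is exactly
what is being asked.

```stata
* Minimal sketch; assumes Stata 12.x semantics for c(seed).
version 12.1
set seed 123456789

local state0 "`c(seed)'"     // state before any draw
scalar u1 = runiform()       // first draw
local state1 "`c(seed)'"     // state after one draw
scalar u2 = runiform()       // second draw

* replay the second draw from its stored state
set seed `state1'
assert runiform() == u2

* replay the first draw from its stored state
set seed `state0'
assert runiform() == u1
```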

Thank you,
    Sergiy Radyakin






On Wed, Oct 24, 2012 at 12:13 PM, William Gould, StataCorp LP
<[email protected]> wrote:
> Rasool Bux <[email protected]> asked,
>
>> Can anybody tell me the default system values i.e. seed etc.
>> of Stata 12.1
>
> The random-number seed is set to 123456789 each time Stata is launched.
> As Maarten Buis <[email protected]> noted, the value changes during
> the Stata session as you use the random-number generators.
>
>
> More information
> ----------------
>
> I wrote this response mainly so I could say, "123456789", but
> Maarten also wrote,
>
>> The default can change during a Stata session.
>>
>> You can see the current value of the seed by typing di c(seed).
>> See -help creturn- for this and other system values.
>> Also see -help set seed- for an explanation what that weird string
>> returned by -c(seed)- actually is.
>
> and now I feel obligated to provide more details than you will find in
> the manuals.  So for those who are curious:
>
> The random-number generator has something called a state.  When you
> -set seed-, you are specifying the state.  Each time you ask for
> a random number, say by using the -runiform()- function, the
> state is recursively updated -- new_state = f(current_state) -- and
> then a random number is produced based on the value of new_state.
> The code works like this:
>
>        random_number:
>               new_state     = f(current_state)
>               random_number = g(new_state)
>               current_state = new_state
>               return(random_number)
>
> Now here's what's interesting:  The state has more bits than the
> random number.  In the case of the KISS random number generator, the
> random numbers produced are 32 bit values, and the state is a 128 bit
> value!  Having more bits for the state than the random number is a
> general property of random-number generators and not just a property
> of KISS.
>
> When you set the seed, say by typing
>
>        . set seed 123456789
>
> you are setting the value of current_state.  A number like 123456789
> is a 32-bit value.  Somehow, that 32-bit value is converted to
> a 128-bit value and, no matter how we do it, obviously state can
> take on only one of 2^32 values.
>
> The setting of the seed works like this:
>
>        set_seed_32_bit_value:
>               current_state = h(32_bit_value)
>               burn in current_state by repeating 100 times {
>                      produce random number (and throw it away)
>               }
>
> Maarten mentioned -c(seed)- and a second syntax of seed which allows
> you to specify the full state.  Let me explain.
>
> First off, -c(seed)- is a misleading name because it is not the seed,
> it is the state, which is related to the seed.  -c(seed)- after setting
> the seed to the 32-bit value 123456789 looks like this,
>
>        . set seed 123456789
>
>        . display c(seed)
>        X075bcd151f123bb5159a55e50022865700043e55
>
> The strange looking X075bcd151f123bb5159a55e50022865700043e55 is one
> way of writing the full 128-bit value.  X075b...55 is the result
> of running set_seed_32_bit_value on the 32-bit number 123456789.
>
> Remember that the state is updated each time a random number is
> generated.  Let's look at the state value after generating a random
> number:
>
>        . * we have already set seed 123456789
>
>        . display runiform()
>        .13698408
>
>        . display c(seed)
>        X5b15215854f24767556efaba82801d9b0004330a
>
> Think of the random-number generator as producing an infinitely long
> sequence of states:
>
>
>     -------------------------------------------------------------------------
>     state0 -> state1 -> state2 -> ... -> state{2^124} -> state0 -> state1 ...
>
>     where,
>
>        state0 = X075bcd151f123bb5159a55e50022865700043e55,
>
>        state1 = X5b15215854f24767556efaba82801d9b0004330a,
>
>     and so on,
>
>     and where the i-th pseudo random number is given by g(state{i}).
>     -------------------------------------------------------------------------
>
> The sequence may be infinitely long, but it repeats.  The period is
> approximately 2^124 in the case of KISS.
>
>
> The easy-to-type 32-bit seed provides 2^32 entry points into this sequence
>
>    ---------------------------------------------------------------------
>     state0 -> state1 -> ... -> state{2^96} -> ... -> state{2^124} -> ...
>       |                             |                     |
>   123456789                     ????????               ??????
>    ---------------------------------------------------------------------
>
> I put ?????? in the above because I didn't bother to work out
> the 32-bit numeric values corresponding to the particular states.
> What's important is the function state = h(32_bit_seed) is
> designed to space the entry points approximately equally.
> Also important to understand is that, because the sequence is
> infinitely long, my numbering of the states is arbitrary.
> I could have picked any one of the 2^124+1 states and labeled it 0.
>
> What's important is that the 32-bit seed provides an entry point
> into this sequence.  In the last experiment we tried,
>
>        . set seed 123456789
>
>        . display runiform()
>        .13698408
>
>        . display c(seed)
>        X5b15215854f24767556efaba82801d9b0004330a
>
> There is no 32-bit seed that you could set that corresponds to that
> state.
>
> And that is why the value of -c(seed)- looks so strange:  It provides
> every possible entry point into the sequence, whereas -set seed #-
> provides merely a subset.
>
> Do I have to say it?  If this kind of thing interests you, consider a
> career at StataCorp.
>
> -- Bill
> [email protected]
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/