
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



Re: st: Default Seed of Stata 12


From   Henrik Støvring <STOVRING@biostat.au.dk>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Default Seed of Stata 12
Date   Thu, 25 Oct 2012 09:20:08 +0000

Thanks for this clear, fascinating, and very detailed presentation of the 
inner workings of Stata! I am not about to apply for a position at 
Stata, as you suggest interested readers should consider doing, but I 
must admit that every time you write one of these valuable pieces on the 
inner mechanics of Stata, I wonder a little more whether I 
should reconsider. :-)

Best,

Henrik

On 10/24/2012 06:13 PM, William Gould, StataCorp LP wrote:
> Rasool Bux <rasool.bux@aku.edu> asked,
>
>> Can anybody tell me the default system values i.e. seed etc.
>> of Stata 12.1
> The random-number seed is set to 123456789 each time Stata is launched.
> As Maarten Buis <maartenlbuis@gmail.com> noted, the value changes during
> the Stata session as you use the random-number generators.
>
>
> More information
> ----------------
>
> I wrote this response mainly so I could say, "123456789", but
> Maarten also wrote,
>
>> The default can change during a Stata session.
>>
>> You can see the current value of the seed by typing di c(seed).
>> See -help creturn- for this and other system values.
>> Also see -help set seed- for an explanation of what that weird string
>> returned by -c(seed)- actually is.
> and now I feel obligated to provide more details than you will find in
> the manuals.  So for those who are curious:
>
> The random-number generator has something called a state.  When you
> -set seed-, you are specifying the state.  Each time you ask for
> a random number, say by using the -runiform()- function, the
> state is recursively updated -- new_state = f(current_state) -- and
> then a random number is produced based on the value of new_state.
> The code works like this:
>
>         random_number:
>                new_state     = f(current_state)
>                random_number = g(new_state)
>                current_state = new_state
>                return(random_number)
>
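The recursion above can be sketched in Python. This is only a toy under stated assumptions: the state is held as four 32-bit words (128 bits in total), and the functions f and g, the class ToyGenerator, the constants, and the starting state are all illustrative choices in the style of a KISS generator, not Stata's actual implementation.

```python
MASK32 = 0xFFFFFFFF  # keeps every state word at 32 bits


def f(state):
    """Advance the 128-bit state, stored as four 32-bit words (x, y, z, c)."""
    x, y, z, c = state
    x = (69069 * x + 12345) & MASK32      # linear congruential step
    y ^= (y << 13) & MASK32               # xorshift step
    y ^= y >> 17
    y ^= (y << 5) & MASK32
    t = 698769069 * z + c                 # multiply-with-carry step
    z, c = t & MASK32, t >> 32
    return (x, y, z, c)


def g(state):
    """Reduce the 128-bit state to one 32-bit random number."""
    x, y, z, _carry = state
    return (x + y + z) & MASK32


class ToyGenerator:
    """Hypothetical generator mirroring the pseudocode step by step."""

    def __init__(self, state):
        self.current_state = state

    def random_number(self):
        new_state = f(self.current_state)
        number = g(new_state)
        self.current_state = new_state
        return number


# Arbitrary illustrative starting state (assumption, not Stata's).
gen = ToyGenerator((123456789, 362436000, 521288629, 7654321))
print(gen.random_number())   # a 32-bit value
print(gen.current_state)     # the 128-bit state has moved on
```

Note that the output carries 32 bits while the state carries 128, so many different states map to the same random number.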
> Now here's what's interesting:  The state has more bits than the
> random number.  In the case of the KISS random number generator, the
> random numbers produced are 32 bit values, and the state is a 128 bit
> value!  Having more bits for the state than the random number is a
> general property of random-number generators and not just a property
> of KISS.
>
> When you set the seed, say by typing
>
>         . set seed 123456789
>
> you are setting the value of current_state.  A number like 123456789
> is a 32-bit value.  Somehow, that 32-bit value is converted to
> a 128-bit value and, no matter how we do it, obviously state can
> take on only one of 2^32 values.
>
> The setting of the seed works like this:
>
>         set_seed_32_bit_value:
>                current_state = h(32_bit_value)
>                burn in current_state by repeating 100 times {
>                       produce random number (and throw it away)
>                }
>
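A minimal Python sketch of this seeding procedure follows. Everything in it is an assumption made for illustration: the 128-bit state is a single integer, step() is a plain 128-bit linear congruential update rather than KISS, and h() simply repeats the seed four times. Stata's real h and f are not given in this message.

```python
MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1

# Illustrative constants for a 128-bit linear congruential step (not Stata's).
A = 0x2360ED051FC65DA44385DF649FCCF645
C = 0x5851F42D4C957F2D


def step(state):
    """Stand-in for f(): advance the 128-bit state by one position."""
    return (A * state + C) & MASK128


def h(seed32):
    """Hypothetical expansion of a 32-bit seed into a 128-bit state."""
    s = seed32 & MASK32
    return (s << 96) | (s << 64) | (s << 32) | s


def set_seed(seed32, burn_in=100):
    """Expand the seed, then burn in by discarding `burn_in` draws."""
    state = h(seed32)
    for _ in range(burn_in):
        state = step(state)   # produce a random number and throw it away
    return state


state = set_seed(123456789)
print(f"X{state:032x}")       # 32 hex digits = 128 bits
```

Because h() only ever sees 32 bits, set_seed() can return at most 2^32 of the 2^128 possible state values, which is the limitation described above.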
> Maarten mentioned -c(seed)- and a second syntax of -set seed- that allows
> you to specify the full state.  Let me explain.
>
> First off, -c(seed)- is a misleading name because it is not the seed,
> it is the state, which is related to the seed.  -c(seed)- after setting
> the seed to the 32-bit value 123456789 looks like this,
>
>         . set seed 123456789
>
>         . display c(seed)
>         X075bcd151f123bb5159a55e50022865700043e55
>
> The strange looking X075bcd151f123bb5159a55e50022865700043e55 is one
> way of writing the full 128-bit value.  X075b...55 is the result
> of running set_seed_32_bit_value on the 32-bit number 123456789.
>
> Remember that the state is updated each time a random number is
> generated.  Let's look at the state value after generating a random
> number:
>
>         . * we have already set seed 123456789
>
>         . display runiform()
>         .13698408
>
>         . display c(seed)
>         X5b15215854f24767556efaba82801d9b0004330a
>
> Think of the random-number generator as producing an infinitely long
> sequence of states:
>
>
>      -------------------------------------------------------------------------
>      state0 -> state1 -> state2 -> ... -> state{2^124} -> state0 -> state1 ...
>
>      where,
>
>         state0 = X075bcd151f123bb5159a55e50022865700043e55,
>
>         state1 = X5b15215854f24767556efaba82801d9b0004330a,
>
>      and so on,
>
>      and where the i-th pseudo random number is given by g(state{i}).
>      -------------------------------------------------------------------------
>
> The sequence may be infinitely long, but it repeats.  The period is
> approximately 2^124 in the case of KISS.
>
>
> The easy-to-type 32-bit seed provides 2^32 entry points into this sequence
>
>     ---------------------------------------------------------------------
>      state0 -> state1 -> ... -> state{2^96} -> ... -> state{2^124} -> ...
>        |                             |                     |
>    123456789                     ????????               ??????
>     ---------------------------------------------------------------------
>
> I put ?????? in the above because I didn't bother to work out
> the 32-bit numeric values corresponding to the particular states.
> What's important is the function state = h(32_bit_seed) is
> designed to space the entry points approximately equally.
> Also important to understand is that, because the sequence is
> infinitely long, my numbering of the states is arbitrary.
> I could have picked any one of the 2^124+1 states and labeled it 0.
>
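The spacing of the entry points can be worked out with a few lines of Python arithmetic. This assumes exactly equal spacing and takes the period to be exactly 2^124. Both are approximations in the text above, and the names PERIOD and ENTRY_POINTS are introduced here only for illustration.

```python
PERIOD = 2**124        # approximate period of KISS quoted above
ENTRY_POINTS = 2**32   # number of distinct 32-bit seeds

# States between neighbouring entry points, if they are spread evenly.
spacing = PERIOD // ENTRY_POINTS

# Share of all states that some 32-bit seed lands on directly.
reachable_fraction = ENTRY_POINTS / PERIOD

print(f"spacing between entry points: 2^{spacing.bit_length() - 1}")
print(f"fraction of states reachable by a 32-bit seed: {reachable_fraction:.3e}")
```

Under these assumptions, neighbouring entry points sit about 2^92 states apart, so a state reached after drawing a handful of random numbers is very unlikely to coincide with another seed's entry point.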
> What's important is that the 32-bit seed provides an entry point
> into this sequence.  In the last experiment we tried,
>
>         . set seed 123456789
>
>         . display runiform()
>         .13698408
>
>         . display c(seed)
>         X5b15215854f24767556efaba82801d9b0004330a
>
> There is no 32-bit seed that you could set that corresponds to that
> state.
>
> And that is why the value of -c(seed)- looks so strange:  It provides
> every possible entry point into the sequence, whereas -set seed #-
> provides merely a subset.
>
> Do I have to say it?  If this kind of thing interests you, consider a
> career at StataCorp.
>
> -- Bill
> wgould@stata.com
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
>

-- 

*Henrik Støvring, PhD*
Associate professor
stovring@biostat.au.dk
Phone +45 8716 7991
Fax +45 8716 7305
Web: au.dk/en/stovring@biostat <http://au.dk/en/stovring@biostat>

	

Department of Public Health
Biostatistics
University of Aarhus
Bartholins Allé 2, Bldg 1261, 217
DK-8000 Aarhus C
Denmark





© Copyright 1996–2014 StataCorp LP