Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Default Seed of Stata 12

From	Stas Kolenikov <[email protected]>
To	[email protected]
Subject	Re: st: Default Seed of Stata 12
Date	Fri, 26 Oct 2012 13:44:43 -0500

Pretty much any number is as good as any other as a starting
value. Stata quietly cycles through 100 states to get away from your
starting value anyway. The various pieces of advice I've seen regarding
starting values (probably including some Statalist discussions)
include:

1. use today's date: set seed 20121026
2. pull a bill out of your pocket, and copy its serial number
3. take a look at your RSA key and use the digits from there
(sh-h-h... I hope my IT department is not listening to this)
4. use an actual random number from random.org
5. use a Dilbert-like random number generator
(http://dilbert.com/strips/comic/2001-10-25/)
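
Advice 1 can be automated so that the date is never mistyped. The following is a minimal sketch, not from the original post. It assumes you want today's date in YYYYMMDD form as the seed, and the local macro name `today' is a hypothetical choice. It also prints the seed so that the log records it.

```stata
* Sketch (assumption): use today's date, e.g. 20121026, as the seed.
* c(current_date) looks like "26 Oct 2012", so convert it to YYYYMMDD.
local today = string(date(c(current_date), "DMY"), "%tdCCYYNNDD")
set seed `today'
* Print the seed so the run can be reproduced from the log.
display "seed set to `today'"
```

Keep in mind that every run started on the same day gets the same seed. Parallel jobs launched together would still need distinct seeds.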

The way I typically set up my simulations is to have a workhorse do-file
that takes arguments, something like

args n seed eye_color hair_color
set seed `seed'
log using "simulation-`c(current_date)'-`n'-`seed'-`eye_color'-`hair_color'", text replace

where `n' would be the number of observations to create, `seed' is
obviously the random seed, and the rest are the parameters of the data
generation process. I would first try it with obviously human-produced
parameters, like

do workhorse 111 10101 orange purple

(provided, of course, that my file will know what to do with these
parameters), and then for my actual simulation on a cluster, I would
produce a wrapper

===
* wrapper.do: run the workhorse over the full parameter grid for one seed
args seed
foreach n of numlist 100 200 500 {
  foreach eye in green blue brown {
    foreach hair in blond black brown {
      do workhorse `n' `seed' `eye' `hair'
    }
  }
}
===

and then create a few dozen single-line do-files containing just "do wrapper
<seed>"; I would produce them automatically with the -file- command, and
even launch them with the OS execution utility. So if I worried too
much about the seeds, I would never be able to set this up in a
computationally efficient way :).
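
The single-line launcher files could be written along these lines. This is a hypothetical sketch, not the author's code. The three seeds and the run_<seed>.do file names are made up. The batch-mode launch line assumes a Unix machine with a `stata` executable on the PATH, so it is left commented out.

```stata
* Hypothetical sketch: write one single-line launcher do-file per seed.
* Seeds and file names below are illustrative assumptions.
tempname fh
foreach seed of numlist 10101 20202 30303 {
    file open `fh' using "run_`seed'.do", write text replace
    file write `fh' "do wrapper `seed'" _n
    file close `fh'
    * Optional and OS-dependent (assumes Unix, stata on PATH):
    * shell stata -b do run_`seed'.do &
}
```

Running this once leaves run_10101.do, run_20202.do and run_30303.do in the working directory. Each file contains the single line "do wrapper <seed>" with its own seed.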

-- 
-- Stas Kolenikov, PhD, PStat (SSC)  ::  http://stas.kolenikov.name
-- Senior Survey Statistician, Abt SRBI  ::  work email kolenikovs at
srbi dot com
-- Opinions stated in this email are mine only, and do not reflect the
position of my employer

On Fri, Oct 26, 2012 at 1:25 PM,  <[email protected]> wrote:
> Bill Gould wrote a very informative post about Stata's seed on Wed 24
> Oct.
>
> In part, he wrote:
> ==========================
> Think of the random-number generator as producing an infinitely long
> sequence of states:
>
> ------------------------------------------------------------------------
>     state0 -> state1 -> state2 -> ... -> state{2^124} -> state0 -> state1 ...
>
>     where,
>
>        state0 = X075bcd151f123bb5159a55e50022865700043e55,
>
>        state1 = X5b15215854f24767556efaba82801d9b0004330a,
>
>     and so on,
>
>     and where the i-th pseudo random number is given by g(state{i}).
>
> ------------------------------------------------------------------------
>
> The sequence may be infinitely long, but it repeats.  The period is
> approximately 2^124 in the case of KISS.
>
> The easy-to-type 32-bit seed provides 2^32 entry points into this sequence:
>
>    ---------------------------------------------------------------------
>     state0 -> state1 -> ... -> state{2^96} -> ... -> state{2^124} -> ...
>       |                             |                     |
>   123456789                     ????????               ??????
>    ---------------------------------------------------------------------
> ========================
>
> Given the "infinitely long" sequence which repeats, and Bill's reference
> to "entry points", does it ever matter what number one chooses to be the
> initial seed and hence enters the sequence?
>
> I note that the default Stata 32-bit seed is "123456789", which is 9
> digits and an odd number. Are there potentially adverse consequences of
> setting a 32-bit seed using an even number? Or using a seed that is less
> than some critical number of digits in length?  E.g. is "1" or "20" as
> good as "123456789" or "987654321"?
>
> Many people, including me, appear to use a number with -set seed- that
> has a relatively large number of digits and is an odd number -- but I
> wonder if this is simply custom and practice, or whether there is a
> rationale. Or is any number as good as another as an entry point to the
> sequence?  I searched the web for answers a while ago and did not find
> answers.
>
>
> Stephen
> ------------------
> Professor Stephen P. Jenkins <[email protected]>
> Department of Social Policy and STICERD
> London School of Economics and Political Science
> Houghton Street, London WC2A 2AE, UK
> Tel: +44(0)20 7955 6527
> Changing Fortunes: Income Mobility and Poverty Dynamics in Britain, OUP
> 2011, http://ukcatalogue.oup.com/product/9780199226436.do
> Survival Analysis Using Stata:
> http://www.iser.essex.ac.uk/survival-analysis
> Downloadable papers and software: http://ideas.repec.org/e/pje7.html
