


st: Re: Random start to random number sequence

From   "Allan Reese (Cefas)" <>
To   <>
Subject   st: Re: Random start to random number sequence
Date   Fri, 20 Aug 2010 11:12:39 +0100

A few replies to Bill Gould's comments, to clarify my position.

BG: This next will surprise you until you think about it, but the best
way to use a pseudo-random-number generator is to set the seed only
once, the day you get it, and to just let it continue on its merry way
until you've used it up!

AR: Agree - see below.

BG: Because you set the seed only once, we do not need to discuss
randomness.  Randomness is a property of sequences of numbers.

AR: Disagree, but this is philosophy not Stata.  Randomness is a
property of the generating mechanism, which we never in practice know.
A recent letter in Nature ("Random numbers certified by Bell's theorem",
Pironio et al 15/Apr/2010) was way over my head but pointed out the
problem of knowing whether "random" numbers were simply being fed to you
by an intelligence that was choosing them.  P&friends used two
quantum-entangled atoms separated by approximately one metre.

On a finite computer, the Pseudo-RNG generates a sequence of bit
patterns, interpreted as binary numbers, such that "short" sequences of
the numbers *pass a series of tests of apparent randomness*.  Eventually
the series must get back to the opening value, and will cycle.  When the
cycle length is billions, "short" will easily cover sequences of a few

Knuth (Art of computer programming) shows that generally if you have a
good PRNG then any attempt to make it "more" random will introduce
non-randomness.  Bill made the same point, and the archived messages
that prompted me to write include those that suggest resetting the seed
within a simulation loop. NO! It's better to just continue the PRNG
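
The point about not resetting the seed inside a loop can be illustrated with a small sketch. It is written in Python rather than Stata and uses Python's own generator, so it shows only the general idea; the clock readings and the helper names `reseed_every_draw` and `seed_once` are hypothetical.

```python
import random

def reseed_every_draw(clock_readings):
    """Reseed from the clock before every draw (the pattern to avoid)."""
    values = []
    for t in clock_readings:
        rng = random.Random(t)        # a fresh seed inside the loop
        values.append(rng.random())
    return values

def seed_once(seed, n):
    """Seed once, then simply keep drawing from the same generator."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

# Hypothetical whole-second clock readings: a fast loop sees the same
# second many times before the clock ticks over.
clock = [1282299159] * 4 + [1282299160] * 4

looped = reseed_every_draw(clock)
single = seed_once(clock[0], len(clock))

print("distinct values when reseeding each draw:", len(set(looped)))
print("distinct values when seeding once:       ", len(set(single)))
```

Reseeding on every pass can only produce as many distinct values as there are distinct seeds, whereas seeding once lets the generator run through its sequence undisturbed.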

BG: We will still recommend you set the seed randomly, however, because
we will want randomness in numbers generated across
researchers.  Most pseudo-random-number generator designers would prefer
it if you used their generators in this way.

AR: Disagree slightly, as the point is just to use a different seed each
session *unless you wish to reproduce a particular subset of the PRN
cycle*.  As [D]generate states, "Without loss of pseudorandomness, the
seed may be set to small numbers."  So a good solution would be to save
the "code" at the end of each session and use that as the seed for the
next session. Or save an incrementing integer "mysessionnumber" to be
the seed at the start of the next session.
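
A minimal sketch of the incrementing-session-number idea follows. It is in Python rather than Stata, and the file name `mysessionnumber.txt` and the function `next_session_seed` are assumptions made for illustration only.

```python
import os
import random
import tempfile

def next_session_seed(path):
    """Read the last session number from `path`, add one, save it, return it."""
    try:
        with open(path) as f:
            last = int(f.read().strip())
    except FileNotFoundError:
        last = 0                      # first ever session
    seed = last + 1
    with open(path, "w") as f:
        f.write(str(seed))
    return seed

# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as d:
    counter = os.path.join(d, "mysessionnumber.txt")
    first = next_session_seed(counter)    # simulated first session
    second = next_session_seed(counter)   # simulated next session
    rng = random.Random(second)           # seed set once for the session
    print(first, second)
```

Each session then starts from a seed that has not been used before, and the saved number doubles as a record of which seed produced which results.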

I like the way Stata generates PRNs, but the warning that the seed is
reset to 123456789 is easily overlooked.  If you are running simulations
to generate Monte-Carlo results, this probably does not matter.  On the
day, I wanted to generate a random sequence for randomizing allocation
of treatments, and here it might cause a problem.  I might, for example,
always end up allocating the control treatment or the extreme treatment
to the same physical location (think numbered plots in a field).
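
The allocation problem can be sketched as follows. This uses Python's generator, not Stata's, so only the seed value 123456789 comes from the message above; the treatment names, the number of plots, and the function `allocate` are hypothetical.

```python
import random

def allocate(treatments, n_plots, seed):
    """Assign treatments to numbered plots by shuffling a balanced list."""
    rng = random.Random(seed)
    plan = [treatments[i % len(treatments)] for i in range(n_plots)]
    rng.shuffle(plan)
    return plan

treatments = ["control", "low", "medium", "high"]

# Two separate "sessions" that both start from the same default seed.
plan_a = allocate(treatments, 8, seed=123456789)
plan_b = allocate(treatments, 8, seed=123456789)

print(plan_a)
print("identical allocations:", plan_a == plan_b)
```

Because both sessions start from the same seed, every plot receives the same treatment in both plans, which is exactly the risk when the default seed goes unnoticed.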

BG: So what was wrong with Allan's original suggestion?  Allen based the
seed on the time of day.  Let's say Allan gets to the office around the
same time every day.

AR: BG has not experienced the traffic improvements taking place in
Weymouth as preparation for the Olympic sailing in 2012! Travel time is
not predictable.  Nor am I.

BG: Let's assume Allan runs simulations around the same time on days he
runs them.  Perhaps he starts them right after lunch, or just before
going home.  Alan is now drawing seeds in close proximity to each other.

AR: But as [D]generate notes, n and n+1 as seeds will give very
different starting points in the PRNG cycle.
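
As a rough illustration of adjacent seeds giving unrelated starting points, the sketch below uses Python's generator. That is a different generator from Stata's, so it says nothing about Stata's internals; the seed values and the function `first_draws` are hypothetical.

```python
import random

def first_draws(seed, n=3):
    """Return the first n uniform draws produced from a given seed."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]

a = first_draws(20100820)   # seed n
b = first_draws(20100821)   # seed n + 1

for x, y in zip(a, b):
    print(f"{x:.6f}  {y:.6f}")
```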

BG: He is trusting H() to jumble that for him.

AR: Wrong, as I set the seed only once in a session. 

BG: Moreover, he is drawing from such a reduced set that over a period
of time, Allan is likely to choose the same seed!

AR: I wrote, "... you can use the system clock which changes every
second. This will not make the subsequent sequence any more (or less)
random, but will make each session unique."  I almost wrote "almost
certainly unique" but thought that was pedantry.  Let's assume I run
Stata most days at work and often exit and restart.  Say, 500 sessions a
year and I've been using Stata for 25 years.  Let's also assume there
are times of the day I'm unlikely to be running Stata.  That suggests
maybe 12x60x60 (=43200) options for the clock time and 12500 occasions I
might have set the seed.  Like with the "birthday problem", you may be
surprised at the number of repeats, but they don't matter.  In practice,
I'm doing simulation or generating RN tables in a small proportion of
sessions.  Using the date+time and dropping the trailing 000 (caused by
rounding the millisecond system clock to whole seconds) is, however,
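
The "birthday problem" arithmetic for the figures above (43200 clock times, 12500 sessions) can be checked with a short Python sketch. It assumes seeds fall uniformly and independently over the 43200 seconds, which is a simplification the message does not claim; the function `birthday_stats` is hypothetical.

```python
import math

def birthday_stats(n_options, n_draws):
    """Repeat statistics for n_draws uniform picks from n_options values."""
    # Expected number of pairs of sessions that share a seed.
    expected_pairs = n_draws * (n_draws - 1) / 2 / n_options
    # P(all distinct) = prod_{k=0}^{n_draws-1} (1 - k/n_options), via logs.
    log_p_distinct = sum(math.log1p(-k / n_options) for k in range(n_draws))
    p_any_repeat = 1.0 - math.exp(log_p_distinct)
    # Expected number of different seeds actually used.
    expected_distinct = n_options * (1 - (1 - 1 / n_options) ** n_draws)
    return expected_pairs, p_any_repeat, expected_distinct

pairs, p_repeat, distinct = birthday_stats(12 * 60 * 60, 12500)
print(f"expected matching pairs: {pairs:.0f}")
print(f"P(at least one repeat):  {p_repeat:.6f}")
print(f"expected distinct seeds: {distinct:.0f}")
```

Under these assumptions a repeat is effectively certain, with about 1,800 matching pairs expected and a little under 10,900 distinct seeds among the 12,500 sessions.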

Yours, using a non-random selection from Bill's spellings
