



Re: st: RE: Random start to random number sequence


From   Stas Kolenikov <skolenik@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: RE: Random start to random number sequence
Date   Fri, 17 Sep 2010 16:18:01 -0500

That was an interesting thread that I missed entirely, but I thought
I'd chime in with my own practice. I wonder whether WG and others
have any comments on it.

On Tue, Aug 17, 2010 at 4:43 PM, William Gould, StataCorp LP
<wgould@stata.com> wrote:
> Is resetting the seed a good idea?
> ----------------------------------
>
> That very simple question has a complicated answer.

I often run my simulations on a server on which I can start about 30
processes. To fully automate this, I create a hierarchy of do-files.
Each particular instance of a Monte Carlo data set is dealt with in
-workhorse.do-: the data are created, I run my statistical procedure,
and the results are passed on to be posted. -workhorse.do- is called
from within -envelope.do-, which receives as arguments the random seed
and the number of repetitions. It has a structure like this:

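--- begin envelope.do ---
* envelope.do: run a batch of Monte Carlo replications from one seed
args seed reps

* set the seed once for this process and never reset it
set seed `seed'

* <whatever> stands for the names of the result variables; the seed in
* the file name records how to reproduce the file; every(1) writes each
* posted result to disk immediately
postfile topost <whatever> using mysimulation-`seed', replace every(1)
forvalues r=1/`reps' {
    * one replication: create the data, estimate, leave results behind
    do workhorse
    * <matching whatever>: one (exp) in parentheses per result variable
    post topost <matching whatever>
}
postclose topost
exit
--- end envelope.do ---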
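-workhorse.do- itself is not shown. A minimal hypothetical version might look like the sketch below; the data-generating process (y = 1 + 2x + e, n = 100, estimated by OLS) and the scalar names b_x and se_x are assumptions made for illustration only. The results are left in scalars because locals defined in a do-file are not visible to the do-file that called it. The sketch assumes Stata 10.1 or later for -rnormal()-.

```stata
* workhorse.do (hypothetical sketch): one Monte Carlo replication
* Assumed design: y = 1 + 2*x + e with x, e ~ N(0,1), n = 100, OLS
clear
quietly set obs 100
generate double x = rnormal()
generate double y = 1 + 2*x + rnormal()
quietly regress y x
* leave the results in scalars: locals would not survive the return
* to envelope.do
scalar b_x  = _b[x]
scalar se_x = _se[x]
```

With this version, the -postfile- line in -envelope.do- could declare, say, -b se- as its variables, and the -post- line would then read -post topost (scalar(b_x)) (scalar(se_x))-.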

Finally, I have -dispatcher.do- that creates the files that are
actually submitted to the computational cluster. It will look like
this:

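--- begin dispatcher.do ---
* dispatcher.do: write one small do-file per process and launch it
args seed nthreads digits reps

* the one master seed from which all per-process seeds are drawn
set seed `seed'
forvalues k=1/`nthreads' {
    * two runiform() draws per seed, to get past the granularity
    * of a single draw
    local theseed = floor( runiform()*9e`digits' + runiform()*1e`digits' )
    * write a one-line do-file: do envelope <seed> <reps>
    clear
    set obs 1
    gen str40 torun = "do envelope `theseed' `reps'"
    outfile using simul`k'.do, replace noquote
    * OS-specific call that starts Stata in batch mode on that file;
    * it must return at once (run the job in the background), or the
    * processes will run one after another (on Unix, something like
    * stata -b do simulK.do &)
    ! OS call to stata -b simul`k'.do
}

exit
--- end dispatcher.do ---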
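Because every per-process seed is a deterministic function of the master seed, the list of seeds can be regenerated, checked, and recorded without launching anything. The sketch below assumes the same seed expression as -dispatcher.do- and uses made-up example values for the master seed, the number of processes, and the number of digits. It checks that each seed is a legal -set seed- argument (0 to 2,147,483,647 in the Stata versions of that era; see -help set seed-) and that no two processes share a seed.

```stata
* Hypothetical check, run in its own Stata session: rebuild the seeds
* dispatcher.do would draw for a given master seed, without launching jobs
local master   12345      // example master seed (assumption)
local nthreads 30         // example number of processes
local digits   8          // example digits argument
clear
quietly set obs `nthreads'
generate double theseed = .
set seed `master'
forvalues k = 1/`nthreads' {
    * same expression as in dispatcher.do, evaluated in the same order
    local theseed = floor( runiform()*9e`digits' + runiform()*1e`digits' )
    quietly replace theseed = `theseed' in `k'
}
* each value must be a legal argument to -set seed-
assert theseed >= 0 & theseed <= 2147483647
* no two processes may share a seed (isid exits with an error otherwise)
isid theseed
format theseed %12.0g
list theseed, noobs
```

Saving this small dataset alongside the results gives a record of which seed went to which process.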

Then I just call -dispatcher.do- with its arguments, and Stata forks
out a bunch of copies of itself running my simulation code. (Terrible
things may happen if Stata gets updated while my code runs... this
server still has Stata 10 on it though.)
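Once all processes have finished, each one has left its own mysimulation-<seed>.dta file. A hypothetical way to stack them into a single dataset is sketched below; the file pattern follows -envelope.do-, and the dispatcher arguments shown in the comment are made-up example values. It should only be run after every batch job has called -postclose-.

```stata
* combine.do (hypothetical sketch), run after all jobs have finished.
* The jobs were launched earlier with, e.g.:  do dispatcher 20100917 30 8 1000
clear
local files : dir . files "mysimulation-*.dta"
local n 0
foreach f of local files {
    if `n' == 0 {
        use "`f'", clear
    }
    else {
        append using "`f'"
    }
    local ++n
}
display as text "combined `n' result files, " _N " replications"
```

Each observation is one replication, so -_N- should equal the number of processes times the -reps- argument if every job completed.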

Of course this is a bare-bones structure; in reality, I check whether all
arguments of -dispatcher.do- were supplied and whether they are of the
expected type (I am guarding against myself, knowing how sloppy I
might be at times :) ); I might have more than one line in my
-simul`k'.do- files; the -post-s can be moved to -workhorse.do-,
in which case I would also need to pass it the postfile handle; etc.
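The argument checks mentioned above could, for instance, be collected in a small program defined at the top of -dispatcher.do-. The program name check_dispatcher_args is hypothetical. The upper limit on digits follows from the seed expression in -dispatcher.do-: the largest possible seed is just under 10^(digits+1), which has to stay below the 2,147,483,647 limit of -set seed-, so digits can be at most 8.

```stata
* Hypothetical guard against sloppy arguments to dispatcher.do
capture program drop check_dispatcher_args
program define check_dispatcher_args
    args seed nthreads digits reps
    * every argument must be present and be an integer
    foreach a in seed nthreads digits reps {
        capture confirm integer number ``a''
        if _rc {
            display as error "argument `a' is missing or not an integer"
            exit 198
        }
    }
    * the master seed itself must be a legal -set seed- argument
    if `seed' < 0 | `seed' > 2147483647 {
        display as error "seed must be between 0 and 2147483647"
        exit 198
    }
    * derived seeds are below 10^(digits+1), so digits must be <= 8
    if `nthreads' < 1 | `reps' < 1 | `digits' < 1 | `digits' > 8 {
        display as error "nthreads and reps must be positive; digits must be 1 to 8"
        exit 198
    }
end
```

In -dispatcher.do- it would be called as -check_dispatcher_args `seed' `nthreads' `digits' `reps'- right after the -args- line, so that a bad call stops before any file is written or any process is started.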

That still leaves the question of the very first -set seed-, and the bill,
the current date/time, the fat book, or any other method can be used for
that. However, each instance of Stata receives its own seed, sets it
once, and keeps it. So I sort of achieve a compromise between tweaking
the seed too often (I still want all my 30 threads to use different
seeds!) and keeping it constant within a simulation (while retaining some
reproducibility information, in this case in the name of the .dta
file). I am also aware that the output of -runiform()- is somewhat
granular: there are about 32K distinct values (I hope I am not making
this up, but somehow this figure stuck in my head), which fill only
5 or so digits, so I would want to use two or more calls to
-runiform()- to get unique 8-digit numbers as my seeds/IDs.
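One hypothetical way to combine two -runiform()- calls into an 8-digit seed is to take the leading four digits from one draw and the trailing four from another, so that each draw only has to resolve 10^4 distinct values; even under the 32K figure recalled above, a single draw can do that. This is an alternative to the expression in -dispatcher.do-, not the method used there, and the master seed is a made-up example value.

```stata
* Hypothetical alternative seed expression: build an 8-digit seed from
* two runiform() draws, each needing only 10^4 distinct values
set seed 12345                              // example master seed
local hi = 1000 + floor(runiform()*9000)    // leading digits: 1000 to 9999
local lo = floor(runiform()*10000)          // trailing digits: 0 to 9999
local theseed = `hi'*10000 + `lo'           // 10000000 to 99999999
display "`theseed'"
```

Concatenating digit blocks keeps the result uniform over the 8-digit range and guarantees exactly 8 digits, whereas adding two scaled draws, as in -dispatcher.do-, gives a non-uniform (trapezoidal) sum; for seeds that only need to be distinct, either choice is workable.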

-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.

