

From: Stas Kolenikov <skolenik@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Random start to random number sequence
Date: Fri, 17 Sep 2010 16:18:01 -0500

That was an interesting thread that I missed entirely, but I thought I'd chime in with my own practice... I wonder if WG and other people have any comments on it.

On Tue, Aug 17, 2010 at 4:43 PM, William Gould, StataCorp LP <wgould@stata.com> wrote:
> Is resetting the seed a good idea?
> ----------------------------------
>
> That very simple question has a complicated answer.

I often run my simulations on a server on which I can start about 30 processes. To fully automate this, I create a hierarchy of do-files. Each particular instance of a Monte Carlo data set is dealt with by -workhorse.do-: the data are created, I run my statistical procedure, and post the results. The -workhorse.do- is called from within -envelope.do-, which receives as arguments the random seed and the number of repetitions. It has a structure like:

--- begin envelope.do ---
args seed reps
set seed `seed'
postfile topost <whatever> using mysimulation-`seed', replace every(1)
forvalues r=1/`reps' {
    do workhorse
    post topost <matching whatever>
}
postclose topost
exit
--- end envelope.do ---

Finally, I have -dispatcher.do- that creates the files that are actually submitted to the computational cluster. It will look like this:

--- begin dispatcher.do ---
args seed nthreads digits reps
set seed `seed'
forvalues k=1/`nthreads' {
    local theseed = floor( runiform()*9e`digits' + runiform()*1e`digits' )
    clear
    set obs 1
    gen str40 torun = "do envelope `theseed' `reps'"
    outfile using simul`k'.do, replace noquote
    ! OS call to stata -b simul`k'.do
}
exit
--- end dispatcher.do ---

Then I just call -dispatcher.do- with its arguments, and Stata forks out a bunch of copies of itself running my simulation code. (Terrible things may happen if Stata gets updated while my code runs... this server still has Stata 10 on it though.)
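To see what the seed formula in -dispatcher.do- actually produces, it can be run on its own. This is only a sketch: the starting seed 12345, the value digits = 7, and the five draws are illustrative assumptions, not values from the post. Since both -runiform()- draws are below 1, the result is an integer between 0 and 10^(digits+1) - 1, so digits = 7 gives seeds of at most 8 digits (shorter ones are possible, because nothing pads leading zeros).

```stata
* Sketch: print a few seeds generated the way dispatcher.do does it.
* The seed 12345 and digits = 7 are illustrative assumptions.
set seed 12345
local digits 7
forvalues k = 1/5 {
    * 9e7*u1 + 1e7*u2 lies in [0, 1e8), so floor() gives at most 8 digits
    local theseed = floor( runiform()*9e`digits' + runiform()*1e`digits' )
    display "thread `k': seed = `theseed'"
}
```

As far as I know, -set seed- accepts integers from 0 to 2^31-1, so under this formula digits should probably not exceed 8.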
Of course, this is a bare-bones structure; in reality, I check whether all arguments of -dispatcher.do- were submitted, and whether they are of the expected type (I am guarding against myself, knowing how sloppy I might be at times :) ); I might have more than one line in my -simul`k'.do- files; -post-s can be relegated to -workhorse.do- files, in which case I would also need to transfer the handle; etc.

The question still remains of the very first -set seed-, and the bill or current date/time or fat book or any other method can be used for that. However, each instance of Stata receives its own seed, sets it once and keeps it. So I sort of achieve a compromise between tweaking the seed too often (I still want all my 30 threads to use different seeds!) and keeping it constant within a simulation (and having some reproducibility information -- in this case, in the name of the .dta file).

I am also aware that the output of -runiform()- is kinda granular: there are 32K (?? I hope I am not making this up, but somehow this figure stuck in my head) distinct values, thus populating only 5 or so digits, and I would want to utilize two or more calls to -runiform()- to get unique 8 digit numbers as my seeds/IDs.

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
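The argument checking mentioned for -dispatcher.do- could look roughly like the sketch below. The loop, the error messages and the return codes are assumptions made for illustration; the post does not show the actual checks. The fragment would sit at the top of -dispatcher.do-, right after the -args- line.

```stata
* Hypothetical guard clauses for the top of dispatcher.do (not from the post)
args seed nthreads digits reps
foreach a in seed nthreads digits reps {
    * ``a'' is the content of the macro whose name is stored in `a'
    if "``a''" == "" {
        display as error "argument `a' is missing"
        exit 198
    }
    * -confirm integer number- sets _rc to nonzero if the value is not an integer
    capture confirm integer number ``a''
    if _rc {
        display as error "argument `a' must be an integer, got ``a''"
        exit 7
    }
}
```

With these checks in place, a call such as -do dispatcher 12345 30 7- (one argument short) would stop with an error before any -simul`k'.do- file is written.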
