Statalist The Stata Listserver



Re: st: Generating Random Number


From   Joseph Coveney
To   Statalist
Subject   Re: st: Generating Random Number
Date   Wed, 24 Jan 2007 11:58:50 +0900

I wrote:

local quit = 1
while (`quit') {
   generate double randu`quit' = uniform()
   sort randu`quit', stable
   capture assert randu`quit' > randu`quit'[_n-1] in 2/l
   if _rc local ++quit
   else continue, break
}

--------------------------------------------------------------------------------

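Now I remember:  the sorting on random numbers needed to be hierarchical in
order to ensure that the iterations would eventually end, especially with
the large dataset.  In the loop above, every pass draws fresh random numbers
for all observations, so with millions of observations each new draw is
again likely to contain ties.  Drawing new numbers only for the tied
observations and adding them as further sort keys keeps every tie that has
already been broken.  What I ended up with was something more akin to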

clear
set memory 100M
set obs `=2e6'
set seed `=date("2007-01-24", "ymd")'
generate long surrogate_id = _n
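* on the first pass every observation gets a random number
generate byte duplicates = 1
local pass 1
while (`pass') {
   * draw new random numbers only for observations still flagged as tied
   generate double randu`pass' = uniform() if duplicates
   * sort hierarchically on all random-number variables drawn so far
   sort randu*, stable
   * flag each observation that ties the one above it on this pass's draw
   replace duplicates = 0
   replace duplicates = (randu`pass' == randu`pass'[_n-1]) ///
     if !mi(randu`pass') & _n > 1
   capture assert duplicates == 0
   if _rc {
       * also flag the first member of each tied run, then make another pass
       replace duplicates = 1 if (duplicates[_n + 1] == 1)
       local pass = `pass' + 1
   }
   else continue, break
}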
drop randu* duplicates
display in smcl as text "Number of passes: " as result `pass'
exit

This example (two million rows) takes two passes even with double-precision
random-number variables.
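A rough birthday-problem count suggests why the second pass is almost unavoidable.  It assumes (the post does not say this) that each uniform() draw takes one of about N = 2^32 equally likely values, in which case storing the result as a double adds no resolution.  The expected number of tied pairs among n = 2,000,000 draws is then:

```latex
\mathbb{E}[\text{tied pairs}]
  = \binom{n}{2}\frac{1}{N}
  = \frac{n(n-1)}{2N}
  = \frac{(2\times 10^{6})\,(2\times 10^{6}-1)}{2\cdot 2^{32}}
  \approx 466
```

Under the same assumption, the second pass redraws only the roughly 900 tied observations, and the chance that any of the ~466 pairs ties again is about 466/2^32, roughly 10^-7.  That is consistent with two passes being enough.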

All this effort to explicitly rerandomize duplicate random numbers arose
when it seemed that "randomized" in Stata's documentation for -sort,
stable- meant "haphazard" more than "in a reproducible pseudorandom
sequence."  (See the example below, typed from the keyboard:  with the same
seed and the same tie between -id- 1 and -id- 2, two runs of -sort randu-
put the tied pair in opposite orders.)  It might be that -sort-'s
randomization runs off a different seed.  In any event, an observation like
the one below threw me, and I resorted to hierarchical randomization to
assure myself of an unambiguously reproducible sequence.

Joseph Coveney

. clear

.
. set more off

.
. set seed 1234567890

.
. set obs 20
obs was 0, now 20

.
. generate byte id = _n

.
. generate double randu = uniform()

.
. replace randu = randu[1] in 2
(1 real change made)

.
. sort randu

.
. list id if inrange(id, 1, 2)

    +----+
    | id |
    |----|
 5. |  2 |
 6. |  1 |
    +----+

.
. sort id

.
. set seed 1234567890

.
. replace randu = uniform()
(1 real change made)

.
. replace randu = randu[1] in 2
(1 real change made)

.
. sort randu

.
. list id if inrange(id, 1, 2)

    +----+
    | id |
    |----|
 5. |  1 |
 6. |  2 |
    +----+

.
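For readers who need only a reproducible order, here is a minimal sketch, not from the post, that repeats the session above with the -stable- option.  With -stable-, tied observations keep the order they had before the sort (here, -id- order), so -id- 1 should precede -id- 2 on every run, and the -assert- checks that.  This breaks ties deterministically rather than randomly, which is why the post rerandomizes tied values instead.  uniform() is used to match the thread; later Stata versions document runiform() in its place.

```stata
clear
set obs 20
generate byte id = _n
forvalues rep = 1/2 {
    * restore the original order and the original random numbers
    sort id
    set seed 1234567890
    capture drop randu
    generate double randu = uniform()
    * force a tie between id 1 and id 2, as in the session above
    quietly replace randu = randu[1] in 2
    * -stable- keeps tied observations in their current (id) order
    sort randu, stable
    * id 2 should sit immediately after id 1 on every run
    assert id[_n + 1] == 2 if id == 1
    list id if inrange(id, 1, 2)
}
```

Adding a unique identifier as a second sort key (for example, -sort randu id-) gives the same kind of reproducible tie-breaking.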

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



© Copyright 1996–2024 StataCorp LLC