
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: create unique random number variable


From   Stas Kolenikov <[email protected]>
To   [email protected]
Subject   Re: st: create unique random number variable
Date   Tue, 24 Apr 2012 13:06:12 -0500

This is sampling without replacement from a uniform distribution on
the discrete set {0, 1, 2, ..., 999}. Or at least I hope it is;
sampling without replacement is a much more difficult topic than it
seems. I just wanted to do everything in place and avoid -merge-. If
you are fine with using -merge-, then you can simply

set obs 1000
generate int id = _n - 1
sample `samplesize', count
merge 1:1 _n using `source_data'

and, in a way, this is the most straightforward solution.
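For what it is worth, a fuller sketch of the -merge- route might look like the following. The variable y, the 400 observations, and the tempfile name source_data are stand-ins I made up for illustration; substitute your own data. Here I sort on a random key rather than call -sample-, so that the same step both selects the ids and puts them in random order relative to the source observations. Run it as one do-file so that the tempfile and local macro stay in scope.

```stata
* Sketch only: y, the 400 observations, and `source_data' are
* hypothetical stand-ins for the real data.
clear
set seed 12345

* Stand-in for the real data, saved to a temporary file
set obs 400
generate double y = rnormal()
tempfile source_data
save `source_data'
local N = _N

* Candidate ids 0..999; sort on a random key and keep the first N.
* This is a draw without replacement, in random order.
clear
set obs 1000
generate int id = _n - 1
generate double u = runiform()
sort u
keep in 1/`N'
drop u

* Attach the ids to the data by observation number
merge 1:1 _n using `source_data', nogenerate

* Fails with an error if id is not unique
isid id
```

Because every id starts out distinct, no redraw loop is needed; the only requirement is that the sample size does not exceed 1000.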

On Tue, Apr 24, 2012 at 12:19 PM, Joerg Luedicke
<[email protected]> wrote:
> Stas,
>
> Just out of curiosity: could following this approach still be
> described as a strictly random draw (of course, 'strictly' in terms of
> pseudo-randomness) from a uniform distribution? Because what
> essentially happens is that the randomly emerging ties are filled in
> with yet another draw from the uniform. As a consequence, the
> resulting integers are drawn from a mixture of several or many uniform
> distributions. The component probabilities themselves then depend on
> randomly emerging ties, so it should not make much difference in
> practice. However, the resulting distribution looks somewhat smoother
> than one might expect (due to being a mixture of k uniforms, I
> presume). Compare the following histograms before and after the
> redraws (for which I modified your code):
>
>
> //draw from uniform (0,1)
> clear
> set obs 1000000
> set seed 1234
> generate uu = runiform()
> hist uu, name(unif, replace) bin(1000)
>
> //mapped to integers
> clear
> set obs 1000000
> set seed 1234
> generate uu = int(1500000*uniform())
> bysort uu: generate byte nonuniq = _n > 1
> hist uu, name(g0, replace) bin(1000)
>
> //drawing again in case of ties
> sum nonuniq
> while r(max) > 0 {
> bysort uu: replace uu = int(1500000*uniform()) if _n > 1
> bysort uu: replace nonuniq = _n > 1
> sum nonuniq, mean
> }
> hist uu, name(g1, replace) bin(1000)
>
> So I don't know what OP's demands are with regard to 'randomness', but
> maybe this could matter in some applications? (Perhaps in rocket
> science :)  )
>
> J.
>
>
> On Tue, Apr 24, 2012 at 7:43 AM, Stas Kolenikov <[email protected]> wrote:
>> On Tue, Apr 24, 2012 at 4:37 AM, raoul reulen <[email protected]> wrote:
>>> Hello
>>>
>>> I'm trying to generate a random number variable like this:
>>>
>>> . set seed 12345
>>> . gen x = int(1000*uniform())
>>>
>>> However, the random numbers in variable x are not unique. Is there a
>>> way to ensure they are unique?
>>
>> clear
>> set obs 400
>> * this is your sample size
>>
>> generate uu = int(1000*uniform())
>> bysort uu: generate byte nonuniq = _n > 1
>> sum nonuniq, mean
>> while r(max) > 0 {
>> bysort uu: replace uu = int(1000*uniform()) if _n > 1
>> bysort uu: replace nonuniq = _n > 1
>> sum nonuniq, mean
>> }
>> drop nonuniq
>>
>> --
>> Stas Kolenikov, also found at http://stas.kolenikov.name
>> Small print: I use this email account for mailing lists only.
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/
>



-- 
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.


