Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Stas Kolenikov <skolenik@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: create unique random number variable |
Date | Tue, 24 Apr 2012 13:06:12 -0500 |
This is sampling without replacement from a uniform distribution on
the discrete set {0, 1, 2, ..., 999}. Or at least I hope it is;
sampling without replacement is a much more difficult topic than it
seems. I just wanted to do everything in place and avoid -merge-. If
you are fine with using -merge-, then you can simply

set obs 1000
generate int id = _n
sample `samplesize', count
merge 1:1 _n using `source_data'

and, in a way, this is the most straightforward solution.

On Tue, Apr 24, 2012 at 12:19 PM, Joerg Luedicke <joerg.luedicke@gmail.com> wrote:
> Stas,
>
> Just out of curiosity: could following this approach still be
> described as a strictly random draw (of course, 'strictly' in terms of
> pseudo-randomness) from a uniform distribution? Because what
> essentially happens is that the randomly emerging ties are filled in
> with yet another draw from the uniform. As a consequence, the
> resulting integers are drawn from a mixture of several or many uniform
> distributions. The component probabilities themselves then depend on
> randomly emerging ties, so it should not make much difference in
> practice. However, the resulting distribution looks somewhat smoother
> than one might expect (due to being a mixture of k uniforms, I
> presume).
> Compare the following histograms before and after the
> redraws (for which I modified your code):
>
>
> //draw from uniform (0,1)
> clear
> set obs 1000000
> set seed 1234
> generate uu = runiform()
> hist uu, name(unif, replace) bin(1000)
>
> //mapped to integers
> clear
> set obs 1000000
> set seed 1234
> generate uu = int(1500000*uniform())
> bysort uu: generate byte nonuniq = _n > 1
> hist uu, name(g0, replace) bin(1000)
>
> //drawing again in case of ties
> sum nonuniq
> while r(max) > 0 {
>   bysort uu: replace uu = int(1500000*uniform()) if _n > 1
>   bysort uu: replace nonuniq = _n > 1
>   sum nonuniq, mean
> }
> hist uu, name(g1, replace) bin(1000)
>
> So I don't know what OP's demands are with regard to 'randomness', but
> maybe this could matter in some applications? (Perhaps in rocket
> science :) )
>
> J.
>
>
> On Tue, Apr 24, 2012 at 7:43 AM, Stas Kolenikov <skolenik@gmail.com> wrote:
>> On Tue, Apr 24, 2012 at 4:37 AM, raoul reulen <r.c.reulen@gmail.com> wrote:
>>> Hello
>>>
>>> I'm trying to generate a random number variable like this:
>>>
>>> .set seed 12345
>>> .gen x = int(1000*uniform())
>>>
>>> However, the random numbers in variable x are not unique. Is there a
>>> way to ensure they are unique?
>>
>> clear
>> set obs 400
>> * this is your sample size
>>
>> generate uu = int(1000*uniform())
>> bysort uu: generate byte nonuniq = _n > 1
>> sum nonuniq, mean
>> while r(max) > 0 {
>>   bysort uu: replace uu = int(1000*uniform()) if _n > 1
>>   bysort uu: replace nonuniq = _n > 1
>>   sum nonuniq, mean
>> }
>> drop nonuniq
>>
>> --
>> Stas Kolenikov, also found at http://stas.kolenikov.name
>> Small print: I use this email account for mailing lists only.
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/statalist/faq
>> *   http://www.ats.ucla.edu/stat/stata/

--
Stas Kolenikov, also found at http://stas.kolenikov.name
Small print: I use this email account for mailing lists only.
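
On the question raised in the quoted message (whether redrawing the ties still gives a uniform draw), a short sketch of the argument may help. It assumes n observations, integer values 0, ..., M-1, and a histogram bin covering b of those values; n, M, b and the other symbols below are notation introduced here, not taken from the thread. Because only the duplicates are redrawn, the set of distinct values simply grows by fresh uniform draws until it has n elements, and nothing in that rule depends on how the values are labeled, so every set of n distinct values should be equally likely:

```latex
% D_t: set of distinct values after round t; the k_t = n - |D_t| duplicates are redrawn
D_{t+1} = D_t \cup \{U_1, \dots, U_{k_t}\},
\qquad U_j \overset{\text{iid}}{\sim} \mathrm{Unif}\{0, \dots, M-1\}

% the rule is unchanged by any relabeling of \{0, \dots, M-1\}, so for the final set D
P(D = A) = \binom{M}{n}^{-1}
\quad \text{for every } A \subset \{0, \dots, M-1\},\ |A| = n

% count X_b in a bin covering b values: hypergeometric after the redraws, binomial before
\mathrm{Var}_{\text{after}}(X_b) = n\,\frac{b}{M}\Bigl(1 - \frac{b}{M}\Bigr)\frac{M-n}{M-1},
\qquad
\mathrm{Var}_{\text{before}}(X_b) = n\,\frac{b}{M}\Bigl(1 - \frac{b}{M}\Bigr)
```

With n = 1000000, M = 1500000 and 1000 bins (b of about 1500), the factor (M-n)/(M-1) is about 1/3, so the bin counts have a standard deviation of roughly 18 after the redraws against roughly 32 before. Under these assumptions, that would account for the smoother histogram without any departure from uniformity.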