Re: st: repeatedly shuffle number sequence


From   Nick Cox <njcoxstata@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: repeatedly shuffle number sequence
Date   Tue, 25 Oct 2011 09:45:55 +0100

This sounds like the sort of problem in which you can spend more time
working out the most efficient way to do it than actually doing it.
You can answer your own question by timing the alternatives with numbers
of observations and variables close to what you will be using. My own
instinct is to wonder about creating a long dataset with one variable
divided into blocks and then finally doing a -reshape wide- but these
days -sort-s are pretty fast in Stata unless your dataset is enormous.

Nick
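
A minimal sketch of the long-then-reshape idea mentioned above, wrapped in
-timer- calls so it can be timed against the loop in the quoted message.
This is not code from the original post: the locals k and n and the
variable names block, row and u are illustrative assumptions. It relies on
the fact that the within-block rank of independent uniform draws is a
random permutation of 1..n.

```stata
clear
set seed 20111025

local k = 5              // number of shuffled variables wanted (assumed)
local n = 10             // each variable holds a shuffle of 1..n (assumed)
local N = `k' * `n'

timer clear 1
timer on 1

* Long layout: k blocks of n observations each
set obs `N'
gen int block = ceil(_n / `n')        // block id, 1..k
gen int row   = mod(_n - 1, `n') + 1  // position within block, 1..n

* Rank a uniform draw within each block; the ranks are a shuffle of 1..n
gen double u = runiform()
bysort block (u): gen int seq = _n
drop u

* One column per block: seq1 ... seqk, one observation per row
reshape wide seq, i(row) j(block)

timer off 1
timer list 1
```

To compare, put the -foreach- loop from the quoted message between
-timer on 2- and -timer off 2- and increase k and n to realistic sizes.
Which version is faster will depend on the dataset, so treat any single
timing as indicative only.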

On Tue, Oct 25, 2011 at 9:08 AM, Clinton Thompson
<clintonjthompson@gmail.com> wrote:

> I'm using Stata/SE 11.2 for Windows.
>
> This is a question that is part programming, part efficiency, and part
> style.  Consider a sequence of numbers, say [1,10], that I want to
> shuffle/randomize several times such that I end up w/ k variables
> where each of the variables created contains a random shuffling of the
> values [1,10].  I approached this using a rather simple and
> rudimentary -foreach- loop:
>
>>>>>>>>>>>>>> BEGIN >>>>>>>>>>>
>
> clear
> set obs 10
> set seed 20111025
>
> foreach num of numlist 1/5 {
>  gen int seq`num' = _n
>  gen rand`num' = runiform()
>  sort rand`num'
>  drop rand`num'
> }
>
> <<<<<<<<<< END <<<<<<<<<<<<<
>
> This approach works -- in the sense that k variables are created where
> each variable contains a random shuffling of the numbers from 1-10 --
> but I'm not sure if this is the best way to approach this kind of
> problem.  Does the creation of a -wide- dataset (as in my approach)
> make the most sense (I'll be expanding this to 20-25 variables instead
> of the 5 currently programmed)?  And I can easily change the sequences
> of the values for all of the seq* variables depending on which of the
> rand* variables is sorted but this doesn't seem too robust.  Any
> thoughts or advice on whether this is the best (read:  correct and
> most efficient?) approach to this problem is most appreciated.
