From: "Seed, Paul" <paul.seed@kcl.ac.uk>
To: "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject: st: inconsistent random numbers even using -set seed-
Date: Wed, 22 Jan 2014 11:02:19 +0000
Dear Statalist,

I spent several hours yesterday trying to deal with a program that gave inconsistent results when re-run. Eventually I tracked down the cause. As I have never seen this discussed before, I thought it was worth sharing. A change to the manual might even be called for.

Here is how it looks:

***************************
* Example code showing problem *
version 11.2
set more off
sysuse auto, clear
bys rep78: su price mpg
set seed 1234
gen rand = runiform()
bys rep78 (rand) : keep if _n <= _N/2
bys rep78: su price mpg
* End example *
***********************

If you run this code repeatedly, you will find that you do not get the same answers in the second set of summaries.

After much trouble I found that the inconsistency depends on the sort order. The runiform() function produces exactly the same sequence of pseudorandom numbers each time, and assigns them to records 1, 2, 3, 4, ... in the order they are produced. However, sorting by rep78 is only a _partial_ sort. The order of records within each value of rep78 is determined arbitrarily by some internal Stata process, and changes each time. So records 1, 2, 3, 4, ... are not the same cars each time, and a different set of cars receives the smallest values of rand within each group.

To get consistent results, a complete sort is needed: one on variables that give every record a unique position. In this case we can use the fact that each make of car appears only once. I can use either

bys rep78 (make): su price mpg

or

bys rep78: su price mpg
sort make

The results will differ depending on which I choose, because the two leave the data in different orders (by rep78 and then make, versus by make alone), so the same pseudorandom numbers go to different cars. However, they will not vary from run to run.

***************************
* Example code showing solution 1 *
version 11.2
set more off
sysuse auto, clear
* Crucial change here
bys rep78 (make): su price mpg
set seed 1234
gen rand = runiform()
bys rep78 (rand) : keep if _n <= _N/2
bys rep78: su price mpg
* End example *
***********************
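The post gives a full do-file only for the first fix. What follows is a sketch of the second option, in the same layout as the examples above: summarise by rep78 as before, then restore a complete order with -sort make- before generating the random numbers. It is not code from the original post. The -isid make- line is an added check of the assumption that make uniquely identifies each car in auto.dta, and it stops with an error if that does not hold.

```stata
***************************
* Example code showing solution 2 (sketch) *
version 11.2
set more off
sysuse auto, clear
bys rep78: su price mpg
* Crucial change here: restore a complete sort order
isid make    // added check: errors out if make does not identify records uniquely
sort make
set seed 1234
gen rand = runiform()
bys rep78 (rand) : keep if _n <= _N/2
bys rep78: su price mpg
* End example *
***********************
```

If the data had no single unique identifier, the same idea would need a combination of variables that -isid- accepts as unique.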
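One way to confirm that a fix really removes the run-to-run variation is to repeat the selection twice within a single do-file and compare the results. The sketch below is an addition, not part of the original post. It uses solution 1, saves the makes kept in the first run to a temporary file, and then uses -cf- to compare the second run against it. -cf- stops with an error if the two differ. The macro names first and run are illustrative only.

```stata
* Sketch: run the selection twice and check that the same cars are kept
version 11.2
set more off
tempfile first
forvalues run = 1/2 {
    sysuse auto, clear
    * solution 1: complete sort by rep78, then make
    quietly bys rep78 (make): su price mpg
    set seed 1234
    gen rand = runiform()
    bys rep78 (rand) : keep if _n <= _N/2
    * keep only the identifiers of the selected cars, in a fixed order
    keep make
    sort make
    if `run' == 1 {
        save "`first'"
    }
}
* compare run 2 (in memory) with run 1 (on disk); -cf- errors if they differ
cf _all using "`first'"
```

Because the within-group order left by a partial sort is arbitrary, running the same check with the original -bys rep78:- line would be expected to fail on at least some runs. However, two runs could agree by chance.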