st: inconsistent random numbers even using -set seed-
From
"Seed, Paul" <[email protected]>
To
"[email protected]" <[email protected]>
Subject
st: inconsistent random numbers even using -set seed-
Date
Wed, 22 Jan 2014 11:02:19 +0000
Dear Statalist,
I spent several hours yesterday trying to deal with a program that gave
inconsistent results when re-run.
Eventually I tracked it down.
As I have never seen this discussed before, I thought it was worth sharing.
A change to the manual might even be called for.
Here is how it looks:
***************************
* Example code showing problem *
version 11.2
set more off
sysuse auto, clear
bys rep78: su price mpg
set seed 1234
gen rand = runiform()
bys rep78 (rand) : keep if _n <= _N/2
bys rep78: su price mpg
* End example *
***********************
If you run this code repeatedly, you will find that the second set of
summaries (the one produced after -keep-) is not the same from run to run.
After much trouble I found that the inconsistency depends on the sort order.
The runiform() function is producing exactly the same sequence of pseudorandom
numbers each time, and putting them into records 1, 2, 3, 4... as they are produced.
However, sorting by rep78 is only a _partial_ sort.
The order within each value of rep78 is determined arbitrarily by some
internal Stata process, and can change from run to run. So records 1, 2, 3, 4... are
not the same cars each time, and a given car receives a different random number.
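The arbitrary ordering of ties can be checked directly. The following is only a rough sketch: the variables id, pos1 and pos2 are helper names made up for this illustration, and no particular count is promised, since the tie-breaking is arbitrary.

```stata
version 11.2
sysuse auto, clear
* id is a hypothetical helper that records the order in which the data were loaded
gen long id = _n
* First partial sort: note where each car ends up
sort rep78
gen long pos1 = _n
* id is unique, so this restores the loaded order exactly
sort id
* Second partial sort on the same key
sort rep78
gen long pos2 = _n
* Any nonzero count means ties within rep78 were ordered differently
count if pos1 != pos2
```

If the count is nonzero, the same car sat in a different record after each sort, which is enough to change which random number it receives.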
To get consistent results, a complete sort is needed.
In this case we can use the fact that each make of car appears once only.
I can use either
bys rep78 (make): su price mpg
or
bys rep78: su price mpg
sort make
The results will differ depending on which I choose, but
they will not vary from run to run.
***************************
* Example code showing solution 1*
version 11.2
set more off
sysuse auto, clear
* Crucial change here
bys rep78 (make): su price mpg
set seed 1234
gen rand = runiform()
bys rep78 (rand) : keep if _n <= _N/2
bys rep78: su price mpg
* End example *
***********************
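For completeness, the second option mentioned above (sorting on make after the first set of summaries) might look like the sketch below. The -isid- line is my own addition, not part of the original program: it simply stops with an error if make does not uniquely identify the cars, which is the assumption the fix relies on.

```stata
* Example code sketching solution 2
version 11.2
set more off
sysuse auto, clear
bys rep78: su price mpg
* Added check (an assumption made explicit): make must be unique
isid make
* Crucial change here: a complete sort before the random numbers are drawn
sort make
set seed 1234
gen rand = runiform()
bys rep78 (rand) : keep if _n <= _N/2
bys rep78: su price mpg
* End example
```

As noted above, the cars kept here need not match those kept by solution 1, because the random numbers are attached to the records in a different order; the point is only that the order is now fully determined.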