
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: inconsistent random numbers even using -set seed-


From   daniel klein <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: inconsistent random numbers even using -set seed-
Date   Wed, 22 Jan 2014 12:37:02 +0100

Paul,

I think you have a point here. Note, however, that the behavior you
describe is documented, although arguably not where it should be.

In -help sort- the -stable- option is explained to do exactly what you
are looking for here, which you call a 'complete sort'.

This does not necessarily require adding another variable with unique
values, such as make in the auto dataset (although internally Stata
probably does something like that), as long as you do not insist on
using -bysort-.

In your example change this

bys rep78: su price mpg

to that

sort rep78, stable
su price mpg

and find that results do not change from run to run.
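
One way to check this within a single session is to wrap your example
in a program and run it twice. This is only a sketch: the program name
halfsample and the scalar first_mean are hypothetical, and storing rand
as a double (to make ties in the sort key less likely) is my own choice,
not part of your original code.

```stata
* Sketch: Paul's example with -sort, stable-, run twice and compared
capture program drop halfsample
program define halfsample, rclass
    version 11.2
    sysuse auto, clear
    sort rep78, stable                  // tie order = order in the saved file
    set seed 1234
    generate double rand = runiform()   // draws land on the same cars each run
    bysort rep78 (rand): keep if _n <= _N/2
    summarize price, meanonly
    return scalar mean_price = r(mean)
end

halfsample
scalar first_mean = r(mean_price)
halfsample
assert r(mean_price) == scalar(first_mean)
```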

I agree that this should be documented in the help file for -bysort-,
or, even better, -bys- should be changed to support the -stable-
option of -sort-.

Best
Daniel

-- 
I spent several hours yesterday trying to deal with a program that gave
inconsistent results when re-run.

Eventually I tracked it down.
As I have never seen this discussed before, I thought it was worth sharing.

A change to the manual might even be called for.


Here is how it looks:

***************************
* Example code showing problem *

version 11.2
set more off
sysuse auto, clear

bys rep78: su  price mpg

set seed 1234
gen rand = runiform()

bys rep78 (rand) : keep if _n <= _N/2
bys rep78: su  price mpg

* End example *
***********************

If you run this code repeatedly, you will find you do not get
the same answers to the second list of summaries.

[...]
However, sorting by rep78 is only a _partial_ sort.
The order within each value of rep78 is determined arbitrarily by some
internal Stata process and changes each time. So records 1, 2, 3, 4, ...
are not the same each time, and the random numbers generated afterwards
are attached to different cars on each run.

To get consistent results, a complete sort is needed.
In this case we can use the fact that each make of car appears once only.
I can use either
bys rep78 (make): su  price mpg

or
bys rep78: su  price mpg
sort make

The results will be different depending on which I choose, but
they will not vary from run to run.
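
The make-based route relies on each make appearing only once, and that
assumption can be checked with -isid- before the random numbers are
drawn. A minimal sketch, assuming Stata 11.2 or later; storing rand as
a double is an illustrative choice, not part of the original code:

```stata
* Sketch: verify make is unique, then sort completely before -runiform()-
version 11.2
sysuse auto, clear
isid make                 // exits with an error if make has duplicates
sort rep78 make           // no ties remain, so the order is fully determined
set seed 1234
generate double rand = runiform()
bysort rep78 (rand): keep if _n <= _N/2
by rep78: summarize price mpg
```

Note that -isid- by default refuses variables with missing values, so
checking rep78 and make together would need its -missok- option.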

