Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
Re: st: Mata order() indeterminate
From: [email protected] (Vince Wiggins, StataCorp)
To: [email protected]
Subject: Re: st: Mata order() indeterminate
Date: Mon, 04 Nov 2013 16:22:41 -0600

Brendan Halpin <[email protected]> asks about obtaining stable
ordering (sorts) when there are tied values and when using the
-order()- function in Mata.
Brendan provides an example, see below, where the ordering of ties is
done differently each time the functions are run.
As Joseph Coveney <[email protected]> discusses, this randomness in
ordering tied values is intentional.  It keeps us from seeing
consistency where there is none.
That said, we have no problem with someone knowingly breaking ties
consistently.  A surefire solution is to create another column in your
ordering matrix that has a sequential ordering, then include that
column when using -order()-.  If you were ordering on the first column
of matrix y you would type -order(y, 1)-.  After inserting a second
column that has sequential values, you would type -order(y, 1..2)-.
That solution makes it explicit how you want your ties broken.
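A minimal sketch of that tie-breaking approach follows.  The values in y and the name seq are made up for illustration; the sequence column is joined to y on the fly, so the call sorts on columns 1 and 2 of the combined matrix.

```mata
mata:
y   = (1 \ 0 \ 1 \ 0 \ 1)        // illustrative values with ties (assumed data)
seq = range(1, rows(y), 1)        // 1, 2, ..., rows(y): the explicit tie-breaker
p   = order((y, seq), (1, 2))     // sort on y first, then on original position
p'                                // permutation vector, shown as a row
end
```

With these values, ties keep their original relative order, so p should be (2, 4, 1, 3, 5) on every run.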
There is also another way.  Mata's -order()- and -sort()- use
Stata's internal sort seed, see -help sortseed-, to order ties
consistently.  Brendan can set that seed before using -order()- to get
reproducible sorts.  Just code
    : stata("set sortseed 12345")
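Applied to a variant of Brendan's snippet, that might look like the sketch below.  It assumes the sort seed has to be reset before each call to -order()- for every pass to match; the loop count of 3 and the placement of -rseed()- before y is generated are choices made here for illustration, not part of the original code.

```mata
mata:
x = range(1, 10, 1)
rseed(12345)                        // makes y itself reproducible
y = runiform(10, 1) :> 0.5          // 0/1 column with many ties
for (i = 1; i <= 3; i++) {
    stata("set sortseed 12345")     // reset the tie-breaking seed before each sort
    x[order(y, 1), ]'               // displayed as a row; expected to repeat on each pass
}
end
```

If the rows displayed still differ between passes, the explicit tie-breaking column remains the safer route.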
We will consider exposing this approach in the documentation, and even
adding a -sortseed()- function to Mata.  There is, however, a clarity
in requiring that ties be broken explicitly.
 
-- Vince 
   [email protected]
 
 
------------ Original message ---------------------------
 
From: [email protected] (Brendan Halpin)
Subject: st: Mata order() indeterminate
Sender: [email protected]
Lines: 40
I'm checking a simulation in Mata, and find that setting the seed to the
same value does not yield the same results on repeated runs. I've
tracked it down to the use of -order()- to sort a matrix, where there
are many ties. It appears that -order()- brings in indeterminacy in
dealing with ties, but from somewhere other than the random-number system.
This snippet illustrates the issue:
mata:
x = range(1,10,1)
y = runiform(10,1):>0.5
for (i=1; i<=20; i++) {
  rseed(12345)
  x, x[order(y,1),]
}
end
Though x is unchanged, and the seed is set to the same value at each
pass, x[order(y,1)] changes. 
While this is disturbing, I presume it is consistent with Stata policy
with regard to sorting indeterminacy. 
How do I get repeatable sorting in this context?
Regards,
Brendan
-- 
Brendan Halpin, Head of Department, Sociology, University of Limerick, Ireland
Tel: w +353-61-213147  f +353-61-202569  h +353-61-338562;  Room F1-002 x 3147
mailto:[email protected]    ULSociology on Facebook: http://on.fb.me/fjIK9t
http://teaching.sociology.ul.ie/bhalpin/wordpress         twitter:@ULSociology
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/