How does Stata's sort algorithm work?
I was looking at the stability of the sort order. See the following two
examples. But first note that I am not using the stable option in
either sort command.
Example 1 (changing sort order):
clear
set obs 100
g x = _n
g herbal = _n>25 & _n<=75
sort herbal
l x in 1/10, clean noo
Example 2 (stable sort order):
clear
set obs 100
g x = _n
g herbal = _n<51
sort herbal
assert x ==_N+1-_n
Example 1 is what I've come to expect from experience. Each time it is
run, the tied observations come out in a different order.
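
One way to check this within a single run, rather than by eyeballing the listing, is sketched below. It assumes x uniquely identifies the observations, so "sort x" restores the starting order; order1 and order2 are names made up for this illustration.

```stata
clear
set obs 100
generate x = _n
generate herbal = _n>25 & _n<=75

sort herbal
generate order1 = _n      // position of each observation after the first sort

sort x                    // x is unique, so this restores the original order
sort herbal
generate order2 = _n      // position after the second, identical sort

* A nonzero count means the order of ties differed between the two sorts
count if order1 != order2
```

If the count is zero on every run, the sort is behaving reproducibly for that dataset, even without the stable option.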
The stable ordering in Example 2 surprised me. Quoting from
the Stata manual entry on sort, "Without the stable option, the
ordering of observations with equal values of varlist is randomized."
I thought that if the data already satisfied "herbal>=herbal[_n-1] if
_n>1", then a "sort herbal" command would not need to change the order
of the data, and thus the resulting sort order would not vary across
multiple executions of the code. But that is not the case here: the
data in Example 2 start out in descending order of herbal. What
other conditions can lead to a (non-unique) sort command producing the
same dataset each time?
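
For comparison, here is a minimal sketch of the two ways I know to force a reproducible order on the Example 2 data: the stable option, and an explicit tie-breaker. It assumes x is unique within each value of herbal, which holds here by construction.

```stata
clear
set obs 100
generate x = _n
generate herbal = _n<51

* Option 1: stable keeps tied observations in their pre-sort relative order,
* so the herbal==0 block (x = 51..100) comes first, then x = 1..50
sort herbal, stable
assert x == cond(_n<=50, _n+50, _n-50)

* Option 2: break ties explicitly with a variable that is unique within herbal
sort herbal x
assert x == cond(_n<=50, _n+50, _n-50)
```

Both asserts describe an order different from the reversed one that Example 2 produces without stable, which is part of what I am trying to understand.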