Bookmark and Share

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

st: A faster way to gsort


From   Andrew Maurer <[email protected]>
To   Statalist Statalist <[email protected]>
Subject   st: A faster way to gsort
Date   Tue, 11 Mar 2014 15:49:59 +0000

Hi Statalist,

I've been finding that sorting is taking up a very significant portion of computation time in code that I'm writing (eg for 6hrs run time, 5.5hrs is spent sorting). I am looking for ways to do better than Stata's sort and gsort commands.

I just read about mata's use of permutation vectors (eg --help mata permutation--) and found that in at least the case below, they can be used for reverse sorting much faster than stata's --gsort--. gsort was 10x slower, at around 0.65 seconds vs the permutation method at around 0.065 seconds. (with 10^6 observations, times were ~7 seconds vs 1.4 seconds). I suppose there isn't a question here, but I'd be interested to hear if anyone has any thoughts.

************* Define Programs ***********
clear all

program revsort
	// caller program for mata revsort()
	tempvar id
	gen long `id' = _n
	mata : revsort("`id'")
end

mata
	void revsort(string scalar id)
	// "id" should be dataset's current sort index
	// revsort() just reverses the dataset's order
	{
		st_view(V=.,.,.)

		idpos = st_varindex(id)
		V[.,.] = V[revorder(V[.,idpos]),.]
	}
end
************* End Define Programs *******

************* Run Benchmark *************
/* 
benchmarking gsort vs "revsort" 
(mata permutation method using revorder())
*/
tempfile temp
local N = 10^5
set obs `N'

gen x = runiform()
save `temp'

forval i = 1/5 {
	use `temp', clear
	timer on `i'

	gsort -x

	timer off `i'
}


forval i = 11/15 {
	use `temp', clear
	timer on `i'

	sort x
	revsort

	timer off `i'
}

timer list
************* End Benchmark *************


*** Output ***
* 1) gsort
   1:      0.67 /        1 =       0.6700
   2:      0.67 /        1 =       0.6710
   3:      0.75 /        1 =       0.7490
   4:      0.66 /        1 =       0.6550
   5:      0.66 /        1 =       0.6550

* 2) revsort
  11:      0.06 /        1 =       0.0620
  12:      0.09 /        1 =       0.0940
  13:      0.06 /        1 =       0.0620
  14:      0.06 /        1 =       0.0620
  15:      0.06 /        1 =       0.0630
* End Output *

Thank you,

Andrew Maurer



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index