Re: st: re-sorting by the same variable or storing a permutation vector?


From   László Sándor <sandorl@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: re-sorting by the same variable or storing a permutation vector?
Date   Thu, 5 Apr 2012 09:42:26 -0400

Hi,

Some clarification, if it helps, or just revives interest:

I can easily save a sort order with "g order = _n", but I cannot simply
reimpose it as an ordering with something like "_n <- order" or
"replace _n = _n[order]". This is what is said to be efficient and
fast in Mata.
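
To make the idea concrete, here is a minimal sketch of a permutation
vector, written in Python rather than Stata or Mata purely for
illustration. The data and the names x, y, p and inv are made up. It
shows that once the permutation is stored, reapplying or undoing a
sort is plain subscripting with no comparisons. In Mata, order() and
invorder() would, as far as I can tell, play the roles of p and inv.

```python
# Toy data: x is the sort key, y is a payload column (both hypothetical).
x = [30, 10, 20, 10]
y = ["c", "a", "b", "d"]

# Permutation vector: p[i] is the original position of the row that ends
# up at position i after sorting by x. Python's sort is stable, so tied
# keys keep their original relative order.
p = sorted(range(len(x)), key=lambda i: x[i])

# Applying the permutation is plain subscripting, with no comparisons.
x_sorted = [x[i] for i in p]
y_sorted = [y[i] for i in p]

# The inverse permutation undoes it: inv[p[i]] == i.
inv = [0] * len(p)
for new_pos, old_pos in enumerate(p):
    inv[old_pos] = new_pos

# Subscripting the sorted data by inv restores the original order.
x_restored = [x_sorted[inv[i]] for i in range(len(x))]
```

Whether Stata itself can reuse such a vector without a real sort is
exactly the open question; the sketch only shows what the operation
would be.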

Even if I had an "order" variable, I thought it would be faster not to
actually sort by it, and only have Stata "trust me" that its values are
natural numbers that could serve as new observation numbers (indices).
Maybe Stata would not trust users with this?

But let me elaborate on the use case; perhaps it can be sped up
some other way:

I have a command that
1. generates bins of x by quantiles of x using -xtile-
2. generates means of x and y by these bins, which seems to be fast
using -sum()- and indexing after sorting by xbins and `touse'.
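
The two steps can be sketched as follows, again in Python and only as
an illustration. The helper names quantile_bins and means_by_bin are
hypothetical, the data are invented, and the binning rule is a
simplification that does not reproduce how -xtile- treats ties or
weights. The means use per-bin totals and counts instead of the
running -sum()- over sorted data that my command uses.

```python
def quantile_bins(values, nbins):
    """Assign 1-based bin numbers so that bins hold roughly equal counts.

    Ranks come from a single sort. Tied values are not forced into the
    same bin, which is a simplification relative to -xtile-.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    bins = [0] * n
    for rank, i in enumerate(order):
        bins[i] = rank * nbins // n + 1
    return bins


def means_by_bin(bins, values):
    """Mean of values within each bin, via per-bin totals and counts."""
    totals, counts = {}, {}
    for b, v in zip(bins, values):
        totals[b] = totals.get(b, 0.0) + v
        counts[b] = counts.get(b, 0) + 1
    return {b: totals[b] / counts[b] for b in sorted(totals)}


x = [5.0, 1.0, 4.0, 2.0, 6.0, 3.0]
y = [50.0, 10.0, 40.0, 20.0, 60.0, 30.0]

xbin = quantile_bins(x, 3)      # step 1: bins of x by quantiles of x
mean_x = means_by_bin(xbin, x)  # step 2: means of x by bin
mean_y = means_by_bin(xbin, y)  #         and means of y by bin
```

Note that the only sort is the one inside the binning step, which is
consistent with that step being the expensive one.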

Surprisingly (to me), xtile is the bottleneck. But if I run this
command on the same x with various cuts of the data (say, "if
year == 1989", "if year == 1990", etc.), perhaps it could be faster
the next time if the data were already sorted by x when it starts,
without my sorting it "manually" (xtile would do that anyway). This is
what I hoped to use the permutation vector for.

But when I sort by "`xbin' and `touse'" after the data have already
been sorted by x, perhaps this is the rare case worth using the
"stable" option of sort? It will slow down the second sort (by "xbin
and touse" within the first run of my command), but when I rerun the
command, the next sort by x inherent in xtile should be faster,
perhaps?
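
A small sketch of what a stable sort would and would not preserve
here, in Python with invented rows of (x, xbin, touse). Python's
sorted() is stable, so it stands in for sort with the "stable" option.
Whether Stata's next sort by x actually runs faster on such data is an
assumption that this sketch does not test.

```python
# Rows are (x, xbin, touse); they start out sorted by x. Values are made up.
rows = [(1, 1, 1), (2, 1, 0), (3, 1, 1), (4, 2, 0), (5, 2, 1), (6, 2, 1)]

# Stable sort by (xbin, touse): rows with equal keys keep their incoming
# relative order, so they stay ordered by x within each group.
by_group = sorted(rows, key=lambda r: (r[1], r[2]))

# Check that x is still ascending within every (xbin, touse) group.
groups = {}
for x, xbin, touse in by_group:
    groups.setdefault((xbin, touse), []).append(x)
within_group_sorted = all(xs == sorted(xs) for xs in groups.values())

# The data as a whole is no longer sorted by x, only in sorted runs.
x_after = [r[0] for r in by_group]
globally_sorted = x_after == sorted(x_after)
```

So the stable sort leaves x ordered within each (xbin, touse) group
but not across the whole dataset; any gain would have to come from the
next sort handling nearly sorted input well.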

At least if no other command my code uses jumbles up observations
without my knowledge...

Thanks again for any thoughts,

Laszlo

2012/4/4 László Sándor <sandorl@gmail.com>:
> Wise peers,
>
> Allow me another question, hopefully of general interest: If I would
> need to resort to re-sorting (pun intended) my data multiple times, am
> I not better off if I store a permutation vector (after the first
> sort, obviously) in a variable and just reapply that? Is there a
> faster way to do that than invoking Mata and putting my entire data
> over there? (A monster st_view() if there ever was one.) I know Mata
> allows fast sorting and permutation vectors (/ subscripting). Is that
> the efficient way to go?
>
> http://stata.com/help.cgi?[M-5]+sort()
> http://www.stata-journal.com/sjpdf.html?articlenum=pr0028
>
> Thanks for any thoughts on this too,
>
> Laszlo
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
