

From: László Sándor <sandorl@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: re-sorting by the same variable or storing a permutation vector?
Date: Thu, 5 Apr 2012 09:42:26 -0400

Hi,

Some clarification, if it helps, or just revives interest: I can easily save a sort order with "g order = _n", but I cannot simply reimpose it as an ordering with something like "_n <- order" or "replace _n = _n[order]". This is what is said to be efficient and fast in Mata. Even if I had an "order" variable, I thought it would be faster not to actually sort on it, but "only let Stata trust me" that these are natural numbers that could serve as new observation numbers (indices). Maybe Stata would not trust users with this?

But let me elaborate on the use case; perhaps it can be sped up some other way. I have a command that

1. generates bins of x by quantiles of x using -xtile-
2. generates means of x and y by these bins, which seems to be fast using -sum()- and indexing after sorting by xbins and `touse'.

Surprisingly (to me), xtile is the bottleneck. But if I run this command on the same x with various cuts of the data (say, "if year == 1989", "if year == 1990" etc.), perhaps it could be faster the next time if it started with the data already sorted by x, without my sorting it "manually" (xtile would do it anyway). This is what I hoped to use the permutation vector for.

But when I sort by "`xbin' and `touse'" after the data were originally sorted by x, perhaps this is the rare case worth using the "stable" option of sort? It will slow down the second sort (by "xbin and touse" within the first run of my command), but when I rerun the command, the next sort by x inherent in xtile should be faster, perhaps? At least if no other command my code uses jumbles up the observations without my knowledge...

Thanks again for any thoughts,

Laszlo

2012/4/4 László Sándor <sandorl@gmail.com>:
> Wise peers,
>
> Allow me another question, hopefully of general interest: If I would
> need to resort to re-sorting (pun intended) my data multiple times, am
> I not better off if I store a permutation vector (after the first
> sort, obviously) in a variable and just reapply that? Is there a
> faster way to do that than invoking Mata and putting my entire data
> over there? (A monster st_view() if there ever was one.) I know Mata
> allows fast sorting and permutation vectors (/ subscripting). Is that
> the efficient way to go?
>
> http://stata.com/help.cgi?[M-5]+sort()
> http://www.stata-journal.com/sjpdf.html?articlenum=pr0028
>
> Thanks for any thoughts on this too,
>
> Laszlo
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
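
For readers following the thread, the three ideas in the message above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the poster's actual command: it uses the auto dataset shipped with Stata, with price and mpg standing in for x and y, and the names order0, xbin, and ymean are hypothetical. It makes no claim about timing, which is the open question in the post.

```stata
* Sketch only: auto.dta stands in for the poster's data;
* order0, xbin, and ymean are illustrative names, not from the original post.
sysuse auto, clear

* 1. Save the current order and reimpose it later.
*    Reimposing it still requires a real -sort- on the saved variable.
sort price
generate long order0 = _n     // record the observation order after sorting by price
sort mpg                      // some other step reorders the data
sort order0                   // restores the saved order (order0 has no ties)

* 2. Bin x by quantiles, then do a stable sort on the bins.
*    With -stable-, observations within a bin keep their previous
*    relative order, so they stay ordered by price within each bin.
xtile xbin = price, nquantiles(4)
sort xbin, stable
by xbin: generate double ymean = sum(mpg) / _n   // running mean within bin
by xbin: replace ymean = ymean[_N]               // last running mean = bin mean

* 3. The Mata counterpart: a permutation vector applied by subscripting.
mata:
X  = st_data(., ("price", "mpg"))   // copy two variables into a Mata matrix
p  = order(X, 1)                    // permutation vector that sorts X by column 1
Xs = X[p, .]                        // apply the permutation to the rows
end
```

The Mata step copies the data with st_data() for simplicity; the original question mentions st_view(), which avoids the copy, and whether either beats a plain -sort- on large data is exactly what the thread asks.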

**References**: **st: re-sorting by the same variable or storing a permutation vector?** *From:* László Sándor <sandorl@gmail.com>
