Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: re-sorting by the same variable or storing a permutation vector?

From	László Sándor <[email protected]>
To	[email protected]
Subject	Re: st: re-sorting by the same variable or storing a permutation vector?
Date	Thu, 5 Apr 2012 09:42:26 -0400

Hi,

Some clarification, if it helps, or just revives interest:

I can easily save a sort order with "g order = _n", but I cannot simply
reimpose it as an ordering with something like "_n <- order" or
"replace _n = _n[order]". This kind of permutation subscripting is what
is said to be efficient and fast in Mata.
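
To make the limitation concrete, here is a minimal sketch of saving and restoring an order in plain Stata. It assumes the shipped auto dataset as stand-in data, and the intervening "sort price" is only an illustration. As far as I know, the only way to reimpose the saved order is a real sort on the saved variable:

```stata
sysuse auto, clear
generate long order = _n      // save the current observation order
sort price                    // any intervening sort that scrambles it
sort order                    // restore it: a real sort, not an assignment to _n
```

The restore step is correct but still pays for a full sort, which is the cost in question.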

Even with an "order" variable, I thought it would be faster not to
actually sort on it, but only to let Stata "trust me" that its values
are natural numbers that could serve as the new observation numbers
(indices). Maybe Stata would not trust users with this?
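
For comparison, a permutation vector in Mata is applied by subscripting, with no second sort. The following is only a sketch on a toy vector (not my data), using Mata's order() and invorder():

```stata
mata:
x  = (3 \ 1 \ 2)
p  = order(x, 1)     // permutation vector that sorts x ascending
xs = x[p, .]         // apply the permutation by subscripting
q  = invorder(p)     // inverse permutation, to undo the reordering
assert(xs == (1 \ 2 \ 3))
assert(xs[q, .] == x)
end
```

Doing the same to a whole Stata dataset would mean moving the data into Mata first, which is the overhead I was hoping to avoid.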

But let me elaborate on the use case; perhaps it can be sped up some
other way:

I have a command that
1. generates bins of x by quantiles of x using -xtile-
2. generates means of x and y by these bins, which seems to be fast
using -sum()- and indexing after sorting by `xbin' and `touse'.
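
The two steps can be sketched roughly as follows. This is not my actual command: the auto dataset, the variables weight and price standing in for x and y, the four bins, and the touse rule are all assumptions for illustration, and the means are only correct when x and y have no missing values in the sample:

```stata
sysuse auto, clear
generate byte touse = !foreign               // hypothetical sample marker
xtile xbin = weight if touse, nquantiles(4)  // step 1: quartile bins of x
sort touse xbin
// step 2: running sum within bin, then copy the bin's last value to all rows
by touse xbin: generate double mean_x = sum(weight) / _N
by touse xbin: replace mean_x = mean_x[_N]
by touse xbin: generate double mean_y = sum(price) / _N
by touse xbin: replace mean_y = mean_y[_N]
```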

Surprisingly (to me), xtile is the bottleneck. But if I run this
command on the same x with various cuts of the data (say, "if
year == 1989", "if year == 1990", etc.), perhaps it could be faster
the next time if it started with the data already sorted by x, without
my sorting it "manually" (xtile would do that anyway). This is what
I hoped to use the permutation vector for.

But when I sort by `xbin' and `touse' after the data were originally
sorted by x, perhaps this is the rare case worth using the "stable"
option of sort? It will slow down the second sort (by `xbin' and
`touse', within the first run of my command), but when I rerun the
command, the next sort by x inherent in xtile should perhaps be
faster?

At least if no other command my code uses jumbles up observations
without my knowledge...
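
A small sketch of what "stable" would guarantee here, again on the auto dataset with assumed stand-in variables (pos_by_x is a hypothetical helper recording the position in the x ordering). It only checks that the x ordering survives within each (xbin, touse) group; it does not measure whether a later sort by x becomes any faster:

```stata
sysuse auto, clear
generate byte touse = !foreign               // hypothetical sample marker
xtile xbin = weight if touse, nquantiles(4)
sort weight                                  // data now ordered by x
generate long pos_by_x = _n                  // remember the position in that ordering
sort xbin touse, stable                      // ties keep their previous relative order
// within each group, observations are still in increasing x order
by xbin touse: assert pos_by_x > pos_by_x[_n-1] if _n > 1
```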

Thanks again for any thoughts,

Laszlo

2012/4/4 László Sándor <[email protected]>:
> Wise peers,
>
> Allow me another question, hopefully of general interest: If I would
> need to resort to re-sorting (pun intended) my data multiple times, am
> I not better off if I store a permutation vector (after the first
> sort, obviously) in a variable and just reapply that? Is there a
> faster way to do that than invoking Mata and putting my entire data
> over there? (A monster st_view() if there ever was one.) I know Mata
> allows fast sorting and permutation vectors (/ subscripting). Is that
> the efficient way to go?
>
> http://stata.com/help.cgi?[M-5]+sort()
> http://www.stata-journal.com/sjpdf.html?articlenum=pr0028
>
> Thanks for any thoughts on this too,
>
> Laszlo
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

