[Date Prev][Date Next][Thread Prev][Thread Next][Date index][Thread index]

Get this: x1 x2 x3 1 2 3 3 4 5 7 9 9 For all I know, there is no linear complexity O(n) sorting algorithm(http://en.wikipedia.org/wiki/Sorting_algorithm) Here you can see how different algorithms actually perform: http://linux.wku.edu/~lamonml/algor/sort/sort.html For N observations and K variables, Nick suggested reshaping data to get NxK observations, sort them (perhaps on a complex key) and reshape it back. I would doubt that this strategy improves performance, because of the above considerations of nonlinearity of sorting with respect to N: "sorting K values N times" is much simplier than "sorting NxK values". Just how much easier - it depends. When K is rather small, the gain can be quite big. Consider e.g. K=2 :) -- a simple conditional swap -- linear! (number of swaps in the worst case = N, number of comparisons: also N). Reshape itself works quite slow. Like really slow. Because it is implemented as an .ado program. A commentary by the author of rowsort notes that "It may be faster to reshape, sort within blocks, and reshape again". So I take that as a sign of even worse performance. Another issue to consider is memory requirements. How much memory does reshape require? (we are all so concerned about fitting our data into 2GB/1GB/700MB/... limit, but often forget that once it is there, working with it requires memory too!) In my experience reshape requires at least two times as much: . memory bytes -------------------------------------------------------------------- Details of set memory usage overhead (pointers) 14,800 14.45% data 37,000 36.14% ---------------------------- data + overhead 51,800 50.59% free 50,592 49.41% ---------------------------- Total allocated 102,392 100.00% -------------------------------------------------------------------- Other memory usage set maxvar usage 2,041,738 set matsize usage 1,315,200 programs, saved results, etc. 45,523 --------------- Total 3,402,461 ------------------------------------------------------- Grand total 3,504,853 . reshape long r p, i(id) (note: j = 1 2 3 4 5 6) no room to add more observations r(901) (sorry for mentioning this taboo topic on public, have you ever estimated how much memory does an XYZ command require? Even Stata people guesstimate here). So Ashim, don't get discouraged and write your own code. It might work better for your particular case. One last advice is that sometimes the problem can be resolved by a radically different solution. E.g. if your variables are, say age of children for a particular mother in a [e.g. DHS] survey, you might find a dataset already in the long-format and start working with it. The dataset description will let you know what is the ordering there. In particular, data may be ordered with extra (not available to you) information which may be lost during sorting. E.g.: the codebook may say that children are ordered by age: 6, 8, 8, 12. You might not know whether 2nd and 3rd children are twins, or were born in the same year, but the data providers might very well do! Changing their places (remember that Stata does not guarantee sort stability for observations, unless special option is used, and I am not sure how this works for sorting rows) will change their birth ranks - quite an important characteristic, if we believe a large body of literature on the topic. Hope this helps. Anyways, I don't immediately see how -sortrows- or -rowssort- will deal with this situatuion: motherid child_id1 child_id2 child_id3 age1 age2 age3 gender1 gender2 gender3 ..... and I might want to sort children by age keeping their ordering stable and moving their (string) ids and (numeric) gender dummies in sync. Regards, Sergiy Radyakin On 8/26/08, Nick Cox <n.j.cox@durham.ac.uk> wrote: > This generated various suggestions. > > My main thought is that you should never have to write your own sort > programs, bubble sort or other, in Stata. > > Even sorting rows can make use of Stata's own sort functionality -- as > in a combination of -reshape- and -sort-, as mentioned by Maarten Buis > -- or as in -rowsort or -sortrows-, as mentioned in part by Martin > Weiss. > > -rowsort- is on SSC and was written for Stata 8. > > -sortrows- is also on SSC and was written for Stata 9. It makes use of > Mata and in large part supersedes -rowsort-, so long as you have Stata 9 > at least. > > Nick > n.j.cox@durham.ac.uk > > Ashim Kapoor > > I have a simple code to write. > > I have 6 variables v1, v2 , v3, v4 ,v5 , v6. > > I want to sort "according to each observation". That is , in the end > for each i I want , v1[i] < = v2[i] <=... v6[i]. > > So I wrote a simple bubble for each row and then loop from 1 to _N. > > Now the problem is that this works but it's VERY VERY slow. Any > comments anyone ? > > > * > * For searches and help try: > * http://www.stata.com/help.cgi?search > * http://www.stata.com/support/statalist/faq > * http://www.ats.ucla.edu/stat/stata/ > * * For searches and help try: * http://www.stata.com/help.cgi?search * http://www.stata.com/support/statalist/faq * http://www.ats.ucla.edu/stat/stata/

- Prev by Date:
**Re: st: RE: A query about sorting.** - Next by Date:
**st: serial autocorrelation in residuals, what to do?** - Previous by thread:
**st: -spmap- reason for no aspect() option?** - Next by thread:
**st: Exporting regresion point estimates and standard deviations** - Index(es):

© Copyright 1996–2014 StataCorp LP | Terms of use | Privacy | Contact us | What's new | Site index |