Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: RE: Stata treatment of sort order


From   Kieran McCaul <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: RE: Stata treatment of sort order
Date   Fri, 7 Mar 2014 06:33:37 +0800

...

I don't know how Stata's sort algorithm works, but the way you've constructed the dataset has a feature that might lead the algorithm to the same sort order every time:  x=0 is always in the last record, so if the order of the data is simply reversed then it is sorted by x.
 
If I modify your code so that x=0 is in the middle of the data, I get two different lists.

****** Begin code ******
clear all
set obs 10
gen id = _n
gen x = 1
replace x = 0 in 6
sort x
list, clean
****** End code ********

After first run:
       id   x  
  1.    6   0  
  2.    5   1  
  3.    4   1  
  4.    8   1  
  5.    7   1  
  6.    2   1  
  7.    3   1  
  8.    1   1  
  9.    9   1  
 10.   10   1  

After second run:
       id   x  
  1.    6   0  
  2.    2   1  
  3.    7   1  
  4.    1   1  
  5.    5   1  
  6.    9   1  
  7.    8   1  
  8.    3   1  
  9.    4   1  
 10.   10   1  
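
If a reproducible order among the tied observations is needed, two documented approaches are the -stable- option and adding a unique variable to the sort list. The following is a minimal sketch reusing the dataset above; it relies on id being unique, which holds here by construction.

****** Begin code ******
clear all
set obs 10
gen id = _n
gen x = 1
replace x = 0 in 6
* Option 1: -stable- keeps tied observations in their current relative order
sort x, stable
list, clean
* Option 2: add a unique key so that no ties remain to be broken
sort x id
list, clean
****** End code ********

Both listings should show id 6 first, followed by the remaining ids in ascending order, on every run.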


-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Andrew Maurer
Sent: Friday, 7 March 2014 5:01 AM
To: [email protected]
Subject: RE: st: RE: Stata treatment of sort order

Rich,
Thanks for this reference. This is interesting, since I don't see how Stata could sort a dataset "instantly" unless the "`: sortedby'" flag is already set. Wouldn't a sort on an already-sorted dataset take at least O(n)? (I.e., doesn't the program need to loop once and verify that x[i] <= x[i+1] for i from 1 to _N-1?)
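
The one-pass check described in the parenthesis can be written directly in Stata. This is a minimal sketch, assuming the auto dataset and price as the sort variable (both chosen only for illustration); it says nothing about how -sort- itself is implemented.

****** Begin code ******
sysuse auto, clear
sort price
* one pass over the data: count adjacent pairs that are out of order
count if price < price[_n-1] & _n > 1
* r(N) == 0 means the data are already in ascending order of price
display r(N)
****** End code ********
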

Sarah,
Thanks for the response. However, here's an example of an unsorted list, with repeated values of the sort variable, where the final sort order is always the same after --sort x--. This seems like it contradicts the documentation's assertion that, "the ordering of observations with equal values of varlist is randomized". Perhaps "sometimes randomized" would be more appropriate.

****** Begin code ******
clear all
set obs 10
gen id = _n
gen x = 1 in 1/9
replace x = 0 in 10
sort x
****** End code ********

Output is always:
id	x
10	0
9	1
8	1
7	1
6	1
5	1
4	1
3	1
2	1
1	1
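
Rather than comparing listings by eye, the order produced by two successive sorts can be compared programmatically. This is a rough sketch on the same dataset; pos1 and pos2 are names introduced here for illustration, and restoring the original order with -sort id- relies on id being unique.

****** Begin code ******
clear all
set obs 10
gen id = _n
gen x = 1 in 1/9
replace x = 0 in 10
* first sort: record each observation's position
sort x
gen pos1 = _n
* restore the original order, then sort again
sort id
sort x
gen pos2 = _n
* number of observations whose position differs between the two sorts
count if pos1 != pos2
****** End code ********

A count of 0 means the two sorts ordered the tied observations identically; a nonzero count means the tie order changed.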


Andrew Maurer 

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Richard Goldstein
Sent: Thursday, March 06, 2014 2:20 PM
To: [email protected]
Subject: Re: st: RE: Stata treatment of sort order

actually, the manual specifically deals with this: "Stata may be dumb, but it is also fast. It sorts already-sorted datasets instantly, so Stata's ignorance costs us little." p. 603

Rich

On 3/6/14, 3:14 PM, Sarah Edgington wrote:
> Andrew,
> In the example in your second question you're asking Stata to sort the 
> data on a variable on which it is already sorted.  In that case I 
> would not expect Stata to change the ordering of the data at all, with 
> or without the stable option.  Even though you're pasting in new data 
> (so Stata has no knowledge of the existing sort order) I would expect 
> that the sorting algorithm would do some checking of whether the data 
> was already in the order you requested.  Since it is already sorted in 
> that order, I wouldn't expect the data to be changed.  Admittedly 
> that's just a guess since I don't have any information on how Stata 
> implements sorting, but it would explain the behavior.
> 
> However, you can see that if the data is NOT already sorted on the 
> variable of interest that the sort order does change over multiple 
> sorts.  For example, using the auto data, try to -sort price- then 
> -sort foreign-.  If you do this multiple times you'll note that the 
> ordering is different after -sort foreign-.
> 
> -Sarah
> 
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Andrew 
> Maurer
> Sent: Thursday, March 06, 2014 11:36 AM
> To: [email protected]
> Subject: st: Stata treatment of sort order
> 
> Hi Statalist,
> 
> I'm wondering if anyone can help explain some details about Stata and 
> sorting.
> 
> First, where does Stata hold information about current sort order? Ie, 
> the extended macro function --`: sortedby'-- returns the current sort order.
> However, looking at --char dir-- and --macro dir-- I don't see the 
> information there. In particular, I want to overwrite the value, so 
> that
> --`: sortedby'-- will return the value that I insert. One use might be 
> if I -infile-, and I already know the sort order of the data, but 
> don't want to have to run sort just to populate `: sortedby'. (In 
> --help dta--, I see where it's stored in a physical dta file 
> [<sortlist>sortlist</sortlist>], but it doesn't explain where it is put in memory.)
> 
> Second, the help file for sort seems somewhat misleading. --help 
> sort-- explains, "Without the stable option, the ordering of 
> observations with equal values of varlist is randomized." What does 
> "randomized" here mean? I interpret it to mean that each residual 
> observation has an equal probability of being in any of the slots 
> specified by the sort list (e.g., that --sort var1-- is equivalent to 
> --gen rand = runiform()--, --sort var1 rand--, --drop rand--). 
> However, residual sort order doesn't always appear random. For 
> example, if I --sysuse auto--, --sort foreign--, then copy the data to 
> clipboard, --clear--, then use data editor to paste the data back, and 
> finally --sort foreign--, the ordering is always the same as the 
> original ordering (ie: the ordering of observations with equal values 
> of varlist was /not/ randomized).
> 
> Is anyone able to explain these observations?
> 
> Thank you,
> 
> Andrew Maurer
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/




