
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: RE: Stata treatment of sort order


From   Richard Goldstein <[email protected]>
To   [email protected]
Subject   Re: st: RE: Stata treatment of sort order
Date   Thu, 06 Mar 2014 15:20:18 -0500

Actually, the manual specifically addresses this: "Stata may be
dumb, but it is also fast. It sorts already-sorted datasets instantly,
so Stata’s ignorance costs us little." p. 603

Rich
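The behavior described in the manual passage Rich quotes, and guessed at by Sarah (a cheap check for already-sorted data before any real sorting is done), can be sketched as a small model. This is a minimal Python illustration under stated assumptions, not Stata's actual implementation: `is_sorted` and `sort_rows` are hypothetical names, and Python's built-in `sorted` merely stands in for whatever algorithm Stata really uses.

```python
def is_sorted(rows, key):
    """Single linear pass: True if key(row) never decreases along rows."""
    return all(key(a) <= key(b) for a, b in zip(rows, rows[1:]))


def sort_rows(rows, key):
    """Return rows ordered by key, skipping the sort when it is unnecessary.

    Hypothetical model: if the data are already in key order, they are
    returned unchanged, so the existing order of tied rows is preserved.
    """
    if is_sorted(rows, key):
        return list(rows)  # already in order: nothing moves, ties included
    return sorted(rows, key=key)  # stand-in for the real sorting algorithm


# Rows already in order on the first field; the two "Domestic" ties stay put.
data = [("Domestic", 2), ("Domestic", 1), ("Foreign", 3)]
print(sort_rows(data, key=lambda r: r[0]))
# prints [('Domestic', 2), ('Domestic', 1), ('Foreign', 3)]
```

Under this model, Andrew's paste-then-sort experiment leaves the order of ties untouched, which matches what he reports. Whether Stata's check works exactly this way is not stated in the thread beyond the manual's remark.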

On 3/6/14, 3:14 PM, Sarah Edgington wrote:
> Andrew,
> In the example in your second question you're asking Stata to sort the data
> on a variable on which it is already sorted.  In that case I would not
> expect Stata to change the ordering of the data at all, with or without the
> stable option.  Even though you're pasting in new data (so Stata has no
> knowledge of the existing sort order), I would expect the sorting
> algorithm to check whether the data is already in the order you
> requested.  Since it is already sorted in that order, I wouldn't expect
> the data to be changed.  Admittedly that's just a guess, since I don't
> have any information on how Stata implements sorting, but it would explain
> the behavior.
> 
> However, you can see that if the data is NOT already sorted on the variable
> of interest, the sort order does change over multiple sorts.  For
> example, using the auto data, try -sort price- followed by -sort foreign-.
> If you do this multiple times, you'll note that the ordering after
> -sort foreign- differs from one repetition to the next.
> 
> -Sarah
> 
> 
> -----Original Message-----
> From: [email protected]
> [mailto:[email protected]] On Behalf Of Andrew Maurer
> Sent: Thursday, March 06, 2014 11:36 AM
> To: [email protected]
> Subject: st: Stata treatment of sort order
> 
> Hi Statalist,
> 
> I'm wondering if anyone can help explain some details about how Stata
> handles sorting.
> 
> First, where does Stata hold information about the current sort order? That
> is, the extended macro function --`: sortedby'-- returns the current sort
> order, but looking at --char dir-- and --macro dir--, I don't see the
> information there. In particular, I want to overwrite the value, so that
> --`: sortedby'-- will return the value that I insert. One use might be if I
> -infile- data whose sort order I already know, but don't want to have to
> run -sort- just to populate `: sortedby'. (In --help dta--, I see where
> it's stored in a physical dta file [<sortlist>sortlist</sortlist>], but
> it doesn't explain where it is kept in memory.)
> 
> Second, the help file for sort seems somewhat misleading. --help sort--
> explains, "Without the stable option, the ordering of observations with
> equal values of varlist is randomized." What does "randomized" mean here? I
> interpret it to mean that each observation within a group of ties has an
> equal probability of landing in any of the positions that group occupies
> after the sort (e.g., that --sort var1-- is equivalent to
> --gen rand = runiform()--, --sort var1 rand--, --drop rand--). However, the
> residual sort order doesn't always appear random. For example, if I
> --sysuse auto--, --sort foreign--, then copy the data to the clipboard,
> --clear--, use the Data Editor to paste the data back, and finally
> --sort foreign--, the ordering is always the same as the original ordering
> (i.e., the ordering of observations with equal values of varlist was
> /not/ randomized).
> 
> Is anyone able to explain these observations?
> 
> Thank you,
> 
> Andrew Maurer 
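Andrew's reading of "randomized" (sorting on var1 plus a uniform random draw) and Sarah's repeated -sort price- / -sort foreign- experiment can both be modeled in a few lines. The sketch below is a hypothetical Python model, not Stata code: `unstable_sort` breaks ties with a random draw exactly as Andrew describes, and `stable_sort` keeps tied rows in their incoming relative order, which is what the -stable- option is documented to do. The four rows are made-up placeholders, not the auto data.

```python
import random


def unstable_sort(rows, key, rng):
    """Model of -sort- without -stable-: tied rows land in random order.

    Mirrors Andrew's reading: draw a uniform number per row, then sort on
    (key, draw), i.e. gen rand = runiform(); sort var1 rand; drop rand.
    """
    draws = [rng.random() for _ in rows]
    order = sorted(range(len(rows)), key=lambda i: (key(rows[i]), draws[i]))
    return [rows[i] for i in order]


def stable_sort(rows, key):
    """Model of -sort, stable-: tied rows keep their incoming relative order."""
    return sorted(rows, key=key)  # Python's sorted() is guaranteed stable


def by_group(row):
    """Sort key: the first field (the grouping variable)."""
    return row[0]


# Made-up rows: (group, label). Each group contains two tied rows.
rows = [("Domestic", "car A"), ("Foreign", "car B"),
        ("Domestic", "car C"), ("Foreign", "car D")]

print(stable_sort(rows, by_group))  # ties keep input order: A before C, B before D

# Re-running the unstable sort with different seeds can reshuffle the ties,
# as in Sarah's repeated -sort price- / -sort foreign- experiment.
for seed in range(3):
    print(unstable_sort(rows, by_group, random.Random(seed)))
```

This model reshuffles ties on every call and has no already-sorted shortcut, so on its own it does not reproduce Andrew's paste-and-sort observation; Sarah's and Rich's replies attribute that to Stata recognizing data that are already in the requested order.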

