Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


st: RE: Stata treatment of sort order


From   "Sarah Edgington" <[email protected]>
To   <[email protected]>
Subject   st: RE: Stata treatment of sort order
Date   Thu, 6 Mar 2014 12:14:04 -0800

Andrew,
In the example in your second question you're asking Stata to sort the data
on a variable on which it is already sorted.  In that case I would not
expect Stata to change the ordering of the data at all, with or without the
stable option.  Even though you're pasting in new data (so Stata has no
knowledge of the existing sort order) I would expect that the sorting
algorithm would do some checking of whether the data was already in the
order you requested.  Since it is already sorted in that order, I wouldn't
expect the data to be changed.  Admittedly that's just a guess since I don't
have any information on how Stata implements sorting, but it would explain
the behavior.  

However, if the data is NOT already sorted on the variable of interest, you
can see that the sort order does change over multiple sorts.  For example,
using the auto data, try -sort price- and then -sort foreign-.  If you do
this multiple times you'll note that the ordering after -sort foreign-
differs from one run to the next.
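
A minimal sketch of that check, in case it helps. The variable names order1
and order2 are made up for the illustration, and whether any rows actually
move on a given run depends on how Stata breaks the ties, so no particular
count is promised:

```stata
sysuse auto, clear

* First pass: sort on price, then on foreign, and record each row's position
sort price
sort foreign
gen long order1 = _n

* Second pass: repeat the same two sorts and record the positions again
sort price
sort foreign
gen long order2 = _n

* Rows whose position within the foreign groups changed between the passes
count if order1 != order2
list make foreign order1 order2 if order1 != order2, sepby(foreign)
```

If the count is nonzero, observations with equal values of foreign were
ordered differently on the second pass.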

-Sarah


-----Original Message-----
From: [email protected]
[mailto:[email protected]] On Behalf Of Andrew Maurer
Sent: Thursday, March 06, 2014 11:36 AM
To: [email protected]
Subject: st: Stata treatment of sort order

Hi Statalist,

I'm wondering if anyone can help explain some details about Stata and
sorting.

First, where does Stata hold information about current sort order? I.e., the
extended macro function --`: sortedby'-- returns the current sort order.
However, looking at --char dir-- and --macro dir-- I don't see the
information there. In particular, I want to overwrite the value, so that
--`: sortedby'-- will return the value that I insert. One use might be if I
-infile-, and I already know the sort order of the data, but don't want to
have to run sort just to populate `: sortedby'. (In --help dta--, I see
where it's stored in a physical dta file [<sortlist>sortlist</sortlist>],
but it doesn't explain where it is put in memory.)
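
A small sketch of reading the recorded sort order, using the auto data as an
assumed example. The local macro name sorder is arbitrary, and this only
reads the value; it does not show a way to overwrite it:

```stata
sysuse auto, clear
sort foreign price

* Read the recorded sort order through the extended macro function
local sorder : sortedby
display "`sorder'"

* The same information appears in the "Sorted by" line of -describe-
describe, short
```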

Second, the help file for sort seems somewhat misleading. --help sort--
explains, "Without the stable option, the ordering of observations with
equal values of varlist is randomized." What does "randomized" mean here? I
interpret it to mean that each residual observation has an equal probability
of being in any of the slots specified by the sort list (e.g., that --sort
var1-- is equivalent to --gen rand = runiform()--, --sort var1 rand--, --drop
rand--). However, residual sort order doesn't always appear random. For
example, if I --sysuse auto--, --sort foreign--, then copy the data to
clipboard, --clear--, then use the data editor to paste the data back, and
finally --sort foreign--, the ordering is always the same as the original
ordering (i.e., the ordering of observations with equal values of varlist was
/not/ randomized).
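
The commands being compared can be laid out as a short do-file sketch. It
illustrates the interpretation described above, not how -sort- is actually
implemented; foreign stands in for var1, and the preceding -sort price- is
only there so that the data start out unsorted on foreign:

```stata
sysuse auto, clear

* Interpretation being tested: break ties with an explicit uniform draw
gen double rand = runiform()
sort foreign rand
drop rand

* A plain sort: the order of tied observations is left to Stata
sort price
sort foreign

* For contrast: -stable- keeps tied observations in their prior relative order
sort price
sort foreign, stable
list make price foreign in 1/5
```

With the stable option, observations within each value of foreign should
stay in the price order left by the preceding sort.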

Is anyone able to explain these observations?

Thank you,

Andrew Maurer 



*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/
