Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Haluk Vahaboglu <vahabo@hotmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: bug in Stata's sorted-by flag |
Date | Thu, 15 Aug 2013 05:16:04 +0000 |
Sergiy, thank you for this highly useful discussion, which warned me that what is really happening in the dataset might differ from what I expect. I ran your program "do http://radyakin.org/statalist/2013081402/sortbug.do" under Ubuntu 64-bit Stata 12.1 and checked what I see on the output screen against the "edit browser". It seems that under my conditions price is sorted correctly.

You mentioned in your message that the faulty behavior of the -sort- command might be restricted to the Windows environment. This is my question: is it possible that Stata behaves differently depending on the platform it is running on? Is this a relevant question, or have I totally misunderstood the issue?

Thanks again
Haluk Vahaboğlu

> Date: Wed, 14 Aug 2013 20:50:31 -0400
> Subject: st: bug in Stata's sorted-by flag
> From: serjradyakin@gmail.com
> To: statalist@hsphsun2.harvard.edu
>
> Dear All,
>
> it seems that under some conditions Stata 9.2-12.1 (Windows)
> incorrectly reports that the dataset is sorted while in fact it is
> not.
>
> The following program demonstrates this:
> do http://radyakin.org/statalist/2013081402/sortbug.do
>
> The problem seems to be that Stata's built-in -set obs N- command is not
> clearing the sorted flag while changing the data.
>
>
> Here are some thoughts:
>
> This does have important implications. In particular the sorted state
> is saved into a data file, and other (external) programs might rely on
> it being correct. Stata itself might get confused in some cases, when
> it inspects the sorted state, though I can't readily demonstrate it.
>
> An example of such an inconsistent datafile produced by Stata is here
> (in v12 format):
> http://radyakin.org/statalist/2013081402/sortbug.dta
> or here (in v9 format):
> http://radyakin.org/statalist/2013081402/sortbug9.dta
>
> A technical note in the following document:
> http://www.stata.com/manuals13/dsort.pdf
> explains that Stata is conservative and believes any change to
> variables involved in the sort order is destroying the sort order.
> This means that sometimes one has to forgo a bit of performance to
> verify the sort order when it is not needed. And this is OK.
>
> The converse is not good. Reporting a dataset as sorted when it is
> not has serious implications, as (at least some) user-written
> commands might be relying on the reported sort order to be credible.
> Stata's own commands would probably also get confused. I expect (but
> have not checked) the -merge- command to behave erratically in this case,
> since I expect it relies on the saved sorted order for the 'using'
> datasets (secondary datasets).
>
> The list of the variables by which a dataset is sorted is contained
> in the macro sortedby as in:
> display `"`: sortedby'"'
>
> This problem was found as a partial explanation of what's happening with
> the sortpreserve option in my code, the discussion started in this
> thread:
> http://www.stata.com/statalist/archive/2013-08/msg00563.html
> and in which I am still interested. Even older discussions on the
> -sort-'s performance can be found in my "sorting data puzzles"
> postings here:
> http://www.stata.com/statalist/archive/2008-01/index.html#00810
>
> Interestingly you would think that Stata itself should then refuse to
> sort the already sorted dataset.
> But no, it does re-sort it, as can be
> seen here:
> ********************************************************************
> use http://radyakin.org/statalist/2013081402/sortbug.dta
> list
> describe
> sort price, stable
> list
> describe
> display c(changed)
> ********************************************************************
>
> And given the problem, I am surprised to see how -collapse- continues
> to produce the correct results, but it seems to be working even though the
> dataset is not sorted.
>
> Best, Sergiy Radyakin
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/
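
The quoted message notes that the reported sort order lives in the sortedby macro. A minimal sketch of how that report could be cross-checked against the actual order of the data follows. It is only an illustration under stated assumptions: it uses Stata's shipped auto dataset instead of sortbug.dta so that it runs offline, the local macro names claimed and key are made up for this example, and it checks only the first sort key, in ascending order.

```stata
* Sketch only: cross-check the sorted-by flag against the actual data order.
* The auto dataset and the macro names claimed/key are illustrative assumptions.
sysuse auto, clear
sort price

* What Stata reports the data are sorted by
local claimed : sortedby
display `"reported sort order: `claimed'"'

* Take the first reported sort key (empty if no sort order is reported)
local key : word 1 of `claimed'

if "`key'" != "" {
    * Count observations whose key is smaller than the previous observation's key,
    * i.e. adjacent pairs that violate ascending order
    count if `key' < `key'[_n-1] & _n > 1
    display "out-of-order observations in `key': " r(N)
}
```

For data that really are sorted by the reported key, the count is 0. On a file whose flag is stale, one would expect a nonzero count even though the reported sort order is not empty. Ties on the first key are not examined, so secondary sort keys would need a separate check.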