
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: bug in Stata's sorted-by flag


From   Haluk Vahaboglu <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: bug in Stata's sorted-by flag
Date   Thu, 15 Aug 2013 05:16:04 +0000

Sergiy, thank you for this very useful discussion. It warned me that what is really happening in a dataset may differ from what I expect.
I ran your program (do http://radyakin.org/statalist/2013081402/sortbug.do) under Ubuntu 64-bit Stata 12.1 and compared what I see on the output screen with the data browser. Under my setup, price appears to be sorted correctly.
You mentioned in your message that the incorrect behavior of the -sort- command might be restricted to the Windows environment.
So my question is: can Stata behave differently depending on the platform it is running on?
Is this a relevant question, or have I totally misunderstood the issue?
Thanks again

Haluk Vahaboğlu


> Date: Wed, 14 Aug 2013 20:50:31 -0400
> Subject: st: bug in Stata's sorted-by flag
> From: [email protected]
> To: [email protected]
> 
> Dear All,
> 
> it seems that under some conditions Stata 9.2-12.1 (Windows)
> incorrectly reports that the dataset is sorted while in fact it is
> not.
> 
> The following program demonstrates this:
> do http://radyakin.org/statalist/2013081402/sortbug.do
> 
> The problem seems to be that Stata's built-in -set obs N- command
> does not clear the sorted flag when it changes the data.
> 
> 
> Here are some thoughts:
> 
> This does have important implications. In particular the sorted state
> is saved into a data file, and other (external) programs might rely on
> it being correct. Stata itself might get confused in some cases, when
> it inspects the sorted state, though I can't readily demonstrate it.
> 
> An example of such an inconsistent datafile produced by Stata is here
> (in v12 format):
> http://radyakin.org/statalist/2013081402/sortbug.dta
> or here (in v9 format):
> http://radyakin.org/statalist/2013081402/sortbug9.dta
> 
> A technical note in the following document:
> http://www.stata.com/manuals13/dsort.pdf
> explains that Stata is conservative and assumes that any change to
> variables involved in the sort order destroys the sort order.
> This means that sometimes one has to forgo a bit of performance to
> re-verify the sort order when that is not needed. And this is OK.
> 
> The converse is not good. Reporting a dataset as sorted when it is
> not has serious implications, since (at least some) user-written
> commands might rely on the reported sort order being correct.
> Stata's own commands would probably also get confused. I expect (but
> have not checked) that the -merge- command would behave erratically
> in this case, since I expect it relies on the saved sort order of the
> 'using' datasets (secondary datasets).
> 
> The list of variables by which a dataset is sorted is available
> through the extended macro function sortedby, as in:
> display `"`: sortedby'"'
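The reported sort key can also be compared with the actual row order. The following is only a minimal sketch: `sysuse auto` stands in for the affected dataset, and `orig_row` is a hypothetical helper variable that is not part of the original post. It forces a real sort on the reported key, with the original row number as a tie-breaker, and counts the rows that had to move.

```stata
* Sketch: compare the reported sort key with the actual row order.
* sysuse auto is a stand-in dataset; orig_row is a hypothetical helper.
sysuse auto, clear
sort price

* What Stata reports as the sort key.
local sortvars : sortedby
display `"reported sort key: `sortvars'"'

* Remember the current row order, then force a real sort on the
* reported key; orig_row breaks ties so that stable rows stay put.
generate long orig_row = _n
sort `sortvars' orig_row

* Rows that had to move were not where the reported key implies.
count if orig_row != _n
display "rows out of place: " r(N)
drop orig_row
```

A count of zero means the row order agrees with the reported key. Note that the check itself re-sorts the data, so it is better wrapped in -preserve- and -restore- if the original order has to be kept for inspection.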
> 
> I found this problem as a partial explanation of what is happening
> with the sortpreserve option in my code. That discussion started in
> this thread:
> http://www.stata.com/statalist/archive/2013-08/msg00563.html
> and I am still interested in it. Even older discussions of
> -sort-'s performance can be found in my "sorting data puzzles"
> postings here:
> http://www.stata.com/statalist/archive/2008-01/index.html#00810
> 
> Interestingly, you would think that Stata itself would then refuse
> to sort a dataset it considers already sorted. But no, it does
> re-sort it, as can be seen here:
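> ********************************************************************
> * load the inconsistent file; -clear- drops any data in memory
> use http://radyakin.org/statalist/2013081402/sortbug.dta, clear
> * actual row order versus the "Sorted by" line of -describe-
> list
> describe
> * re-sort on the key the file already claims to be sorted by
> sort price, stable
> list
> describe
> * c(changed) is 1 if the data changed since it was last saved
> display c(changed)
> ********************************************************************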
> 
> And given the problem, I am surprised to see that -collapse-
> continues to produce correct results; it seems to work even though
> the dataset is not sorted.
> 
> Best, Sergiy Radyakin
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/ 		 	   		  


