



Re: st: bug in Stata's sorted-by flag


From   Sergiy Radyakin <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: bug in Stata's sorted-by flag
Date   Thu, 15 Aug 2013 13:10:27 -0400

Dear Bill,

thank you very much for looking into this case and for intending to fix
it. I have found that not only does -describe- misreport the sort
order, but a trivial check of the sort order also fails:

    assert `vn'>=`vn'[_n-1] if _n>1

which must hold for any dataset sorted in ascending order on `vn'.

A program that counts unique values would also give the wrong answer
if it trusted `sortedby' to mean that the data are sorted by `vn':
    count if `vn'!=`vn'[_n-1] & _n>1
    display r(N)+1
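
A minimal sketch of how the flag and the actual order can be compared. It assumes the shipped auto dataset and its rep78 variable as stand-ins (they are not the original test data), and the result depends on the Stata version: on an affected release the flag should still name rep78 while the check fails, whereas a fixed release may report no sort order at all.

```stata
* Sketch only: sysuse auto and rep78 are stand-ins, not the original test data.
sysuse auto, clear
replace rep78 = .a if missing(rep78)    // turn the system missings into .a
sort rep78
set obs 80                              // appended observations have rep78 == .

local vn rep78
local sorted : sortedby                 // what Stata believes the sort order is
display "sortedby flag: `sorted'"

* The order check from above; -capture- keeps a failure from stopping the do-file
capture assert `vn' >= `vn'[_n-1] if _n > 1
display "assert return code: " _rc      // nonzero means the data are not in order
```

Because . sorts below .a, the appended observations are out of order after the .a values, so the check fails regardless of what the flag says.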

Also, since the problem occurs with strings as well (see the first test
with the make variable), it is not limited to EXTENDED missing values.

However, the few Stata commands that I tested on such an
inconsistent dataset did work fine: -codebook-, -inspect-, and
other likely candidates for a problem.

Finally, I have found that instead of -set obs N- I can use
-expand M in L- (where M is the desired increment, and L is really
just the letter L, i.e. the last observation). Since I am replacing
the values anyway, I don't need to rely on them being missing to begin
with. Interestingly enough, in this case -expand- DOES reset the
sortedby flag, and this is exactly the case where it could leave the
flag as is, since duplication of the observations in the top would not
distort the sort order. What an irony.
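
A minimal sketch of the -expand- alternative, again assuming the auto dataset and rep78 as stand-ins. One caveat, stated as an assumption from the -expand- documentation: the number given to -expand- is the number of copies each selected observation ends up with, so the increment is that number minus one.

```stata
* Sketch only: sysuse auto and rep78 are stand-ins for the real data.
sysuse auto, clear
replace rep78 = .a if missing(rep78)
sort rep78

local before = _N
expand 7 in l                  // last observation becomes 7 copies: 6 new observations
assert _N == `before' + 6

local sorted : sortedby        // empty if -expand- cleared the flag, as described above
display "sortedby flag after -expand-: `sorted'"
```

The new observations carry the values of the last observation rather than missing values, so they should be overwritten before use.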

Best regards, Sergiy Radyakin




On Thu, Aug 15, 2013 at 11:30 AM, William Gould, StataCorp LP
<[email protected]> wrote:
> Sergiy Radyakin <[email protected]> reports,
>
>> it seems that under some conditions Stata 9.2-12.1 (Windows)
>> incorrectly reports that the dataset is sorted while in fact it is
>> not.
>
> Sergiy reports that this happens when
>
>         1.  The data are sorted by a variable or variables, say myvar.
>
>         2.  One or more observations of myvar contain EXTENDED missing
>             values (.a, .b, ..., .z).
>
>         3.  -set obs- is used to add extra observations to the end
>             of the dataset.
>
> The data are fine, but -describe- will report that the data are sorted
> by myvar, which is not true because . < .a < .b < ... < .z.
>
> In most cases the bug has no implications beyond the mistaken
> -describe-, which is why it's gone undiscovered for 8+ years.
>
> We will fix it.
>
> In the meantime, the workaround is to -sort- the data after -set obs-.
> You must sort on an extraneous variable,
>
>            . set obs ...
>            . sort a
>            . sort myvar
>
> You might worry that, because the internal sort marker is incorrect,
> lots of other problems could arise.  In general, that would be true.
> In this case, however, such problems do not arise because all the
> misordering occurs within missing values.
>
>
> -- Bill
> [email protected]
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

