Statalist The Stata Listserver



Re: st: sort order of empty strings in Mata


From   wgould@stata.com (William Gould, Stata)
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: sort order of empty strings in Mata
Date   Tue, 09 May 2006 08:28:26 -0500

Phil Schumm <pschumm@uchicago.edu> asks, 

> In Mata, empty strings sort to the bottom rather than to the top (as  
> they do in Stata):
>
> . di (""<"a")
>   1
> 
> . mata: ""<"a"
>   0
>
> Is anyone aware of the reason for this (I couldn't seem to find it in  
> the documentation)?

I've put it on the bug list; we will fix Mata so that ""<"a".

In my defense, let me tell you how it happened.  You are probably wondering,
why do Stata and Mata use different internal subroutines to compare strings?

The answer is that strings in Stata and Mata are different.  Strings in Stata
are text strings.  Strings in Mata are generalized to be any sequence of
bytes, and hence may be text strings or binary strings.

By convention in C, text strings end in binary 0.  The string "a" is 
really a two-character string "a\0" or, in hex, 6100.  The string 
"" is really "\0" or, in hex, 00.  Comparisons are made by accessing 
the bytes sequentially and comparing them numerically.  Hence, 
00 < 6100.  
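
As a rough sketch of the comparison just described, written in C: the
function name cmp_terminated is made up for illustration, and this is not
Stata's actual source.  It walks both strings a byte at a time, terminator
included, so "" versus "a" comes down to 00 versus 61.

```c
#include <assert.h>
#include <stddef.h>

/* Compare two NUL-terminated strings byte by byte, terminator included.
   Returns <0, 0, or >0, in the style of strcmp.
   Hypothetical helper, for illustration only. */
int cmp_terminated(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    /* Advance while the bytes match and a's terminator is not reached. */
    while (*p != 0 && *p == *q) {
        p++;
        q++;
    }

    /* Numeric difference of the first bytes that differ.
       For "" versus "a" this is 0x00 - 0x61, which is negative. */
    return (int)*p - (int)*q;
}
```

Because the terminator is itself a byte with the lowest possible value, the
shorter string always has something to compare, and it always loses.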

In Mata, binary 0 in a string has no special meaning -- it is just one of the
256 possible values a byte may contain:  00, 01, 02, ..., ff.  In Mata, "a" is
"a" (61 in hex) and "" is <nothing>.  Comparisons are made the same way as in
Stata, in fact, by accessing the bytes sequentially and comparing them
numerically.  Hence, oops -- <nothing>, is it less than, equal to, or greater
than 61?  Well, it all depends on how we wrote the code, and we wrote it
incorrectly if we want to be consistent with Stata, which we do.
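
To make "it all depends on how we wrote the code" concrete, here is a
minimal C sketch of a comparison for length-delimited byte strings.  The
name cmp_counted and the runout parameter are assumptions made for
illustration; they are not Mata's internals.  The runout argument is the
answer given when one string runs out of bytes before any byte differs.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Compare two length-delimited byte strings; a 0 byte is ordinary data.
   `runout` is a hypothetical policy knob: the value returned when `a`
   runs out of bytes before `b` does (negative: the shorter string sorts
   first; positive: the shorter string sorts last). */
int cmp_counted(const unsigned char *a, size_t na,
                const unsigned char *b, size_t nb, int runout)
{
    assert(a != NULL && b != NULL);
    size_t n = na < nb ? na : nb;

    /* Numeric comparison of the bytes both strings actually have. */
    int c = memcmp(a, b, n);
    if (c != 0)
        return c;

    /* No byte differed: equal lengths mean equal strings. */
    if (na == nb)
        return 0;

    /* One string ran out first; the bytes alone cannot decide. */
    return na < nb ? runout : -runout;
}
```

With a zero-length first argument and the one-byte string 61, a negative
runout gives "" < "a", the Stata ordering, and a positive runout gives the
opposite.  Nothing in the bytes themselves forces either choice.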

Phil demonstrated, 

        . mata: ""<"a"
          0

Watch this:

        . mata:  ("" + char(0)) < ("a" + char(0))
          1

That's Mata recreating what it is that Stata literally does.
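
Why appending char(0) changes the answer can be sketched in C as well.  The
helper first_difference below is hypothetical, written only to show where a
byte-by-byte comparison finds its first real difference, if it finds one at
all.

```c
#include <assert.h>
#include <stddef.h>

/* Return the index of the first byte at which two length-delimited
   strings differ, or -1 if no byte differs within the shorter length
   (the strings are equal, or one is a prefix of the other).
   Hypothetical helper, for illustration only. */
long first_difference(const unsigned char *a, size_t na,
                      const unsigned char *b, size_t nb)
{
    assert(a != NULL && b != NULL);
    size_t n = na < nb ? na : nb;

    for (size_t i = 0; i < n; i++) {
        if (a[i] != b[i])
            return (long)i;
    }
    return -1;
}
```

For the zero-length string against 61, the result is -1: there is no byte
to compare, so the outcome rests on whatever rule the code applies when a
string runs out.  For 00 against 6100, the result is 0: the very first
bytes differ, 00 is numerically less than 61, and no such rule is needed.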

Anyway, what actually happened is that we did not use a different comparison
routine for strings in Stata and Mata.  We unthinkingly used the same one,
but because we generalized strings in Mata, we needed a special one.

-- Bill
wgould@stata.com