Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

# Re: st: Comparing strings

From: Nick Cox
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Comparing strings
Date: Mon, 26 Mar 2012 01:51:49 +0100

```
-indexnot()- is a function, not a command.

It's not clear to me what you want, but you can check for whether the
same letters occur in two strings, at the cost of some programming.
For example, a Mata function can be written to sort the characters of
a string scalar into alphabetical order. Here is one:

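mata :

// sort the characters of a string into alphabetical order
string scalar deorst(string scalar mystring) {

    real scalar i, len
    string colvector work
    string scalar result

    len = strlen(mystring)
    // one character per row, then sort the rows in place
    work = J(len, 1, "")
    for(i = 1; i <= len; i++) work[i] = substr(mystring, i, 1)
    _sort(work, 1)
    // build the answer in a new variable: Mata passes arguments by
    // address, so assigning to -mystring- would alter the caller's copy
    result = ""
    for(i = 1; i <= len; i++) result = result + work[i]
    return(result)
}

end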

. mata : deorst("sorted")
deorst

. mata : deorst("backwards")
aabcdkrsw
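
* A minimal sketch, assuming deorst() above is already defined in Mata:
* two strings contain the same letters, each the same number of times,
* exactly when their sorted forms are equal, so this test ignores order.

. mata : deorst("frog") == deorst("ogfr")
1

. mata : deorst("frog") == deorst("fragro")
0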

On Sun, Mar 25, 2012 at 10:20 PM, jo la frite <jo_la_frite@yahoo.com> wrote:
> Thanks Nick and Eric. As far as I understand, the indexnot command compares strings regardless of the ordering of the characters in the string. For example, "frog" and "ogfr" are viewed as identical by indexnot.
>
>
> Is there a way of controlling for the ordering of the characters? For example, comparing "frog" and "fragro" would return 3 (the position of the first character of "frog" that does not match the character in the same position of "fragro").

From: Nick Cox <njcoxstata@gmail.com>

> Stata naturally does have a concept of alphanumeric order for strings;
> otherwise it could not -sort- them. Consider
>
> . di ("frog" < "toad")
> 1
>
> . di ("frog" < "foo")
> 0
>
> The first statement is true and the second false. Otherwise put, with
> strings < means "precedes" and > means "follows" in alphanumeric
> order.
>
> This allows one step further forwards:
>
> gen compare = cond(str1 > str2, indexnot(str1, str2), -indexnot(str1, str2))
>
> If strings are identical, this yields 0. Jo did not make explicit that
> this is what SAS does too, but either way it seems logical to me.
>
> Nick
>
> On Sat, Mar 24, 2012 at 10:47 PM, Eric Booth <eric.a.booth@gmail.com> wrote:
>
>> Take a look at the string function (-help string_functions-) indexnot() (e.g., "gen x = indexnot(string1, string2)" )  which will give you the leftmost position where the two strings differ.
>> This Stata string function does not assign the positive/negative sign like the SAS function you describe, but you can code that yourself by using other string functions to find how the strings differ in order, sequence, or length.
>
> On Mar 24, 2012, at 5:12 PM, jo la frite wrote:
>
>>> Is there a Stata function that corresponds to the SAS function COMPARE? It compares strings. Specifically, in SAS, COMPARE(string-1, string-2) returns a numeric value. The sign of the result is negative if string-1 precedes string-2 in a sort sequence, and positive if string-1 follows string-2 in a sort sequence. The magnitude of the result is equal to the position of the leftmost character at which the strings differ.

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
```
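
As Jo's reply notes, -indexnot()- returns the position of the first character of one string that does not occur anywhere in the other. As a result, the -cond()- line quoted above gives 0 for "frog" versus "fragro" rather than the 3 that SAS COMPARE would give. The following is a minimal sketch, not taken from the thread, of a helper that follows the COMPARE rule as Jo describes it. Several pieces are assumptions: the function name `sascompare()`, the toy variables `string1` and `string2` (named after Eric's example), and the handling of the case where one string is a prefix of the other, where the difference is placed just past the end of the shorter string. The sign comes from Mata's own `<` ordering of strings, mirroring Nick's use of `>`, and may not match every SAS sort sequence.

```stata
* Toy data; string1 and string2 are illustrative variable names
clear
input str10 string1 str10 string2
"frog"   "fragro"
"fragro" "frog"
"frog"   "frog"
"frog"   "frogs"
end

mata :

// Hypothetical helper, not a built-in Stata or Mata function.
// Returns 0 when a and b are identical; otherwise the position of the
// leftmost differing character, negative if a sorts before b.
real scalar sascompare(string scalar a, string scalar b)
{
    real scalar i, n

    if (a == b) return(0)
    n = min((strlen(a), strlen(b)))
    // scan the common length for the first mismatching character
    for (i = 1; i <= n; i++) {
        if (substr(a, i, 1) != substr(b, i, 1)) break
    }
    // if no mismatch was found, one string is a prefix of the other and
    // i is now n + 1, the position just past the end of the shorter string
    return(a < b ? -i : i)
}

// apply the helper row by row and store the results in a new variable
s1 = st_sdata(., "string1")
s2 = st_sdata(., "string2")
res = J(rows(s1), 1, .)
for (k = 1; k <= rows(s1); k++) res[k] = sascompare(s1[k], s2[k])
st_store(., st_addvar("long", "compare"), res)

end

list
```

With the toy data, `compare` should be 3, -3, 0 and -5. For one-off checks the helper can also be called directly, as in `mata : sascompare("frog", "fragro")`.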