Re: st: Ignore accents while sorting international characters

From   "Austin Nichols" <>
Subject   Re: st: Ignore accents while sorting international characters
Date   Thu, 19 Jun 2008 11:37:05 -0400

John LeBlanc <> et al.:
I would make a stronger statement than John-Paul Ferguson: it's
probably impossible to do in the general case, because different fonts
(code pages) can map characters that differ from another character only
by a diacritical mark to different numeric codes. If you can specify the
mapping you want between characters and numeric codes, you could write a
gsort2.ado that sorts as you want. But such a program would just generate
a new variable that sorts as you want, and you can generate that variable
yourself, so there is little to be gained. To see how Stata will sort
your strings, type:

* show characters 32-255 in code order, i.e., the order sort uses
forv i=32/255 {
 di char(`i') _c
}

and note that capital letters sort before lower-case letters, which sort
before all characters with diacritical marks, because sort compares
strings character code by character code. So you can predict how this
will come out:

input str2 a
sort a
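A self-contained version of this kind of check, as a minimal sketch: the toy data are the three words from John LeBlanc's original question, and the variable name word is hypothetical. Note that clear discards any data in memory. Because é has a higher character code than any unaccented letter (in the single-byte code pages of the time and in Unicode alike), the accented word sorts last.

```stata
* Sketch: show that sort orders strings by character code.
* Toy data: the three words from John LeBlanc's question.
clear
input str10 word
"ecole"
"school"
"école"
end
sort word
* Listed order: ecole, school, école (é has a higher code than s)
list word, noobs
```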

Also note that different folks might want different orderings even if
numeric codes were perfectly stable; e.g., consider ö in Swedish or

On Wed, Jun 18, 2008 at 10:13 PM, John-Paul Ferguson <> wrote:
> Looking at the source for gsort reveals that it's mostly engaged in macro
> manipulation, with an occasional call to sort to do the basic work. Since
> sort itself is a built-in command, it would almost HAVE to be Stata that
> made any such modification.
> John-Paul Ferguson
> Quoting John LeBlanc <>:
>> Thanks; I was hoping that Stata had a built-in option to ignore accents.
>> Some software with sort routines can give characters with diacritical
>> marks the same value as their unaccented counterparts. Is this not an
>> issue for non-English Stata users? Is there sufficient demand to justify
>> asking Stata for this feature, e.g., as an option to gsort?
>> John
>> On Wed, 18 Jun 2008 12:53:14 +0200, Svend Juul wrote:
>> John LeBlanc wrote:
>> How does one ignore accents while sorting international characters?
>> sort & gsort deliver this:
>> ecole
>> school
>> école
>> What I'd like is this:
>> ecole
>> école
>> school
>> ============================================================
>> I believe that you must generate a second variable with no accents
>> to get it right:
>>   * copy key (gen picks a long enough string type), then strip accents
>>   gen key2 = key
>>   replace key2 = subinstr(key2,"é","e",.)
>>   replace key2 = subinstr(key2,"ó","o",.)
>>   ...
>>   sort key2 key
>> I included key as a secondary sort key to make é come after e.
>> Hope this helps
>> Svend
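Svend's replace approach can be generalized with a loop over pairs of accented and unaccented characters. This is a minimal sketch under stated assumptions, not a tested utility: the toy data reuse the words from the question, and the character pairs in the locals from and to are an illustrative, incomplete mapping that you would extend for your own data (uppercase letters such as É, for instance, need their own pairs).

```stata
* Sketch: accent-insensitive sort via an accent-free key variable.
* Toy data: the words from John LeBlanc's question.
clear
input str10 key
"ecole"
"school"
"école"
end

* Illustrative, incomplete mapping (an assumption; extend as needed):
* the j-th character in local from is replaced by the j-th in local to.
local from "é è ê ë à â ä ó ò ô ö"
local to   "e e e e a a a o o o o"
generate key2 = key
local n : word count `from'
forvalues j = 1/`n' {
    local a : word `j' of `from'
    local b : word `j' of `to'
    quietly replace key2 = subinstr(key2, "`a'", "`b'", .)
}

* Sort on the stripped key; key breaks ties so "ecole" precedes "école"
sort key2 key
list key key2, noobs
```

The pairs list is where the ordering choice gets made: mapping ö to o gives German-style dictionary order, whereas a Swedish user would want ö to sort after z, so no single mapping suits everyone.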
