Statalist



st: RE: RE: RE: RE: RE: Strings and the greater than/less than operators


From   "sdm1" <sdm1@york.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: RE: RE: RE: RE: Strings and the greater than/less than operators
Date   Thu, 14 May 2009 15:15:00 +0100

Sorry to trouble you again on this topic. 
 
I was thinking about sorting strings in the same way as one would sort
numerics, so that, for example, a four-digit positive integer is always
greater than a three-digit one.  Clearly I shouldn't think like this when it
comes to strings, because "N12 " is less than "N13" and, although 1234 is
greater than 124, "1234" is less than "124".  When thinking about the sort
order of strings, would it be sensible to think of them as left-aligned
(whereas integers are right-aligned)?
 
Thanks.

Steve
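Steve's left-aligned picture matches how Stata compares strings: character by character from the left, with the first differing character deciding, and a string that is a prefix of another sorting first. The sketch below checks this with -display- and then shows one way to get numeric order back. The toy codes and the new variable -codenum- are hypothetical, and the -real(substr())- step assumes every code is exactly one letter followed by digits.

```stata
* Strings are compared character by character from the left, so the
* comparison is settled by the first position where they differ
display ("1234" < "124")    // 1: third character "3" < "4"
display ("N12 " < "N13")    // 1: third character "2" < "3"
display ("N05" < "N05 ")    // 1: a prefix sorts before its extension

* Toy data, invented for illustration, in a letter-plus-digits form
clear
input str4 code
"N5"
"N12"
"N123"
end

sort code
list code                   // string order: N12, N123, N5

* Hypothetical -codenum-: the digits after the first character, as a number
generate codenum = real(substr(code, 2, .))
sort codenum
list code codenum           // numeric order: N5, N12, N123
```

Real UK postcode districts do not always fit that one-letter assumption (areas can have two letters, as in NW1), in which case -real()- returns missing and the extraction would need adjusting.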

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of sdm1
Sent: 13 May 2009 19:24
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: RE: RE: Strings and the greater than/less than operators

Nick/Gary,

Thanks very much for your help.  -asciiplot- provides a particularly useful
display of the sort 'order' of characters.

Cheers!

Steve

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: 13 May 2009 18:14
To: statalist@hsphsun2.harvard.edu
Subject: st: RE: RE: RE: Strings and the greater than/less than operators

I wrote -char()-, not -char-. The () signal a function, -char()-. For
example, 

. di char(65)
A

. di char(97)
a

Referring to -char()- is more precise than referring to (say) ASCII order,
which doesn't mean the same thing in absolutely all circumstances. 
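For a plain-text counterpart to a plot, a short loop can print each code next to its character. This sketch assumes only the printable ASCII range, codes 32 to 126, is of interest; what codes above 127 display depends on the system and its encoding.

```stata
* Print each printable ASCII code (32 to 126) with its character.
* Brackets make the space (code 32) visible. Lower codes compare as
* smaller, so for these characters the listing runs in sort order.
forvalues i = 32/126 {
    display %4.0f `i' "  [" char(`i') "]"
}
```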

Stata doesn't offer an inverse to -char()-, but -asciiplot- from SSC gives
you a graphical display of the characters on your system. In any case,
typing, e.g.,

. di ("a" > "A")

displays 1 if the comparison is true and 0 if it is false (here, 1).
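The same kind of test on single characters shows the broad pattern for plain ASCII: the space comes first, then digits, then uppercase, then lowercase letters. The codes in the comments assume standard ASCII.

```stata
* Each comparison displays 1 for true, 0 for false
display (" " < "0")    // 1: space (32) before digits (48-57)
display ("9" < "A")    // 1: digits before uppercase (65-90)
display ("Z" < "a")    // 1: uppercase before lowercase (97-122)
display ("B" < "a")    // 1: so "B" sorts before "a", unlike a dictionary
```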

Incidentally, these data look like the first parts of UK postcodes. Right or
wrong, you might use -trim()- to lose the trailing spaces now in order not
to be bitten again. 
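A minimal sketch of that advice on invented toy data (the three values below are not from Steve's file), showing a padded N05 being admitted to the range before -trim()- and excluded after it. The variable names -code- and -admimeth- follow the thread.

```stata
* Toy data, invented for illustration; two codes carry a trailing space
clear
input str4 code admimeth
"N05 " 31
"N06 " 32
"N13"  31
end

* Before trimming, "N05 " > "N05" is true, so N05 slips into the range
count if code > "N05" & code < "N13"    // 2
replace code = trim(code)
count if code > "N05" & code < "N13"    // 1: only N06 remains
```

On the real data, running the same -replace- and then the original -tab- command should no longer show an N05 row.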

Nick
n.j.cox@durham.ac.uk 

sdm1

Noooooooooooo!  

Thanks Nick...and, of course, you're dead right.  

The giveaway, I realise now, is the alignment of the values of code under
the heading 'code' in the tabulation.  I think that the last character
aligns vertically with the 'e' of 'code'.

The only bit I don't understand is: "The order is that of -char()-".  

It sounds to me as if char is user defined.  This is from the help for
char:

    The dataset itself and each variable within the dataset have associated
    with them a set of characteristics.  Characteristics are named and
    referred to as varname[charname], where varname is the name of a variable
    or _dta.  The characteristics contain text.  Characteristics are stored
    with the dataset in the Stata-format .dta dataset, so they are recalled
    whenever the dataset is loaded.

If characteristics for a variable are not defined by the user, what's the
default order?  Is there a list somewhere that tells me the order in which
Stata sorts characters (e.g. alphabetic, numeric, spaces)?  Or am I
misinterpreting here?

Once again, thanks for your help.

Nick Cox

General question: Absolutely. The order is that of -char()-. 

Specific question: "N05 " > "N05". You have trailing spaces. They are
characters too. 

Nick
n.j.cox@durham.ac.uk 
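Nick's specific point can be checked with literal strings: the trailing blank is compared like any other character, so when everything before it matches, the longer string is greater. -trim()- removes leading and trailing blanks.

```stata
* The trailing blank is a character like any other; when everything
* before it matches, the longer string is the greater one
display ("N05 " > "N05")           // 1
display ("N05 " == "N05")          // 0
display (trim("N05 ") == "N05")    // 1: trim() drops the blank
```

On the real data, something like -count if code != trim(code)- would report how many values carry stray blanks before anything is changed.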

Steve 

Can the greater than (>) and less than (<) operators be applied to strings?

If the answer is 'yes' (as I thought), why is "N05" included in the output
for the following command? 

. tab code admimeth if (admimeth==31 | admimeth==32) & (code>"N05" & code<"N13")

           |       admimeth
      code |        31         32 |     Total
-----------+----------------------+----------
      N05  |       103        163 |       266 
      N06  |    23,858        132 |    23,990 
      N07  |   364,687      2,653 |   367,340 
      N08  |     8,079         18 |     8,097 
      N09  |    70,953        132 |    71,085 
      N10  |    24,606         88 |    24,694 
      N11  |   123,635        256 |   123,891 
      N12  |   546,148     21,998 |   568,146 
-----------+----------------------+----------
     Total | 1,162,069     25,440 | 1,187,509 


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



