



st: Failure to detect strings that look completely identical


From   Nicola Man <n.man@unsw.edu.au>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   st: Failure to detect strings that look completely identical
Date   Tue, 22 Nov 2011 02:46:50 +0000

Hi all,

I have a problem with matching strings in two string variables.

* Generate C1 and N1 (the first character of each trimmed variable) to show the problem with detecting the first character of Country
. gen C1 = substr(trim(Country), 1, 1)
. gen N1 = substr(trim(Nation), 1, 1)
. list C1 N1 Country Nation if trim(Country)!=trim(Nation)

     +---------------------------------------------------------------------------+
     | C1   N1                           Country                          Nation |
     |---------------------------------------------------------------------------|
  1. |       A                       Afghanistan                     Afghanistan |
  2. |       A                           Albania                         Albania |
  3. |       A                           Algeria                         Algeria |
  4. |       A                           Andorra                         Andorra |
  5. |       A                            Angola                          Angola |
     |---------------------------------------------------------------------------|
  6. |       A               Antigua and Barbuda               Antigua & Barbuda |
  7. | etc..

The first five observations of Country and Nation look identical to me, so I am not sure why Stata treats them as different. The C1 variable shows that the first character of Country is not read correctly (it appears blank), even after applying trim(). Looked at another way, only four records are identified as matching by the command below:

. list C1 N1 Country Nation if trim(Country)==trim(Nation)

     +-----------------------------------------------+
     | C1   N1            Country             Nation |
     |-----------------------------------------------|
109. |  M    M   Marshall Islands   Marshall Islands |
121. |  N    N              Nauru              Nauru |
166. |  T    T             Taiwan             Taiwan |
176. |  T    T             Tuvalu             Tuvalu |
     +-----------------------------------------------+
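A minimal sketch of what may be going on, written in Python purely for illustration. It rests on two assumptions not confirmed above: that the non-matching Country values carry an invisible leading character (here a non-breaking space, code 160), and that Stata 12's trim() removes only ordinary spaces (code 32). The helper name space_only_trim is hypothetical.

```python
# Hypothetical reproduction: assumes Country carries an invisible leading
# character (a non-breaking space, code 160) that Nation does not.

def space_only_trim(s: str) -> str:
    """Strip leading/trailing ASCII spaces only (assumed to mirror Stata 12's trim())."""
    return s.strip(" ")

country = "\xa0Afghanistan"  # assumed: leading non-breaking space
nation = "Afghanistan"

c = space_only_trim(country)
n = space_only_trim(nation)

print(c, n)                  # both print as "Afghanistan" on most terminals
print(c == n)                # False: the invisible character survives the trim
print(ord(c[0]), ord(n[0]))  # 160 65
```

If this matches the data, the "blank" C1 values above would hold that invisible character rather than nothing at all.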

I am using Stata 12/SE and am not sure whether this is related to the character encoding it uses (is it ASCII only?). If it is an encoding issue, I would appreciate advice on how to work around it.
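If a non-printing character is the cause, one way to confirm it is to report the code of every character outside printable ASCII (32 to 126), then compare cleaned values. The sketch below does this in Python for illustration; the functions suspicious_chars and clean and the sample values are hypothetical. In Stata itself, the analogous cleaning step would presumably use subinstr() together with char() to remove a specific code such as 160.

```python
# Hypothetical diagnostic: find characters outside printable ASCII and
# compare values after removing them.

def suspicious_chars(s: str) -> list[tuple[int, int]]:
    """Return (position, code) for every character outside printable ASCII (32-126)."""
    return [(i, ord(ch)) for i, ch in enumerate(s) if not 32 <= ord(ch) <= 126]

def clean(s: str) -> str:
    """Drop characters outside printable ASCII, then strip ASCII spaces."""
    return "".join(ch for ch in s if 32 <= ord(ch) <= 126).strip(" ")

pairs = [
    ("\xa0Afghanistan", "Afghanistan"),  # hypothetical: leading non-breaking space
    ("\ufeffAlbania", "Albania"),        # hypothetical: leading byte-order mark
    ("Nauru", "Nauru"),                  # a genuine match
]

for country, nation in pairs:
    # Show which odd characters were found and whether the cleaned values match.
    print(suspicious_chars(country), clean(country) == clean(nation))
```

Dropping everything outside printable ASCII is a blunt choice: it would also strip legitimate accented letters, and it cannot reconcile genuine differences such as "Antigua and Barbuda" versus "Antigua & Barbuda", which need a separate recode.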

Thanks,
Nicola

