From | Nick Cox <n.j.cox@durham.ac.uk> |
To | "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: Failure to detect strings that look completely identical |
Date | Wed, 23 Nov 2011 17:10:20 +0000 |
Quite so. "are hard" should be "may be hard". For large #, char(#) can vary considerably with operating system.

Nick
n.j.cox@durham.ac.uk

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Ronan Conroy
Sent: 23 November 2011 14:01
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Failure to detect strings that look completely identical

On 22 Nov 2011, at 19:50, Nick Cox wrote:

> The help for -charlist- (SSC) documents that char(32) and char(160)
> are hard to tell apart:
>
> . di "|`=char(32)'|"
> | |
>
> . di "|`=char(160)'|"
> | |
>
> So, watch out for char(160).

Your mileage may vary. Mac OS X:

. di "|`=char(160)'|"
|†|

. di "|`=char(32)'|"
| |

I got caught out, years ago, by data contaminated by ASCII 30, a non-printing control character (strictly the record separator; the true null is ASCII 0). It was used by MS Word to indicate end of file, and could sneak into data.

. di "|`=char(30)'|"
||

However, if I paste this output into BBEdit and view invisibles, I can see the little horror, which BBEdit displays as a red ¿. (If your mailer hasn't shown you a Spanish inverted question mark, well, that's mailers for you.)

. di "|`=char(30)'|"
|¿|

A character like this is particularly nasty because it has no width, so it's very hard to spot.
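
For anyone who wants to check their own data, here is a minimal sketch of how such characters might be flagged and cleaned. It is not from the original posts: the variable names s, has160 and has30 and the two-observation dataset are made up for illustration, and it assumes byte-oriented strings as in the Stata releases current in 2011 (in later Unicode-aware releases the no-break space may need different handling).

```stata
* Hypothetical data: two strings that may display alike but differ in one byte
clear
set obs 2
generate str20 s = "New York" in 1
replace s = "New" + char(160) + "York" in 2

* Flag observations that contain char(160) or char(30)
generate byte has160 = strpos(s, char(160)) > 0
generate byte has30  = strpos(s, char(30)) > 0
list s has160 has30

* Replace char(160) with an ordinary space, char(32), and drop char(30)
replace s = subinstr(s, char(160), char(32), .)
replace s = subinstr(s, char(30), "", .)

* After cleaning, the two strings should compare as equal
assert s[1] == s[2]
```

The same -strpos()- test can be repeated for any other suspect code; -charlist- (SSC), mentioned above, is the more general way to see which characters a string variable contains.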