Statalist The Stata Listserver



Re: st: regular expressions: escape characters handled incorrectly


From   James Muller <james.muller@anu.edu.au>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: regular expressions: escape characters handled incorrectly
Date   Wed, 14 Feb 2007 20:50:47 +1100

Got an anonymous reply. 

[begin paraphrasing of anon reply]
  The problem was that the dash ("-") does not need to be escaped. A
  (currently) valid Stata regexp (which does not match here, because
  "id" falls outside the bracket expression) is

      di regexm("(id1-2)", "\([-0-9]*\)")

  Additionally, oddly, matching a backslash (\) does not require
  escaping!  For example,

      di regexm("\\\", "[0\]")

  returns 1 (a match).
[end paraphrasing of anon reply] 

The last example should return an error: the backslash in [0\] should
escape the closing bracket, leaving the opening bracket without a
partner before the closing quotes.
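
For comparison only, here is a minimal sketch in Python rather than
Stata. Python's re module is Perl-style, not POSIX, but it does treat a
backslash inside brackets as an escape, which is the behaviour argued
for above. The helper name compiles() is made up for this illustration.

```python
import re

def compiles(pattern: str) -> bool:
    """Return True if the regex engine accepts the pattern, else False."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

# The backslash escapes "]", so the bracket expression is never closed.
unterminated_ok = compiles(r"[0\]")

# Escaping the backslash itself gives a class containing "0" and "\".
escaped_ok = compiles(r"[0\\]")

# Three backslashes, as in the Stata example; the class finds one of them.
matches_backslash = bool(re.search(r"[0\\]", "\\\\\\"))

print(unterminated_ok, escaped_ok, matches_backslash)
```

In this engine the unescaped form is rejected outright, and the
backslash has to be doubled before it can be a member of the class.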

So although a POSIX.2 regular expression can be rewritten as an
equivalent one in Stata, we must first go through it and work out
which characters do and do not need to be escaped. Not good.

Finally, if Stata revises the regular expression functionality (which
is a fantastic addition to Stata, by the way), it should get the
behaviour of escaping sorted out. In particular, if a backslash
appears, it should be an escape, meaning that whatever follows is
taken literally. Examples:

   \\  ==  "\" 
   \-  ==  "-"
   -   !=  "-" 
   \x  ==  "x" 
   \"  ==  """
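
The same rule can be checked in an engine that already follows it.
This is a rough sketch in Python, not Stata, and Python's re module is
not a POSIX implementation. The \x case is left out because Python
reserves \x for hexadecimal escapes. The variable names are only for
illustration.

```python
import re

# (pattern, text) pairs in which the backslash makes the next character literal.
cases = [
    (r"\\", "\\"),   # escaped backslash matches one backslash
    (r"\-", "-"),    # escaped dash matches a dash
    (r'\"', '"'),    # escaped double quote matches a double quote
]
results = [bool(re.fullmatch(pattern, text)) for pattern, text in cases]
print(results)

# Inside brackets, "\-" is a literal dash rather than a range operator.
dash_or_digit = re.compile(r"\([\-0-9]*\)")
digits_match = bool(dash_or_digit.search("(1-2)"))
letters_match = bool(dash_or_digit.search("(id1-2)"))  # "i", "d" are not in the class
print(digits_match, letters_match)
```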

Cheers

James



On Wed, Feb 14, 2007 at 05:48:34PM +1100, James Muller wrote:
> Found a problem with Stata's regular expression parser. It doesn't
> handle escapes (\'s) correctly. Was doing this while writing a regular
> expression to verify a valid varlist in a very non-standard syntax
> parser.
> 
>    di regexm("(id1-2)", "\([\-0-9]*\)")
> 
> The \-0-9 in the [] should match numbers (0-9) and dash ("-"), but
> Stata reads the escaped dash as another range operator. The specific
> error is
> 
>    regexp: invalid [] range
> 
> If anyone has a hack to obtain an equivalent regexp, or if Stata has
> any comments, gratitude in advance.