Re: st: regular expressions: escape characters handled incorrectly
Got an anonymous reply.
[begin paraphrasing of anon reply]
The problem was that the dash ("-") does not need to be escaped. A
(currently) valid Stata regexp (which does not match) is
di regexm("(id1-2)", "\([-0-9]*\)")
Additionally, oddly, matching a backslash (\) does not require
escaping! For example,
di regexm("\\\", "[0\]")
[end paraphrasing of anon reply]
The last example should return an error: in [0\] the backslash should
escape the closing bracket, so the opening bracket should be left
without a partner before the closing quote.
So although we can write a Stata regular expression equivalent to a
POSIX.2 regular expression, we must first go through and work out
which characters do and do not need to be escaped. Not good.
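
As a rough sketch of the escaping rules described in the reply above (not a statement of documented Stata behaviour), the following lines show where a backslash is and is not needed. The test strings "(1-2)", "a(b" and "a\b" are made up for illustration. If Stata behaves as the reply describes, each call should report a match.

```stata
* Inside [], put the dash first so it is read literally, not as a range operator.
display regexm("(1-2)", "\([-0-9]*\)")

* Outside [], a backslash escapes metacharacters such as an opening parenthesis.
display regexm("a(b", "a\(b")

* Inside [], a backslash stands for itself, so it is not escaped there.
display regexm("a\b", "[\]")
```

Note that the original test string "(id1-2)" fails the first pattern only because "i" and "d" are not in the bracket set, not because of the dash.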
Finally, if Stata revises the regular expression functionality (which
is a fantastic addition to Stata, by the way), it should get the
behaviour of escaping sorted out. In particular, if a backslash
appears, it is an escape, meaning that whatever follows is accepted
literally:
\\ == "\"
\- == "-"
- != "-"
\x == "x"
\" == """
On Wed, Feb 14, 2007 at 05:48:34PM +1100, James Muller wrote:
> Found a problem with Stata's regular expression parser. It doesn't
> handle escapes (\'s) correctly. Was doing this while writing a regular
> expression to verify a valid varlist in a very non-standard syntax:
> di regexm("(id1-2)", "\([\-0-9]*\)")
> The \-0-9 in the [] should match numbers (0-9) and dash ("-"), but Stata
> reads the escaped dash as another range operator. The specific error is
> regexp: invalid [] range
> If anyone has a hack to obtain an equivalent regexp, or if Stata has
> any comments, gratitude in advance.