Re: st: problem with regexm leading to "regexp: unterminated ()" error for all observations

From   Phil Schumm
Subject   Re: st: problem with regexm leading to "regexp: unterminated ()" error for all observations
Date   Fri, 3 Jun 2011 12:10:53 -0500

On Jun 3, 2011, at 7:35 AM, Jamie Fagg wrote:
> I've a problem with the function -regexm-. I get the following message:
>
> regexp: unterminated ()
>
> #delimit ;
>
> //regular expression to define whether postcode is syntactically correct
>
> ge postcodevalid = 1 if regexm(postcode,"(GIR 0AA)|(((A[BL]| B[ABDHLNRSTX]
> [0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]{2})")==1;

I'm not sure why Stata chokes on this, though I suspect it has something to do with the length of the expression. As Nick and Steven have already noted, the repeat qualifier {n} is not supported by Stata's regular expression syntax, so you'll need to replace

    [ABD-HJLNP-UW-Z]{2}

with the equivalent

    [ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]

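To check that writing the character class out twice behaves as intended, here is a minimal sketch on three made-up postcodes (the data and the macro name `u` are my own assumptions, not taken from your file). Only the first value ends in a digit followed by two letters from the allowed set:

```stata
* Made-up test values: only the first has two valid trailing letters
clear
input str8 postcode
"AB1 2DE"
"AB1 2D"
"AB1 2CV"
end

* Hypothetical helper macro holding the character class
loc u "[ABD-HJLNP-UW-Z]"

* The class is repeated by hand instead of using {2}
gen byte ok = regexm(postcode," [0-9]`u'`u'")
list
```

Keeping the class in a macro means the allowed letters are written once, so the two copies cannot drift apart.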
Now, Nick suggested breaking the expression up, so let's do that. Your expression is equal to

    (p1)|(((p2a1a|p2a1b|p2a1c)p2a1d|p2a2|p2a3|p2a4)p2b)

where the individual parts (as assigned to Stata macros) are

    loc p1    "GIR 0AA"
    loc p2a1d "[1-9]?[0-9]"
    loc p2a2  "((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]"
    loc p2a3  "(SW|W)([2-9]|[1-9][0-9])"
    loc p2a4  "EC[1-9][0-9]"
    loc p2b   " [0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]"
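
Since Stata substitutes the macros before -regexm- ever sees the string, it can help to display an assembled pattern and try it on a single value. A minimal sketch, using two of the macros above and a made-up postcode (the macro name `pat` is my own):

```stata
loc p2a4  "EC[1-9][0-9]"
loc p2b   " [0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]"

* Concatenate the two pieces into a single pattern string
loc pat "`p2a4'`p2b'"

* Show the pattern exactly as regexm() will receive it
di "`pat'"

* Try it on one made-up value
di regexm("EC10 1AB","`pat'")
```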

This may then be easily broken up as follows:

    gen byte valid = regexm(postcode,"`p1'")
    replace valid = 1 if regexm(postcode,"`p2a1a'`p2a1d'`p2b'")
    replace valid = 1 if regexm(postcode,"`p2a1b'`p2a1d'`p2b'")
    replace valid = 1 if regexm(postcode,"`p2a1c'`p2a1d'`p2b'")
    replace valid = 1 if regexm(postcode,"(`p2a2'|`p2a3'|`p2a4')`p2b'")
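
The same generate-then-replace pattern can be tried on a handful of made-up postcodes. This is only a sketch under stated assumptions: the data are invented, and `p2a1a` here is a hypothetical stand-in holding just the `A[BL]` fragment from your quoted expression, not a full list of area codes.

```stata
* Made-up data: the first three rows should validate, the last should not
clear
input str10 postcode
"GIR 0AA"
"AB1 2DE"
"EC10 1AB"
"ZZ9 9ZZ"
end

loc p1    "GIR 0AA"
* Hypothetical stand-in for one group of area codes
loc p2a1a "A[BL]"
loc p2a1d "[1-9]?[0-9]"
loc p2a4  "EC[1-9][0-9]"
loc p2b   " [0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]"

* Start from the special case, then switch on each further group in turn
gen byte valid = regexm(postcode,"`p1'")
replace valid = 1 if regexm(postcode,"`p2a1a'`p2a1d'`p2b'")
replace valid = 1 if regexm(postcode,"`p2a4'`p2b'")
list
```

Each -replace- only ever sets valid to 1, so the order of the statements does not matter, and every pattern passed to -regexm- stays short.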

-- Phil