Re: st: problem with regexm leading to "regexp: unterminated ()" error for all observations

From   Jamie Fagg
Subject   Re: st: problem with regexm leading to "regexp: unterminated ()" error for all observations
Date   Mon, 06 Jun 2011 15:11:55 +0100

Dear Phil Schumm, Nick Cox and Steve Samuels,

Many thanks for all your help on this.

Steve and Nick - thanks for the initial advice on what was causing the error.

Phil - I had just finished breaking it down when I saw your message. It is a much more elegant solution than the one I came up with after Nick recommended breaking it down, so thanks.

Best wishes,


On 03/06/2011 18:10, Phil Schumm wrote:
On Jun 3, 2011, at 7:35 AM, Jamie Fagg wrote:
I've a problem with the function -regexm-. I get the following message:

regexp: unterminated ()


#delimit ;

//regular expression to define whether postcode is syntactically correct

ge postcodevalid = 1 if regexm(postcode,"(GIR 0AA)|(((A[BL]|B[ABDHLNRSTX]
[0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]{2})")==1;

I'm not sure why Stata chokes on this, though I would suspect it might have something to do with the length. As Nick and Steven have already noted, the repeat qualifier {n} is not supported by Stata's regular expression syntax, so you'll need to replace

    [ABD-HJLNP-UW-Z]{2}

with the equivalent

    [ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]

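As a minimal sketch of that substitution, the character class can be kept in a local macro and written out twice, so it only has to be typed once. The postcodes below are hypothetical test values, and the macro name `letter', the variable name inward_ok, and the trailing $ anchor are assumptions made for this illustration, not part of the original expression.

```stata
* Hypothetical test values: only the end of the inward code varies
clear
input str8 postcode
"AB1 0AA"
"AB1 0AC"
"AB1 0A"
end

* The permitted final letters, stored once (macro name is illustrative)
local letter "[ABD-HJLNP-UW-Z]"

* Two letters from the set: the class repeated twice instead of {2}
gen byte inward_ok = regexm(postcode, " [0-9]`letter'`letter'$")
list
```

Only the first value should be flagged: "AB1 0AC" fails because C is not in the permitted set, and "AB1 0A" fails because it has a single final letter.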

Now, Nick suggested breaking the expression up, so let's do that. Your expression is equal to


where the individual parts (as assigned to Stata macros) are

    loc p1    "GIR 0AA"
    loc p2a1d "[1-9]?[0-9]"
    loc p2a2  "((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]"
    loc p2a3  "(SW|W)([2-9]|[1-9][0-9])"
    loc p2a4  "EC[1-9][0-9]"
    loc p2b   " [0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]"

This may then be easily broken up as follows:

    gen byte valid = regexm(postcode,"`p1'")
    replace valid = 1 if regexm(postcode,"`p2a1a'`p2a1d'`p2b'")
    replace valid = 1 if regexm(postcode,"`p2a1b'`p2a1d'`p2b'")
    replace valid = 1 if regexm(postcode,"`p2a1c'`p2a1d'`p2b'")
    replace valid = 1 if regexm(postcode,"(`p2a2'|`p2a3'|`p2a4')`p2b'")
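To try the approach on a small scale, here is a self-contained sketch that uses only a subset of the macros defined above (p1, p2a3, p2a4 and p2b) together with a handful of hypothetical test postcodes. It is not the full validation, just an illustration of building the result up from several shorter calls to regexm.

```stata
* Hypothetical test values, not real data
clear
input str10 postcode
"GIR 0AA"
"EC10 1AB"
"SW2 3DE"
"EC10 1AC"
"XX1 1AA"
end

* A subset of the parts defined above
loc p1   "GIR 0AA"
loc p2a3 "(SW|W)([2-9]|[1-9][0-9])"
loc p2a4 "EC[1-9][0-9]"
loc p2b  " [0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]"

* Start from the special case, then switch on further matches one call at a time
gen byte valid = regexm(postcode,"`p1'")
replace valid = 1 if regexm(postcode,"(`p2a3'|`p2a4')`p2b'")
list
```

With these values the first three observations should end up with valid equal to 1 and the last two with 0. Because each call handles a shorter expression, a failure is also easier to trace to a particular part.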

-- Phil

MRC Centre of Epidemiology for Child Health
UCL Institute of Child Health
30 Guilford Street
London, WC1N 1EH

Tel - 0207 905 2320