

From: Jamie Fagg <j.fagg@ich.ucl.ac.uk>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: problem with regexm leading to "regexp: unterminated ()" error for all observations
Date: Mon, 06 Jun 2011 15:11:55 +0100

Dear Phil Schumm, Nick Cox and Steve Samuels,

Many thanks for all your help on this.

Best wishes,
Jamie

On 03/06/2011 18:10, Phil Schumm wrote:

> On Jun 3, 2011, at 7:35 AM, Jamie Fagg wrote:
>
> > I've a problem with the function -regexm-. I get the following message:
> >
> >     regexp: unterminated ()
> >
> > <snip>
> >
> >     #delimit ;
> >     //regular expression to define whether postcode is syntactically correct
> >     ge postcodevalid = 1 if regexm(postcode,"(GIR0AA)|(((A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?|H[ADGPRSUX] |I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR] |R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)[1-9]?[0-9] |((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]|(SW|W)([2-9]|[1-9] [0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]{2})")==1;
>
> I'm not sure why Stata chokes on this, though I would suspect it might
> have something to do with the length. As Nick and Steven have already
> noted, the repeat qualifier {n} is not supported by Stata's regular
> expression syntax, so you'll need to replace
>
>     [ABD-HJLNP-UW-Z]{2}
>
> with the equivalent
>
>     [ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]
>
> Now, Nick suggested breaking the expression up, so let's do that.
> Your expression is equal to
>
>     (p1)|(((p2a1a|p2a1b|p2a1c)p2a1d|p2a2|p2a3|p2a4)p2b)
>
> where the individual parts (as assigned to Stata macros) are
>
>     loc p1 "GIR 0AA"
>     loc p2a1a "A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?"
>     loc p2a1b "H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]"
>     loc p2a1c "P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE"
>     loc p2a1d "[1-9]?[0-9]"
>     loc p2a2 "((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]"
>     loc p2a3 "(SW|W)([2-9]|[1-9][0-9])"
>     loc p2a4 "EC[1-9][0-9]"
>     loc p2b " [0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]"
>
> This may then be easily broken up as follows:
>
>     gen byte valid = regexm(postcode,"`p1'")
>     * the parentheses keep (p2a1a|p2a1b|p2a1c) grouped; | binds more loosely than concatenation
>     replace valid = 1 if regexm(postcode,"(`p2a1a')`p2a1d'`p2b'")
>     replace valid = 1 if regexm(postcode,"(`p2a1b')`p2a1d'`p2b'")
>     replace valid = 1 if regexm(postcode,"(`p2a1c')`p2a1d'`p2b'")
>     replace valid = 1 if regexm(postcode,"(`p2a2'|`p2a3'|`p2a4')`p2b'")
>
> --
> Phil
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

--
MRC Centre of Epidemiology for Child Health
UCL Institute of Child Health
30 Guilford Street
London, WC1N 1EH
Tel - 0207 905 2320
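A quick way to check the split version before running it on real data is to try it on a few strings whose status is known. The Stata sketch below is not from the thread and is only one possible check: the four test strings are hypothetical inputs (GIR 0AA and WC1N 1EH appear on this page and should be accepted; XX1 1AA and NOT A POSTCODE are made up and assumed invalid). It wraps each of the p2a1a, p2a1b and p2a1c alternations in parentheses before appending p2a1d and p2b, so that each piece keeps the grouping of the combined expression (p2a1a|p2a1b|p2a1c)p2a1d...p2b. Because | binds more loosely than concatenation, a bare alternation would turn, for example, N[EGNPRW]? into a stand-alone alternative that matches any string containing an N. As with -regexm()- in general, the patterns are unanchored, so a match anywhere in the string counts.

```stata
* Hypothetical test strings: the first two appear in this thread,
* the last two are made up and assumed to be invalid
clear
input str20 postcode
"GIR 0AA"
"WC1N 1EH"
"XX1 1AA"
"NOT A POSTCODE"
end

* Pieces of the pattern as in Phil's macros, with {2} already expanded
local p1    "GIR 0AA"
local p2a1a "A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?"
local p2a1b "H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]"
local p2a1c "P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE"
local p2a1d "[1-9]?[0-9]"
local p2a2  "((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]"
local p2a3  "(SW|W)([2-9]|[1-9][0-9])"
local p2a4  "EC[1-9][0-9]"
local p2b   " [0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]"

* Start from the special case, then add each alternative in turn;
* each alternation macro is parenthesised before concatenation
generate byte valid = regexm(postcode, "`p1'")
replace valid = 1 if regexm(postcode, "(`p2a1a')`p2a1d'`p2b'")
replace valid = 1 if regexm(postcode, "(`p2a1b')`p2a1d'`p2b'")
replace valid = 1 if regexm(postcode, "(`p2a1c')`p2a1d'`p2b'")
replace valid = 1 if regexm(postcode, "(`p2a2'|`p2a3'|`p2a4')`p2b'")

list postcode valid
```

Listing the data should show valid = 1 for the first two rows and 0 for the last two. With the added parentheses removed, NOT A POSTCODE would also be flagged as valid, because the bare alternative N[EGNPRW]? matches its first letter.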

