From: Nick Cox <n.j.cox@durham.ac.uk>
To: "'statalist@hsphsun2.harvard.edu'" <statalist@hsphsun2.harvard.edu>
Subject: st: RE: problem with regexm leading to "regexp: unterminated ()" error for all observations
Date: Fri, 3 Jun 2011 14:35:38 +0100

I guess there are at least small problems on various levels here.

First, the regular expression may well be too long for Stata; Mata doesn't seem to have the same limits. Second, I don't think the syntax {2} is supported by Stata.

I'd see if you can make progress by breaking it down into steps. Declare postcodes invalid and then change your mind each time they satisfy one of the possible patterns.

My own postcode is DH1 2NJ. Just a coincidence, but I like it.

Nick
n.j.cox@durham.ac.uk

Jamie Fagg

I've a problem with the function -regexm-. I get the following message:

regexp: unterminated ()

Frederico Belotti raised this in 2009 (http://www.stata.com/statalist/archive/2009-04/msg00573.html) and Martin Weiss suggested contacting Tech support (http://www.stata.com/statalist/archive/2009-04/msg00575.html), but as far as I can see there is no other comment referring to the error.

My aim: to find out which of a list of 22,907 postcodes conform to the UK standard syntax.

I've never used regular expressions before. I started trying to build the regular expression myself yesterday and ran a few options with some (limited) success before a colleague pointed me to a pre-written regular expression on Wikipedia (http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom). As this seems highly complex, has been done, and I really only want to do this once, it would be very helpful to be able to simply use it within Stata.

I have run the regular expression through a JavaScript regular expression checker (http://regexpal.com/) and it seemed to work correctly, picking out the valid postcodes (E1 4NS, SW8 2XR) in the example below.

This is an example of the code I used plus sample data if users want to see if they can reproduce the error.
I would very much appreciate any feedback about this,

Best wishes,
Jamie

******start of example*********
input str15 postcode
E1 4NS
EI 4NS
SW8 2XR
SW8 ZXR
end

#delimit ;
//regular expression to define whether postcode is syntactically correct
ge postcodevalid = 1 if regexm(postcode,"(GIR 0AA)|(((A[BL]|B[ABDHLNRSTX]
?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]
|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]
|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)[1-9]?[0-9]
|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]|(SW|W)([2-9]|[1-9]
[0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]{2})")==1;
*****end of example*******

******My Stata specs********
Stata/SE 11.1 for Windows (32-bit)

Stata executable
    folder:              C:\Program Files\Stata11\
    name of file:        StataSE.exe
    currently installed: 04 Nov 2010

Ado-file updates
    folder:              C:\Program Files\Stata11\ado\updates\
    names of files:      (various)
    currently installed: 04 Jan 2011

Utilities updates
    folder:              C:\Program Files\Stata11\utilities
    names of files:      (various)
    currently installed: 01 Sep 2010
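Nick's suggestion at the top of this message (declare postcodes invalid, then change your mind each time one of the possible patterns is satisfied) might be sketched roughly as follows. This is only an illustration under stated assumptions, not code from either poster: the variable names outward, inward, inok and outok and the local macro L are made up for the example; the stray blanks inside the posted expression are assumed to be line-wrap artifacts and are dropped; the long list of area codes is split into two shorter patterns purely to keep each expression short; and {2} is replaced by writing the character class twice.

```stata
clear
input str15 postcode
"E1 4NS"
"EI 4NS"
"SW8 2XR"
"SW8 ZXR"
end

* Split at the first blank into outward ("SW8") and inward ("2XR") parts
* (outward and inward are hypothetical helper variables)
generate str15 outward = substr(postcode, 1, strpos(postcode, " ") - 1)
generate str15 inward  = substr(postcode, strpos(postcode, " ") + 1, .)

* Inward part: one digit, then two letters from the allowed set.
* The class is written out twice rather than relying on {2}.
local L "[ABD-HJLNP-UW-Z]"
generate byte inok = regexm(inward, "^[0-9]`L'`L'$")

* Outward part: declare invalid, then change your mind pattern by pattern
generate byte outok = 0
replace outok = 1 if regexm(outward, "^(A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?)[1-9]?[0-9]$")
replace outok = 1 if regexm(outward, "^(M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)[1-9]?[0-9]$")
replace outok = 1 if regexm(outward, "^((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]$")
replace outok = 1 if regexm(outward, "^(SW|W)([2-9]|[1-9][0-9])$")
replace outok = 1 if regexm(outward, "^EC[1-9][0-9]$")

* A postcode is valid if it is the special case or both parts pass
generate byte postcodevalid = (postcode == "GIR 0AA") | (inok & outok)
list postcode postcodevalid
```

On the four sample postcodes this should flag E1 4NS and SW8 2XR as valid and the other two as invalid, which is what the JavaScript checker reported. Each -replace- line can be run and inspected on its own, which makes it easier to see which pattern, if any, triggers an error.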

Follow-Ups: Re: st: RE: problem with regexm leading to "regexp: unterminated ()" error for all observations From: Steven Samuels <sjsamuels@gmail.com>

References: st: problem with regexm leading to "regexp: unterminated ()" error for all observations From: Jamie Fagg <j.fagg@ich.ucl.ac.uk>

