



st: problem with regexm leading to "regexp: unterminated ()" error for all observations


From   Jamie Fagg <j.fagg@ich.ucl.ac.uk>
To   Statalist <statalist@hsphsun2.harvard.edu>
Subject   st: problem with regexm leading to "regexp: unterminated ()" error for all observations
Date   Fri, 03 Jun 2011 13:35:19 +0100

Dear Stata users,

I've a problem with the function -regexm-. I get the following message:

regexp: unterminated ()

Federico Belotti raised this in 2009 (http://www.stata.com/statalist/archive/2009-04/msg00573.html), and Martin Weiss suggested contacting Tech support (http://www.stata.com/statalist/archive/2009-04/msg00575.html), but as far as I can see there is no other comment referring to the error.

My aim: to find out which of a list of 22,907 postcodes conform to the UK standard syntax.

I have never used regular expressions before. Yesterday I started building one myself and tried a few variants with limited success, before a colleague pointed me to a ready-made regular expression on Wikipedia (http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom). Since the pattern is highly complex, has already been written, and I only need to do this once, it would be very helpful to be able to use it directly within Stata.

I have run the regular expression through a JavaScript regular expression checker (http://regexpal.com/), and it seemed to work correctly, picking out the valid postcodes (E1 4NS, SW8 2XR) in the example below.
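
For anyone who wants to repeat that check outside Stata, here is a minimal Python sketch. It was not part of my original workflow, and the helper name `is_valid_postcode` is hypothetical. It assumes the line breaks inside the pattern in the Stata example are only email wrapping, so the pieces are joined without them. It also uses `re.fullmatch`, which requires the whole string to match and is therefore stricter than an unanchored search.

```python
import re

# Pattern copied from the Stata example in this post, with the wrapped
# lines joined (assumption: the breaks are email wrapping, not part of it).
UK_POSTCODE = re.compile(
    r"(GIR 0AA)|(((A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?"
    r"|F[KY]|G[LUY]?|H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?"
    r"|N[EGNPRW]?|O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB"
    r"|W[ADFNRSV]|YO|ZE)[1-9]?[0-9]"
    r"|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]"
    r"|(SW|W)([2-9]|[1-9][0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]{2})"
)


def is_valid_postcode(postcode: str) -> bool:
    """Return True if the whole string matches the postcode pattern."""
    return UK_POSTCODE.fullmatch(postcode) is not None


# The four sample postcodes from the example data.
samples = ["E1 4NS", "EI 4NS", "SW8 2XR", "SW8 ZXR"]
for pc in samples:
    print(pc, is_valid_postcode(pc))
```

This only confirms that the pattern itself is well formed and behaves as expected in another regular expression engine; it says nothing about why -regexm- rejects it.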

Below is the code I used, plus sample data, in case anyone wants to try to reproduce the error.

I would very much appreciate any feedback about this,

Best wishes,

Jamie

******start of example*********

input str15 postcode
E1 4NS
EI 4NS
SW8 2XR
SW8 ZXR
end

#delimit ;

//regular expression to define whether postcode is syntactically correct

ge postcodevalid = 1 if regexm(postcode,"(GIR 0AA)|(((A[BL]|B[ABDHLNRSTX]
?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]
|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]
|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)[1-9]?[0-9]
|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]|(SW|W)([2-9]|[1-9]
[0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]{2})")==1;

*****end of example*******

******My Stata specs********

Stata/SE 11.1 for Windows (32-bit)

Stata executable
folder: C:\Program Files\Stata11\
name of file: StataSE.exe
currently installed: 04 Nov 2010

Ado-file updates
folder: C:\Program Files\Stata11\ado\updates\
names of files: (various)
currently installed: 04 Jan 2011

Utilities updates
folder: C:\Program Files\Stata11\utilities
names of files: (various)
currently installed: 01 Sep 2010


--
MRC Centre of Epidemiology for Child Health
UCL Institute of Child Health
30 Guilford Street
London, WC1N 1EH

Tel - 0207 905 2320

