st: RE: problem with regexm leading to "regexp: unterminated ()" error for all observations


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   st: RE: problem with regexm leading to "regexp: unterminated ()" error for all observations
Date   Fri, 3 Jun 2011 14:35:38 +0100

I guess there are problems on at least two levels here.

First, the regular expression may well be too long for Stata; Mata doesn't seem to have the same limits.

Second, I don't think the repetition syntax {2} is supported by Stata's -regexm()-.
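
If {2} is indeed the culprit, one workaround is simply to write the character class twice. The lines below are a minimal sketch of that idea, checking only the final part of a postcode (space, digit, two letters); the test strings are taken from the example data further down.

```stata
* Sketch: [ABD-HJLNP-UW-Z] written twice stands in for [ABD-HJLNP-UW-Z]{2}
* Checks only the last part of the postcode: space, one digit, two letters
display regexm("E1 4NS", " [0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]$")
display regexm("SW8 ZXR", " [0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]$")
```

-regexm()- returns 1 for a match and 0 otherwise, so the two lines can be compared directly.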

I'd see if you can make progress by breaking it down into steps. Declare postcodes invalid and then change your mind each time they satisfy one of the possible patterns. 
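
Something like the following is what I have in mind. It is only a sketch: it assumes a string variable named postcode, the local macro names (inward, areas1, areas2, areas3) are my own invention, and the patterns are copied from the expression quoted below, cut into shorter pieces with {2} written out as a repeated character class. I have not checked the patterns against the official postcode rules.

```stata
* Final part shared by every pattern: one digit, then two letters (no {2})
local inward "[0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]$"

* Area codes split into three shorter lists to keep each expression short
local areas1 "A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?"
local areas2 "H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?"
local areas3 "O[LX]|P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE"

* Start by declaring every postcode invalid
gen byte postcodevalid = 0

* Special case
replace postcodevalid = 1 if postcode == "GIR 0AA"

* Area followed by a one- or two-digit district
forvalues i = 1/3 {
    replace postcodevalid = 1 if regexm(postcode, "^(`areas`i'')[1-9]?[0-9] `inward'")
}

* Central London districts that end in a letter
replace postcodevalid = 1 if regexm(postcode, "^((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y] `inward'")

* SW and W districts from 2 upwards
replace postcodevalid = 1 if regexm(postcode, "^(SW|W)([2-9]|[1-9][0-9]) `inward'")

* Two-digit EC districts
replace postcodevalid = 1 if regexm(postcode, "^EC[1-9][0-9] `inward'")
```

Each -replace- can then be checked on its own, so if one pattern still triggers the error you know which piece to look at.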

My own postcode is DH1 2NJ. Just a coincidence, but I like it. 

Nick 
[email protected] 

Jamie Fagg

I've a problem with the function -regexm-. I get the following message:

regexp: unterminated ()

Frederico Belotti raised this in 2009 
(http://www.stata.com/statalist/archive/2009-04/msg00573.html) and 
Martin Weiss suggested contacting Tech support 
(http://www.stata.com/statalist/archive/2009-04/msg00575.html), but as far 
as I can see there is no other comment referring to the error.

My aim: to find out which of a list of 22,907 postcodes conform to the 
UK standard syntax.

I've never used regular expressions before, and I started trying to 
build the regular expression myself yesterday and ran a few options
with some (limited) success before a colleague pointed me to a 
pre-written regular expression on Wikipedia 
(http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom).
As this seems highly complex, has been done, and I really only want to 
do this once, it would be very helpful to be able to simply use it 
within Stata.

I have run the regular expression through a JavaScript regular 
expression checker (http://regexpal.com/) and it seemed to work 
correctly, picking out the valid postcodes (E1 4NS, SW8 2XR) 
in the example below.

This is an example of the code I used plus sample data if users want to 
see if they can reproduce the error.

I would very much appreciate any feedback about this,

Best wishes,

Jamie

******start of example*********

input str15 postcode
"E1 4NS"
"EI 4NS"
"SW8 2XR"
"SW8 ZXR"
end

#delimit ;

//regular expression to define whether postcode is syntactically correct

ge postcodevalid = 1 if regexm(postcode,"(GIR 0AA)|(((A[BL]|B[ABDHLNRSTX]
?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]
|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]
|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)[1-9]?[0-9]
|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]|(SW|W)([2-9]|[1-9]
[0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]{2})")==1;

*****end of example*******

******My Stata specs********

Stata/SE 11.1 for Windows (32-bit)

Stata executable
folder: C:\Program Files\Stata11\
name of file: StataSE.exe
currently installed: 04 Nov 2010

Ado-file updates
folder: C:\Program Files\Stata11\ado\updates\
names of files: (various)
currently installed: 04 Jan 2011

Utilities updates
folder: C:\Program Files\Stata11\utilities
names of files: (various)
currently installed: 01 Sep 2010

