Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.

Re: st: RE: problem with regexm leading to "regexp: unterminated ()" error for all observations

From	Steven Samuels <[email protected]>
To	[email protected]
Subject	Re: st: RE: problem with regexm leading to "regexp: unterminated ()" error for all observations
Date	Fri, 3 Jun 2011 10:06:35 -0400

Jamie

I get no error when I cut and paste from the Wikipedia page, but I get no matches either. I wouldn't expect matches, because Stata's regular expression parser doesn't recognize the {} repetition syntax. (I do get matches with BBEdit's regular expression parser.) So you'll have to implement this outside of Stata.

Steve
[email protected]
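
A quick way to check the {} point in one's own copy of Stata is to compare a pattern that uses {2} with the same pattern in which the character class is simply written out twice. This is only a sketch: the test string "4NS" is the inward part of one of the sample postcodes, and the comparison itself is an illustration, not something posted in the thread.

```stata
* Sketch only: compares the {2} form with the written-out form.
* "4NS" is the inward part of the sample postcode E1 4NS.

* Pattern with the {2} quantifier, as in the Wikipedia expression
display regexm("4NS", "[0-9][ABD-HJLNP-UW-Z]{2}")

* Same pattern with the character class repeated instead of {2}
display regexm("4NS", "[0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]")
```

If the two lines give different results, the {2} quantifier is not being interpreted as a repeat count.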


On Jun 3, 2011, at 9:35 AM, Nick Cox wrote:

I guess there are at least small problems on various levels here.

First, the regular expression may well be too long for Stata; Mata doesn't seem to have the same limits.

Second, I don't think the syntax {2} is supported by Stata. 

I'd see if you can make progress by breaking it down into steps. Declare postcodes invalid and then change your mind each time they satisfy one of the possible patterns. 

My own postcode is DH1 2NJ. Just a coincidence, but I like it. 

Nick 
[email protected] 
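
The step-by-step approach suggested above might be sketched as follows. This is a hedged illustration, not code from the thread: the alternatives are copied from the expression quoted below, but the split into batches, the local macro inward, the ^ and $ anchors, and the quoted sample data are all assumptions made for the sketch. The {2} is replaced by writing the character class twice, and each pattern is kept short in case expression length is a limit.

```stata
* Sketch only: the batching, the anchors (^ and $), and the local macro
* "inward" are assumptions, not part of the pattern posted in the thread.
clear
input str15 postcode
"E1 4NS"
"EI 4NS"
"SW8 2XR"
"SW8 ZXR"
end

* Inward code: one digit plus two letters, with {2} written out in full
local inward "[0-9][ABD-HJLNP-UW-Z][ABD-HJLNP-UW-Z]$"

* Declare every postcode invalid to begin with
generate byte postcodevalid = 0

* Special case
replace postcodevalid = 1 if regexm(postcode, "^GIR 0AA$")

* Area letters followed by one or two digits, in three shorter batches
replace postcodevalid = 1 if regexm(postcode, "^(A[BL]|B[ABDHLNRSTX]?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?)[1-9]?[0-9] `inward'")
replace postcodevalid = 1 if regexm(postcode, "^(H[ADGPRSUX]|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX])[1-9]?[0-9] `inward'")
replace postcodevalid = 1 if regexm(postcode, "^(P[AEHLOR]|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)[1-9]?[0-9] `inward'")

* London districts that end in a letter
replace postcodevalid = 1 if regexm(postcode, "^((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y] `inward'")

* Remaining SW, W and EC districts
replace postcodevalid = 1 if regexm(postcode, "^(SW|W)([2-9]|[1-9][0-9]) `inward'")
replace postcodevalid = 1 if regexm(postcode, "^EC[1-9][0-9] `inward'")

list postcode postcodevalid
```

Because every step only switches postcodevalid from 0 to 1, the steps can be run and checked one at a time, which also makes it easier to see which pattern, if any, triggers an error.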

Jamie Fagg wrote:

I've a problem with the function -regexm-. I get the following message:

regexp: unterminated ()

Frederico Belotti raised this in 2009
(http://www.stata.com/statalist/archive/2009-04/msg00573.html) and
Martin Weiss suggested contacting technical support
(http://www.stata.com/statalist/archive/2009-04/msg00575.html),
but as far as I can see there is no other comment referring to the error.

My aim: to find out which of a list of 22,907 postcodes conform to the 
UK standard syntax.

I've never used regular expressions before, and I started trying to 
build the regular expression myself yesterday and ran a few options
with some (limited) success before a colleague pointed me to a 
pre-written regular expression on Wikipedia 
(http://en.wikipedia.org/wiki/Postcodes_in_the_United_Kingdom).
As this seems highly complex, has been done, and I really only want to 
do this once, it would be very helpful to be able to simply use it 
within Stata.

I have run the regular expression through a JavaScript regular
expression checker (http://regexpal.com/) and it seemed to work
correctly, picking out the valid versions (E1 4NS, SW8 2XR)
of the postcodes in the example below.

This is an example of the code I used plus sample data if users want to 
see if they can reproduce the error.

I would very much appreciate any feedback about this,

Best wishes,

Jamie

******start of example*********

input str15 postcode
E1 4NS
EI 4NS
SW8 2XR
SW8 ZXR
end

#delimit ;

//regular expression to define whether postcode is syntactically correct

ge postcodevalid = 1 if regexm(postcode,"(GIR 0AA)|(((A[BL]|B[ABDHLNRSTX]
?|C[ABFHMORTVW]|D[ADEGHLNTY]|E[HNX]?|F[KY]|G[LUY]?|H[ADGPRSUX]
|I[GMPV]|JE|K[ATWY]|L[ADELNSU]?|M[EKL]?|N[EGNPRW]?|O[LX]|P[AEHLOR]
|R[GHM]|S[AEGKLMNOPRSTY]?|T[ADFNQRSW]|UB|W[ADFNRSV]|YO|ZE)[1-9]?[0-9]
|((E|N|NW|SE|SW|W)1|EC[1-4]|WC[12])[A-HJKMNPR-Y]|(SW|W)([2-9]|[1-9]
[0-9])|EC[1-9][0-9]) [0-9][ABD-HJLNP-UW-Z]{2})")==1;

*****end of example*******

******My Stata specs********

Stata/SE 11.1 for Windows (32-bit)

Stata executable
folder: C:\Program Files\Stata11\
name of file: StataSE.exe
currently installed: 04 Nov 2010

Ado-file updates
folder: C:\Program Files\Stata11\ado\updates\
names of files: (various)
currently installed: 04 Jan 2011

Utilities updates
folder: C:\Program Files\Stata11\utilities
names of files: (various)
currently installed: 01 Sep 2010


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


