Statalist



RE: st: regex-syntax error: "regexp: unmatched []"; no possibility to stop the do-file


From   "Nick Cox" <[email protected]>
To   <[email protected]>
Subject   RE: st: regex-syntax error: "regexp: unmatched []"; no possibility to stop the do-file
Date   Mon, 26 Jan 2009 21:04:56 -0000

This can't be right, precisely. -v1- is either numeric or string. 

replace v1 = 1      will work one way 
if strpos(v1, ...   will work the other way 

-- but not both at once. 
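
A minimal sketch of one way around this, assuming -v1- is a string
variable (-flag- is a hypothetical name for a new numeric variable,
not something from the original post):

x---------begin code--------x
// assumes v1 is string; flag is a hypothetical new variable
// flag is 1 where v1 contains the literal text "[]", 0 otherwise
gen byte flag = strpos(v1, "[]") > 0
x---------end code--------x

If -v1- were numeric instead, the -strpos()- call would fail with a
type mismatch, so the test would have to be written differently.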

But the spirit of David's comment is naturally correct. People sometimes
reach for regex when simpler stuff would suffice. 

Nick 
[email protected] 

David Elliott

The other issue is: do you need a regex, or would a normal string
matching expression do the trick:
replace v1=1 if strpos(v1,"[]")>0
A regex actually invokes a regex program each time it is used, which
is far more computationally expensive than a simple match expression.
With a large dataset, this can be a non-trivial difference.

By way of example:
x---------begin code--------x
// you must -clear- before running
set memory 500m
sysuse auto
keep make
expand 250000
des
set rmsg on
gen byte test1 = (strpos(make,"AMC"))
gen byte test2 = (strmatch(make,"AMC*"))
gen byte test3 = (regexm(make,"AMC.*"))
// and just to show we get the same result
tab2 test*
set rmsg off
x---------end code--------x

On my system I get the following timings:

. gen byte test1 = (strpos(make,"AMC"))
r; t=6.24 16:12:39

. gen byte test2 = (strmatch(make,"AMC*"))
r; t=5.69 16:12:45

. gen byte test3 = (regexm(make,"AMC.*"))
r; t=22.89 16:13:07

giving a fourfold difference between strmatch and regexm in terms of
processing time.

In the example you gave there appears to be no need for the
sophistication of a regex (or for the debugging problems often
associated with one).
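
A minimal sketch of that point, assuming a small made-up dataset (the
values of -v1- and the names -has_br- and -has_br_re- are
hypothetical). It flags a literal "[]" with -strpos()- and, for
comparison, with -regexm()-, where the brackets are escaped with
backslashes so that they are not read as an unfinished character
class, which is presumably what the "unmatched []" message in the
subject line complains about:

x---------begin code--------x
// you must -clear- before running; the data are invented
clear
input str10 v1
"a[]b"
"plain"
"[]"
end
// plain string search: nothing needs escaping
gen byte has_br = strpos(v1, "[]") > 0
// regex version: the brackets must be escaped to be taken literally
gen byte has_br_re = regexm(v1, "\[\]")
// the two flags should agree on every observation
assert has_br == has_br_re
list
x---------end code--------x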



