
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: RE: regular expressions has too many literals


From   Kieran McCaul <[email protected]>
To   "[email protected]" <[email protected]>
Subject   st: RE: regular expressions has too many literals
Date   Tue, 26 Feb 2013 11:52:52 +0800

...


Put the team names in a new dataset with a variable name that is the same as the string variable in the existing dataset that you are searching.

Now merge the two datasets on that variable name and _merge==3 will indicate the matches.
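A minimal sketch of that merge, for illustration only. It assumes the search variable is called string (as in the original post) and that each value is exactly a team name; unlike regexm(), merge matches whole values only, not substrings. The tempfile name teams is made up for the example, and only three teams are listed to keep it short.

```stata
* Sketch only: assumes the variable "string" holds the team name exactly,
* with no surrounding text (merge does not do substring matching).
preserve
clear
input str25 string
"Buffalo Bills"
"Dallas Cowboys"
"Miami Dolphins"
end
tempfile teams            // hypothetical name for the lookup dataset
save `teams'
restore

* Many observations in the main data may share one team name.
merge m:1 string using `teams'
keep if _merge == 3       // matched in both datasets
drop _merge
```

If the team name is embedded in a longer string, this will not find it, and a loop over regexm() or strpos() is probably still needed.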





-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Dimitriy V. Masterov
Sent: Tuesday, 26 February 2013 11:41 AM
To: Statalist
Subject: st: regular expressions has too many literals

I would like to do something like this:

keep if regexm(string,"Buffalo Bills") | regexm(string,"Dallas Cowboys") ///
    | regexm(string,"Miami Dolphins") | regexm(string,"New York Giants") ///
    | regexm(string,"New England Patriots") | regexm(string,"Philadelphia Eagles") ///
    | regexm(string,"New York Jets") | regexm(string,"Washington Redskins") ///
    | regexm(string,"Baltimore Ravens") | regexm(string,"Chicago Bears") ///
    | regexm(string,"Cincinnati Bengals") | regexm(string,"Detroit Lions") ///
    | regexm(string,"Cleveland Browns") | regexm(string,"Green Bay Packers") ///
    | regexm(string,"Pittsburgh Steelers") | regexm(string,"Minnesota Vikings") ///
    | regexm(string,"Houston Texans") | regexm(string,"Atlanta Falcons") ///
    | regexm(string,"Indianapolis Colts") | regexm(string,"Carolina Panthers") ///
    | regexm(string,"Jacksonville Jaguars") | regexm(string,"New Orleans Saints") ///
    | regexm(string,"Tennessee Titans") | regexm(string,"Tampa Bay Buccaneers") ///
    | regexm(string,"Denver Broncos") | regexm(string,"Arizona Cardinals") ///
    | regexm(string,"Kansas City Chiefs") | regexm(string,"San Francisco 49ers") ///
    | regexm(string,"Oakland Raiders") | regexm(string,"Seattle Seahawks") ///
    | regexm(string,"San Diego Chargers") | regexm(string,"St. Louis Rams")

Just looking at this, you know the expression is too long for Stata to evaluate. Is the only way around this to loop over the 32 team names like this:

gen keepers = .
foreach team in "Buffalo Bills" "Dallas Cowboys" "Miami Dolphins" "New York Giants" ///
    "New England Patriots" "Philadelphia Eagles" "New York Jets" "Washington Redskins" ///
    "Baltimore Ravens" "Chicago Bears" "Cincinnati Bengals" "Detroit Lions" ///
    "Cleveland Browns" "Green Bay Packers" "Pittsburgh Steelers" "Minnesota Vikings" ///
    "Houston Texans" "Atlanta Falcons" "Indianapolis Colts" "Carolina Panthers" ///
    "Jacksonville Jaguars" "New Orleans Saints" "Tennessee Titans" "Tampa Bay Buccaneers" ///
    "Denver Broncos" "Arizona Cardinals" "Kansas City Chiefs" "San Francisco 49ers" ///
    "Oakland Raiders" "Seattle Seahawks" "San Diego Chargers" "St. Louis Rams" {
     replace keepers = 1 if regexm(string,"`team'")
}
keep if keepers == 1

Or is there a more clever way?
*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/


