st: RE: regular expressions has too many literals

From   Kieran McCaul <>
To   "" <>
Subject   st: RE: regular expressions has too many literals
Date   Tue, 26 Feb 2013 11:52:52 +0800


Put the team names in a new dataset with a variable name that is the same as the string variable in the existing dataset that you are searching.

Now merge the two datasets on that variable name and _merge==3 will indicate the matches.
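A minimal sketch of the merge approach follows. It assumes the variable being searched is named string (as in the original message), that the lookup file is called teams.dta (a hypothetical name), and that the code is run from a do-file. Only three team names are entered; the rest would be added the same way. Note that merge matches on the whole value of the variable, so it finds observations where string is exactly a team name, whereas regexm() also matches a team name embedded in a longer string.

```stata
* Sketch only: teams.dta is a hypothetical file name.
* Step 1: build a lookup dataset whose single variable has the same
* name as the string variable in the main dataset.
preserve
clear
input str25 string
"Buffalo Bills"
"Dallas Cowboys"
"Miami Dolphins"
end
save teams, replace
restore

* Step 2: merge the main dataset (in memory) with the lookup dataset
* and keep only the observations found in both.
merge m:1 string using teams
keep if _merge == 3
drop _merge
```

If the team name can appear anywhere inside a longer string, the exact-match merge will miss those observations, and a loop over the names remains the safer choice.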

-----Original Message-----
From: [] On Behalf Of Dimitriy V. Masterov
Sent: Tuesday, 26 February 2013 11:41 AM
To: Statalist
Subject: st: regular expressions has too many literals

I would like to do something like this:

keep if regexm(string,"Buffalo Bills") | regexm(string,"Dallas Cowboys") ///
    | regexm(string,"Miami Dolphins") | regexm(string,"New York Giants") ///
    | regexm(string,"New England Patriots") | regexm(string,"Philadelphia Eagles") ///
    | regexm(string,"New York Jets") | regexm(string,"Washington Redskins") ///
    | regexm(string,"Baltimore Ravens") | regexm(string,"Chicago Bears") ///
    | regexm(string,"Cincinnati Bengals") | regexm(string,"Detroit Lions") ///
    | regexm(string,"Cleveland Browns") | regexm(string,"Green Bay Packers") ///
    | regexm(string,"Pittsburgh Steelers") | regexm(string,"Minnesota Vikings") ///
    | regexm(string,"Houston Texans") | regexm(string,"Atlanta Falcons") ///
    | regexm(string,"Indianapolis Colts") | regexm(string,"Carolina Panthers") ///
    | regexm(string,"Jacksonville Jaguars") | regexm(string,"New Orleans Saints") ///
    | regexm(string,"Tennessee Titans") | regexm(string,"Tampa Bay Buccaneers") ///
    | regexm(string,"Denver Broncos") | regexm(string,"Arizona Cardinals") ///
    | regexm(string,"Kansas City Chiefs") | regexm(string,"San Francisco 49ers") ///
    | regexm(string,"Oakland Raiders") | regexm(string,"Seattle Seahawks") ///
    | regexm(string,"San Diego Chargers") | regexm(string,"St. Louis Rams")

Just looking at this, you know the expression is too long for Stata to evaluate. Is the only way around this to loop over the 32 team names like this:

* Flag observations whose string contains any of the 32 team names
gen keepers = .
foreach team in "Buffalo Bills" "Dallas Cowboys" "Miami Dolphins" "New York Giants" ///
    "New England Patriots" "Philadelphia Eagles" "New York Jets" "Washington Redskins" ///
    "Baltimore Ravens" "Chicago Bears" "Cincinnati Bengals" "Detroit Lions" ///
    "Cleveland Browns" "Green Bay Packers" "Pittsburgh Steelers" "Minnesota Vikings" ///
    "Houston Texans" "Atlanta Falcons" "Indianapolis Colts" "Carolina Panthers" ///
    "Jacksonville Jaguars" "New Orleans Saints" "Tennessee Titans" "Tampa Bay Buccaneers" ///
    "Denver Broncos" "Arizona Cardinals" "Kansas City Chiefs" "San Francisco 49ers" ///
    "Oakland Raiders" "Seattle Seahawks" "San Diego Chargers" "St. Louis Rams" {
    replace keepers = 1 if regexm(string,"`team'")
}
* Keep only the flagged observations
keep if keepers == 1

Or is there a more clever way?