
Notice: On March 31, it was announced that Statalist is moving from an email list to a forum. The old list will shut down on April 23, and its replacement, statalist.org is already up and running.



RE: st: RE: regular expressions has too many literals


From   Kieran McCaul <kieran.mccaul@uwa.edu.au>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   RE: st: RE: regular expressions has too many literals
Date   Tue, 26 Feb 2013 12:41:12 +0800

...


OK, how about this:

In your existing dataset:

use main, clear
gen byte flag=0
save main, replace

Then use the names dataset, which has one variable -team- containing the team names:

use names, clear
forvalues i = 1/`=_N'  {
	local name = team[`i']
	preserve 
	use main, clear
	replace flag = 1 if regexm(string,"`name'")
	save main, replace
	restore
}
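
A variant of the loop above, as a minimal sketch only: rather than reloading and saving main once per team, the names can be collected into a local macro with -levelsof- and looped over while main is in memory. The toy observations, the str widths, and the local name -teams- are assumptions made up for illustration; the variable names team, string, and flag follow the post.

```stata
* Toy names dataset (illustrative values only)
clear
input str40 team
"Buffalo Bills"
"Dallas Cowboys"
end

* Collect the distinct team names into the local macro -teams-
levelsof team, local(teams)

* Toy main dataset (illustrative values only)
clear
input str60 string
"Buffalo Bills win at home"
"Weather report"
"Recap: Dallas Cowboys vs. rivals"
end

gen byte flag = 0

* One pass per name, with no use/save/restore inside the loop
foreach t of local teams {
	replace flag = 1 if regexm(string, `"`t'"')
}

list
```

The local macro survives the -clear- because macros are not part of the dataset. With a few dozen short names the macro should stay well within Stata's length limits, but that is worth checking for much longer lists.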



-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Dimitriy V. Masterov
Sent: Tuesday, 26 February 2013 12:20 PM
To: Statalist
Subject: Re: st: RE: regular expressions has too many literals

Unfortunately, regular expressions are required here, since the string contains additional idiosyncratic text. I should have made that explicit.

DVM
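
To illustrate the point about idiosyncratic text, here is a small sketch with made-up observations (the values and the new variable names exact, viaregex, and viastrpos are assumptions, not from the thread). An exact comparison, which is what a merge on the string variable amounts to, misses any value carrying extra text, while a substring search still finds the team name.

```stata
* Toy data: one clean value, one with extra text (illustrative only)
clear
input str60 string
"Buffalo Bills"
"Buffalo Bills (away, rain delay)"
end

* Exact comparison matches only the clean value
gen byte exact = (string == "Buffalo Bills")

* Regular-expression search matches both observations
gen byte viaregex = regexm(string, "Buffalo Bills")

* Literal substring search also matches both observations
gen byte viastrpos = strpos(string, "Buffalo Bills") > 0

list
```

For purely literal names, strpos() may be worth considering alongside regexm(): it treats the "." in a name such as "St. Louis Rams" as a literal character, whereas in a regular expression "." matches any single character.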

On Mon, Feb 25, 2013 at 7:52 PM, Kieran McCaul <kieran.mccaul@uwa.edu.au> wrote:
> ...
>
>
> Put the team names in a new dataset with a variable name that is the same as the string variable in the existing dataset that you are searching.
>
> Now merge the two datasets on that variable name and _merge==3 will indicate the matches.
>
>
>
>
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Dimitriy V. Masterov
> Sent: Tuesday, 26 February 2013 11:41 AM
> To: Statalist
> Subject: st: regular expressions has too many literals
>
> I would like to do something like this:
>
> keep if regexm(string,"Buffalo Bills") ///
>     | regexm(string,"Dallas Cowboys") ///
>     | regexm(string,"Miami Dolphins") ///
>     | regexm(string,"New York Giants") ///
>     | regexm(string,"New England Patriots") ///
>     | regexm(string,"Philadelphia Eagles") ///
>     | regexm(string,"New York Jets") ///
>     | regexm(string,"Washington Redskins") ///
>     | regexm(string,"Baltimore Ravens") ///
>     | regexm(string,"Chicago Bears") ///
>     | regexm(string,"Cincinnati Bengals") ///
>     | regexm(string,"Detroit Lions") ///
>     | regexm(string,"Cleveland Browns") ///
>     | regexm(string,"Green Bay Packers") ///
>     | regexm(string,"Pittsburgh Steelers") ///
>     | regexm(string,"Minnesota Vikings") ///
>     | regexm(string,"Houston Texans") ///
>     | regexm(string,"Atlanta Falcons") ///
>     | regexm(string,"Indianapolis Colts") ///
>     | regexm(string,"Carolina Panthers") ///
>     | regexm(string,"Jacksonville Jaguars") ///
>     | regexm(string,"New Orleans Saints") ///
>     | regexm(string,"Tennessee Titans") ///
>     | regexm(string,"Tampa Bay Buccaneers") ///
>     | regexm(string,"Denver Broncos") ///
>     | regexm(string,"Arizona Cardinals") ///
>     | regexm(string,"Kansas City Chiefs") ///
>     | regexm(string,"San Francisco 49ers") ///
>     | regexm(string,"Oakland Raiders") ///
>     | regexm(string,"Seattle Seahawks") ///
>     | regexm(string,"San Diego Chargers") ///
>     | regexm(string,"St. Louis Rams")
>
> Just looking at this, you know the expression is too long for Stata to evaluate. Is the only way around this to loop over the 32 team names like this:
>
> gen keepers = .
> foreach team in "Buffalo Bills" "Dallas Cowboys" "Miami Dolphins" "New York Giants" ///
>     "New England Patriots" "Philadelphia Eagles" "New York Jets" "Washington Redskins" ///
>     "Baltimore Ravens" "Chicago Bears" "Cincinnati Bengals" "Detroit Lions" ///
>     "Cleveland Browns" "Green Bay Packers" "Pittsburgh Steelers" "Minnesota Vikings" ///
>     "Houston Texans" "Atlanta Falcons" "Indianapolis Colts" "Carolina Panthers" ///
>     "Jacksonville Jaguars" "New Orleans Saints" "Tennessee Titans" "Tampa Bay Buccaneers" ///
>     "Denver Broncos" "Arizona Cardinals" "Kansas City Chiefs" "San Francisco 49ers" ///
>     "Oakland Raiders" "Seattle Seahawks" "San Diego Chargers" "St. Louis Rams" {
>      replace keepers = 1 if regexm(string,"`team'")
> }
> keep if keepers == 1
>
> Or is there a more clever way?
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/

