Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Steve Samuels <sjsamuels@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: AW: Using regex to identify strings with capital letters |
Date | Thu, 27 May 2010 09:16:05 -0400 |
Steve

On Thu, May 27, 2010 at 8:37 AM, Martin Weiss <martin.weiss1@gmx.de> wrote:
>
> <>
>
> " These
> records I don't want, but easily dropped by modifying Martin's
> suggestion:
> drop if regexm(substr(var1,1,2) , "^([P][O])")"
>
> You can probably safely omit the -substr()- part here as the "^" sign
> indicates that the "P" character must be positioned at the beginning of the
> string. See http://www.stata.com/support/faqs/data/regex.html
>
> *************
> clear*
>
> input str5 myvar
> "POwll"
> "POert"
> "SEtrt"
> "WEret"
> end
>
> preserve
> drop if regexm(substr(myvar,1,2) , "^([P][O])")
> l, noo
> restore
>
> drop if regexm(myvar , "^([P][O])")
> list, noo
>
> *************
>
> HTH
> Martin
>
> -----Original Message-----
> From: owner-statalist@hsphsun2.harvard.edu
> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Beecroft, Erik
> (VDSS)
> Sent: Thursday, 27 May 2010 14:26
> To: statalist@hsphsun2.harvard.edu
> Subject: st: Using regex to identify strings with capital letters
>
> Thank you to Martin and Nick for solving my problem, Martin with the
> regexm function and Nick with inrange.
>
> Both of these work with my data:
> keep if regexm(substr(var1,1,2) , "^([A-Z][A-Z])")
> keep if inrange(substr(var1,1,1), "A", "Z") & inrange(substr(var1,2,1), "A", "Z")
>
> Both also keep records that begin with "PO" for Post Office box. These
> records I don't want, but easily dropped by modifying Martin's
> suggestion:
> drop if regexm(substr(var1,1,2) , "^([P][O])")
>
> Thank you for your help.
>
> Erik
>
> Martin is correct that -inrange("Er", "AA", "ZZ")- is true. Possibly
> this is Erik's specific problem, namely that having the first capital
> letter in "A" ... "Z" is necessary but not sufficient.
>
> I offer as a stronger criterion
>
> inrange(substr(myvar,1,1), "A", "Z") & inrange(substr(myvar,2,1), "A", "Z")
>
> I continue to like regex solutions when they are the simplest available!
>
> Nick
> n.j.cox@durham.ac.uk
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/

--
Steven Samuels
sjsamuels@gmail.com
18 Cantine's Island
Saugerties NY 12477
USA
Voice: 845-246-0774
Fax: 206-202-4783
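
The points made in this thread can be checked side by side in a short do-file. What follows is only a sketch under stated assumptions, not anyone's posted solution: it reuses Martin's toy data and adds one hypothetical value, "Erwin", to show the case Nick describes, where a single -inrange()- call on the first two characters accepts a capital followed by a lowercase letter. The variable names caps_regex, caps_inrange, caps_loose, po_short and po_long are made up for illustration.

```stata
* Run as a do-file: the /// line continuation is not accepted interactively.
clear
input str5 myvar
"POwll"
"POert"
"SEtrt"
"WEret"
"Erwin"
end

* "Erwin" is a hypothetical addition to Martin's example data.

* Two capital letters at the start: regex version and two-inrange version.
generate byte caps_regex   = regexm(myvar, "^[A-Z][A-Z]")
generate byte caps_inrange = inrange(substr(myvar,1,1), "A", "Z") & ///
                             inrange(substr(myvar,2,1), "A", "Z")
assert caps_regex == caps_inrange

* A single inrange on the two-character prefix is looser: "Er" sorts
* between "AA" and "ZZ", so "Erwin" passes even though "r" is lowercase.
generate byte caps_loose = inrange(substr(myvar,1,2), "AA", "ZZ")
assert caps_loose   == 1 if myvar == "Erwin"
assert caps_inrange == 0 if myvar == "Erwin"

* "^" already anchors the match at the start of the string, so the
* substr() wrapper and the bracketed single characters are optional.
generate byte po_short = regexm(myvar, "^PO")
generate byte po_long  = regexm(substr(myvar,1,2), "^([P][O])")
assert po_short == po_long

* Keep values starting with two capitals, except the "PO" records.
keep if caps_regex & !po_short
list myvar, noobs
```

If every -assert- passes, the do-file runs through to the final -list-, which on these data should leave only the values that start with two capitals other than "PO".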