



st: AW: Using regex to identify strings with capital letters


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: AW: Using regex to identify strings with capital letters
Date   Thu, 27 May 2010 14:37:21 +0200

<> 

"These
records I don't want, but they are easily dropped by modifying Martin's
suggestion:
	drop if regexm(substr(var1,1,2) , "^([P][O])")"




You can safely omit the -substr()- part here, because the "^" anchors the
match at the beginning of the string: "P" must be the first character and
"O" the second. See http://www.stata.com/support/faqs/data/regex.html



*************
clear*

input str5 myvar
 "POwll"
 "POert"
 "SEtrt"
 "WEret"
end

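preserve
 * Variant 1: test only the first two characters
 drop if regexm(substr(myvar,1,2) , "^([P][O])")
 l, noo
restore

* Variant 2: "^" already anchors at the start, so -substr()- is not needed
drop if regexm(myvar , "^([P][O])")
list, noo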

*************
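The two steps in this thread (keep the capitalised records, then drop the "PO" ones) can also be folded into a single -keep- statement. This is a minimal sketch, not code posted in the thread; the example strings (including "Abcde") are made up for illustration.

```stata
* Sketch: keep values whose first two characters are capitals,
* excluding those that start with "PO" (post office boxes).
* The example strings below are hypothetical.
clear
input str5 myvar
 "POwll"
 "SEtrt"
 "Abcde"
 "WEret"
end

* "^[A-Z][A-Z]" requires two capitals at the start of the string;
* "^PO" flags the post office box records to exclude.
keep if regexm(myvar, "^[A-Z][A-Z]") & !regexm(myvar, "^PO")
list, noobs
```

Here "POwll" is excluded by the second condition and "Abcde" by the first, so only "SEtrt" and "WEret" remain.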



HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu
[mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Beecroft, Erik
(VDSS)
Sent: Thursday, 27 May 2010 14:26
To: statalist@hsphsun2.harvard.edu
Subject: st: Using regex to identify strings with capital letters

Thank you to Martin and Nick for solving my problem, Martin with the
regexm function and Nick with inrange.

Both of these work with my data:
	keep if regexm(substr(var1,1,2) , "^([A-Z][A-Z])")
	keep if inrange(substr(var1,1,1), "A", "Z") & inrange(substr(var1,2,1), "A","Z")

Both also keep records that begin with "PO" for Post Office box.  These
records I don't want, but they are easily dropped by modifying Martin's
suggestion:
	drop if regexm(substr(var1,1,2) , "^([P][O])")

Thank you for your help.

Erik


Martin is correct that -inrange("Er", "AA", "ZZ")- is true, because strings
are compared character by character in sort order. Possibly this is Erik's
specific problem: a first letter in "A" ... "Z" is necessary but not
sufficient, since the second character must also be a capital.

I offer as a stronger criterion 

inrange(substr(myvar,1,1), "A", "Z") & inrange(substr(myvar,2,1), "A", "Z")
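To see why the whole-string check is too weak, the sketch below compares it with the character-by-character criterion. It is an illustration only; the example strings ("Er", "ERIK", "PObox") and the variable names -whole- and -bychar- are hypothetical.

```stata
* Sketch: whole-string range check versus a check on each character.
clear
input str5 myvar
 "Er"
 "ERIK"
 "PObox"
end

* Whole-string comparison is in sort order: "Er" falls between "AA" and "ZZ".
generate byte whole = inrange(myvar, "AA", "ZZ")

* Stronger criterion: test the first and second characters separately.
generate byte bychar = inrange(substr(myvar,1,1), "A", "Z") & inrange(substr(myvar,2,1), "A", "Z")
list, noobs
```

For "Er", -whole- is 1 but -bychar- is 0, because "r" sorts after "Z". Note that "PObox" passes both checks, which is why the extra "PO" exclusion is still needed.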

I continue to like regex solutions when they are the simplest available!

Nick 
n.j.cox@durham.ac.uk 


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/



