



st: RE: FW: Using regex to identify strings with capital letters


From   "Martin Weiss" <martin.weiss1@gmx.de>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: FW: Using regex to identify strings with capital letters
Date   Wed, 26 May 2010 20:41:57 +0200


My example suggests otherwise. -trim()-ing beforehand is a good idea,
though:

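***********
clear*

* Sample data: only the third observation starts with two capital letters;
* the sixth does so only once its leading blank is removed
input str50 myvar
text1
text2
"VA department of Social Services"
text4
text5
" DG"
"va fer"
end

* String comparison is case sensitive, so "va fer" is not listed
list if inrange(substr(myvar,1,2), "AA", "ZZ")
* Strip leading and trailing blanks so that " DG" becomes "DG"
replace myvar=trim(myvar)
list if inrange(substr(myvar,1,2), "AA", "ZZ")
***********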
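
A quick way to see why the range check is case sensitive: Stata compares strings by character code, so every uppercase ASCII letter sorts before every lowercase one. The sketch below uses only -display- and -inrange()- on literal strings; the test strings are made up for illustration. Note the last line: a string starting with "Vi" also falls between "AA" and "ZZ", so the range check alone does not guarantee that the second character is a capital.

```stata
* Strings are compared character by character using their character codes
display ("ZZ" < "aa")                // 1: "Z" (90) precedes "a" (97)
display inrange("VA", "AA", "ZZ")    // 1: two capitals fall inside the range
display inrange("va", "AA", "ZZ")    // 0: a lowercase start is excluded
display inrange(" D", "AA", "ZZ")    // 0: a leading blank (32) precedes "A" (65)
display inrange("Vi", "AA", "ZZ")    // 1: mixed case still falls inside the range
```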


HTH
Martin


-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Beecroft, Erik (VDSS)
Sent: Wednesday, 26 May 2010 20:37
To: statalist@hsphsun2.harvard.edu
Subject: st: FW: Using regex to identify strings with capital letters

I tried Nick's suggestion, pasted below, but inrange does not seem to
distinguish between lower and upper case.  In other words, the statement
below keeps all observations that begin with two letters, whether
capital or lower case.

Erik




You don't need regex for this. 

... if inrange(substr(myvar,1,2), "AA", "ZZ") 

should be enough, or even "AK" to "WY" or whatever it is. (Remember this
is an international list!) 

Nick 
n.j.cox@durham.ac.uk 


-----Original Message-----
From: Beecroft, Erik (VDSS) 
Sent: Wednesday, May 26, 2010 2:03 PM
To: 'statalist@hsphsun2.harvard.edu'
Subject: Using regex to identify strings with capital letters

I need to extract certain observations from a series of text files.
Each file contains only one variable, which is string.  The
observations I want all begin with two capital letters. (They are state
abbreviations, such as VA or AK).  The other observations do not begin
with two capital letters.

Is there a way to tell Stata to keep only observations for which the
variable begins with two capital letters?

It seems like the regex function might work, but I have never worked
with regular expression syntax before.  

For example, a portion of a text file might look like:
	text1
	text2
	VA department of Social Services
	text4
	text5

I want to keep only the third observation above.
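
For reference, a minimal sketch of the regular-expression route asked about here, assuming the data are already in a string variable (called myvar here, a name chosen only for illustration). -regexm()- returns 1 when the pattern matches; "^" anchors the match at the start of the string and [A-Z] matches one capital letter. Wrapping the variable in -trim()- is an extra precaution against leading blanks, not something the question requires.

```stata
clear
* Illustrative data taken from the example lines above
input str50 myvar
"text1"
"text2"
"VA department of Social Services"
"text4"
"text5"
end

* Keep observations whose first two characters are both capital letters
keep if regexm(trim(myvar), "^[A-Z][A-Z]")
list    // only the "VA department of Social Services" line should remain
```

Unlike a range check on the first two characters, this pattern also rejects strings whose second character is lowercase.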

I am using Stata for Windows 10.1.

Erik


*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


