Stata: Data Analysis and Statistical Software

Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: regular expression or some simpler data extraction method

From	Nick Cox <[email protected]>
To	"'[email protected]'" <[email protected]>
Subject	RE: st: regular expression or some simpler data extraction method
Date	Thu, 17 Nov 2011 11:22:45 +0000

The regex solution is nice. I am always interested in alternatives. It helps to have different tools in the toolkit and some may be easier for at least some people to think about, and therefore to use, even if the solutions are more long-winded. 

For the examples given, repeated here, 

1 PV, 5 CC, 37 WT
101 WT
2 PV, 9 WT
1 WT
38 WT

this would work

gen foo = real(word(phase, -2))

and that could be made conditional on -word(phase, -1) == "WT"-. 
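A minimal self-contained sketch of this approach, assuming the example values are held in a string variable -phase- (the -str40- width is an assumption for illustration). -word()- splits on blanks only, so the commas stay attached to words such as "PV," and do not affect the second-last word.

```stata
* Sketch: recreate the example strings in a string variable -phase-
* (the str40 width is an assumption for illustration)
clear
input str40 phase
"1 PV, 5 CC, 37 WT"
"101 WT"
"2 PV, 9 WT"
"1 WT"
"38 WT"
end

* word(phase, -2) is the second-last blank-separated word;
* real() converts it to numeric, giving missing if it is not a number.
* The -if- condition restricts this to strings whose last word is "WT".
gen foo = real(word(phase, -2)) if word(phase, -1) == "WT"
list phase foo
```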

However, Ben said that "WT" always ends the string, so that condition should not be needed. 

(To make a point I've often made, if you know that some string really is numeric, and you want a single variable, just use -real()- directly, not -destring-. I say this as a fan of -destring-, indeed as its notional author.)

As a matter of technique, if the task is to find the word before "WT" wherever it occurs, -word()- could be used like this

gen where = 0
forval j = 1/10 {
	replace where = `j' if word(phase, `j') == "WT"
}

gen foo2 = real(word(phase, where - 1)) if where 

for some appropriate value of 10, that is, at least the largest number of words in any value of -phase-. 
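One way to choose that upper limit from the data rather than guessing is to take the maximum of -wordcount()- over the observations. The sketch below does that, again assuming the example strings are in -phase-; the helper names -nwords- and -maxw- are hypothetical. It uses -if where > 1- rather than -if where- so that a "WT" in first position, with no word before it, is skipped explicitly.

```stata
clear
input str40 phase
"1 PV, 5 CC, 37 WT"
"101 WT"
"2 PV, 9 WT"
"1 WT"
"38 WT"
end

* Upper limit for the loop: the largest number of words in any observation
gen nwords = wordcount(phase)
summarize nwords, meanonly
local maxw = r(max)

* Record the position of "WT" in each observation (0 if it never occurs)
gen where = 0
forvalues j = 1/`maxw' {
    replace where = `j' if word(phase, `j') == "WT"
}

* The number is the word immediately before "WT"
gen foo2 = real(word(phase, where - 1)) if where > 1
list phase where foo2
```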

Nick 
[email protected] 

Ben Hoen [edited] 

Thanks again Matthew & Brendan.

I realized that I had changed the variable name in the meantime to
"phase_description", which was causing the type mismatch error.

This syntax worked great!

gen vi_tnum = regexs(1) if regexm(phase_description, "([0-9]+) WT$") 

[...] 

I tried these because WT is always the end of the string; therefore any
comma would necessarily precede the digits and the WT. Maybe that was not
clear originally.
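In line with the point about -real()- above, the same regular expression can yield a numeric variable directly by wrapping -regexs(1)- in -real()-. A minimal sketch, assuming the example strings are held in -phase_description-; the variable name -vi_tnum_num- is hypothetical. Like Ben's line, it relies on -regexm()- in the -if- condition being evaluated before -regexs()- in each observation.

```stata
clear
input str40 phase_description
"1 PV, 5 CC, 37 WT"
"101 WT"
"2 PV, 9 WT"
"1 WT"
"38 WT"
end

* Capture the digits immediately before a final " WT";
* regexs(1) returns the first parenthesized subexpression as a string,
* and real() converts it to numeric in the same step
gen vi_tnum_num = real(regexs(1)) if regexm(phase_description, "([0-9]+) WT$")
list phase_description vi_tnum_num
```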



References:
- RE: st: regular expression or some simpler data extraction method
  - From: "Ben Hoen" <[email protected]>
