RE: st: regular expression or some simpler data extraction method

From   Nick Cox
Subject   RE: st: regular expression or some simpler data extraction method
Date   Thu, 17 Nov 2011 11:22:45 +0000

The regex solution is nice. I am always interested in alternatives. It helps to have different tools in the toolkit and some may be easier for at least some people to think about, and therefore to use, even if the solutions are more long-winded. 

For the examples given, repeated here, 

1 PV, 5 CC, 37 WT
101 WT
2 PV, 9 WT
1 WT
38 WT

this would work

gen foo = real(word(phase, -2))

and that could be made conditional on -word(phase, -1) == "WT"-. 
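
As a minimal sketch of that conditional version, with the five example values typed in as a small test dataset (the variable name foo_wt is made up for illustration):

```stata
clear
input str20 phase
"1 PV, 5 CC, 37 WT"
"101 WT"
"2 PV, 9 WT"
"1 WT"
"38 WT"
end

* take the second-to-last word as a number,
* but only when the last word is "WT"
gen foo_wt = real(word(phase, -2)) if word(phase, -1) == "WT"

list
```

Any observation whose last word is not "WT" would be left missing.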

However, Ben said that "WT" is always the end of the string. 

(To make a point I've often made, if you know that some string really is numeric, and you want a single variable, just use -real()- directly, not -destring-. I say this as a fan of -destring-, indeed as its notional author.)
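
A small sketch of that comparison, on a made-up string variable s that holds only numeric text (s, n1 and n2 are hypothetical names):

```stata
clear
input str5 s
"37"
"101"
"9"
end

* real() inside generate: one new numeric variable, s left as it is
gen n1 = real(s)

* the -destring- route to the same result, for comparison
destring s, generate(n2)

* both give the same numbers here
assert n1 == n2
```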

As a matter of technique, if it's a matter of finding the word before "WT", -word()- could be used like this

gen where = 0
forval j = 1/10 {
	replace where = `j' if word(phase, `j') == "WT"
}

gen foo2 = real(word(phase, where - 1)) if where 

where 10 stands in for some appropriate upper limit on the number of words in the string. 
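
One possible way to choose that upper limit, rather than guessing it, is to count the words in each value first. This is only a sketch on the same five example values; nwords, where2 and foo3 are made-up names, and -wordcount()- is assumed to split words the same way -word()- does.

```stata
clear
input str20 phase
"1 PV, 5 CC, 37 WT"
"101 WT"
"2 PV, 9 WT"
"1 WT"
"38 WT"
end

* largest number of words in any value of phase
gen nwords = wordcount(phase)
summarize nwords, meanonly
local max = r(max)

* same loop as before, but running up to the observed maximum
gen where2 = 0
forval j = 1/`max' {
	replace where2 = `j' if word(phase, `j') == "WT"
}

gen foo3 = real(word(phase, where2 - 1)) if where2
```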


Ben Hoen [edited] 

Thanks again Matthew & Brendan.

I realized that I had changed the variable name in the meantime to
"phase_description", which was causing the type mismatch error.

This syntax worked great!

gen vi_tnum = regexs(1) if regexm(phase_description, "([0-9]+) WT$") 


I tried these because WT is always the end of the string, therefore any
comma would necessarily precede the digits and the WT.  Maybe that was not
clear originally.
