Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


RE: st: regular expression or some simpler data extraction method


From   "Ben Hoen" <[email protected]>
To   <[email protected]>
Subject   RE: st: regular expression or some simpler data extraction method
Date   Wed, 16 Nov 2011 15:27:25 -0500

Thanks again Matthew & Brendan.

I realized that I had changed the variable name in the meantime to
"phase_description", which was causing the type mismatch error.

This syntax worked great!

gen vi_tnum = regexs(1) if regexm(phase_description, "([0-9]+) WT$") 


Ben Hoen
LBNL
Office: 845-758-1896
Cell: 718-812-7589


-----Original Message-----
From: Ben Hoen [mailto:[email protected]] 
Sent: Wednesday, November 16, 2011 3:22 PM
To: [email protected]
Subject: Re: st: regular expression or some simpler data extraction method

Thanks Matthew.

That didn't seem to work.  I am getting a "type mismatch" error.

Based on your first response I also tried:

gen vi_tnum = regexs(1) if regexm(phase, "[\, ]?([0-9]+) WT$")
and
gen vi_tnum = regexs(1) if regexm(phase, "[\, ]?([0-9]+)[ WT]$")

and got the same "type mismatch" error, so maybe they are related.

I tried these because WT is always the end of the string, therefore any
comma would necessarily precede the digits and the WT.  Maybe that was not
clear originally.
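
[Note on the patterns above: a minimal sketch, using an invented test string rather than anything from the actual data, for checking each pattern interactively. In the second attempt, "[ WT]" is a character class that matches a single character (a space, a W, or a T), so that pattern would not be expected to match a string ending in " WT", independent of any type mismatch.]

```stata
* Toy string, invented for illustration only
* Each line displays 1 if the pattern matches and 0 otherwise
display regexm("Phase II, 12 WT", "([0-9]+) WT$")
display regexm("Phase II, 12 WT", "[\, ]?([0-9]+) WT$")
display regexm("Phase II, 12 WT", "([0-9]+)[ WT]$")
```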

Ben




Ben Hoen
Principal Research Associate
Lawrence Berkeley National Laboratory
Office: 845-758-1896
Cell: 718-812-7589
[email protected]
http://eetd.lbl.gov/ea/emp/staff/hoen.html


Re: st: regular expression or some simpler data extraction method
________________________________________
From: Matthew White <[email protected]>
To: [email protected]
Subject: Re: st: regular expression or some simpler data extraction method
Date: Wed, 16 Nov 2011 15:01:27 -0500
________________________________________
Hi Ben,

Scratch that; the "[ ,]?" isn't a good idea. The following should work
as long as there aren't codes other than "WT" that start with "WT":

gen vi_tnum = substr(regexs(0), 1, strpos(regexs(0), " ") - 1) if regexm(phase, "[0-9]+ WT[ ,]?")
destring vi_tnum, replace
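
[A minimal sketch for trying the approach above on toy data. The three strings are invented for illustration and are not from the original dataset; only the variable names "phase" and "vi_tnum" come from the thread.]

```stata
* Toy data: hypothetical values, for illustration only
clear
input str40 phase
"12 WT, Phase II"
"Phase II, 34 WT"
"Phase I"
end

* regexs(0) holds the whole matched substring (digits, a space, "WT", and
* possibly a trailing space or comma); keep everything before its first space
gen vi_tnum = substr(regexs(0), 1, strpos(regexs(0), " ") - 1) if regexm(phase, "[0-9]+ WT[ ,]?")

* vi_tnum is created as a string; convert it to numeric
destring vi_tnum, replace

list
```

Observations with no "WT" code are left missing, since the "if regexm()" qualifier excludes them from the assignment.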


