

From: Eric Booth <ebooth@ppri.tamu.edu>

To: "<statalist@hsphsun2.harvard.edu>" <statalist@hsphsun2.harvard.edu>

Subject: Re: st: regexm

Date: Sat, 27 Aug 2011 01:28:42 +0000

<>

Here's an example. Note that I messed with the formatting of the %'s and $'s in my example data a bit to show how flexible the -regex- is in the latter part of the code; however, you'll need to check that there aren't other patterns/symbols in your string that could break my code. There are other ways to approach this, but I think the logic here is easy to follow:

*************! watch for wrapping:
**example data:
clear
inp str70(j)
"A: 0.35%-$197(M) 0.30%-$397(M) 0.27% th_aft."
"A: 0.25%-$198(M) 0.12%-$398(M) 0.99%-$300(M) 0.00% th_aft."
"A: 1.0%-$109(M) 0.1% th_aft."
"A: 0%-$199(M) 0.30%-$366(M) 1.99% th_aft."
end

**regexm example == easier to use -split- initially
g example = regexs(0) ///
    if regexm(j, "(([0-9]+\.[0-9]*[%-]+)([\$][0-9]*))")
l
drop example

**split:
replace j = subinstr(j, "A: ", "", 1)
split j, p("(M) ")

**first, find x10 :
g x10 = ""
tempvar flag
g `flag' = ""
foreach var of varlist j? {
    replace `flag' = "`var'" if ///
        strpos(`var', "th_aft")>0
    replace x10 = subinstr(`var', "th_aft.", "", .) ///
        if `flag' == "`var'"
    replace `var' = "" if strpos(`var', "th_aft")>0
}

**now, create x1-x9 and y1-y9
forval num = 1/9 {
    g x`num' = ""
    g y`num' = ""
    cap replace x`num' = regexs(0) if ///
        regexm(j`num', "([0-9]+\.?[0-9]*[%]+)") ///
        & !mi(j`num') & mi(x`num') //probably overkill
    cap replace y`num' = regexs(0) if ///
        regexm(j`num', "([\$][0-9]*\.?[0-9]*)") ///
        & !mi(j`num') & mi(y`num')
}

**finally, create y10 == last nonmissing y (y2 in your example):
g y10 = ""
forval num = 1/9 {
    replace y10 = y`num' if !mi(y`num')
}

****list:
l *1
l *2
l *3
*************!

- Eric

On Aug 26, 2011, at 6:59 PM, KOTa wrote:

> Dear statalisters,
>
> I am trying to extract some data from text variable and being new to
> stata programming struggling with finding right format.
>
> my problem is as following:
>
> for example i have string variable as following: "A: 0.35%-$100(M)
> 0.30%-$300(M) 0.27% th_aft."
>
> number of pairs "% - (M)" can be from 1 to 9 and it always ends by "% th_aft"
>
> I have 10 pairs of variables X1 Y1 .... X10 Y10
>
> my goal is to extract all pairs from the string variable and split
> them into my separate variables.
>
> in this case the result should be:
>
> X1 = 0.35%
> Y1 = $100
>
> X2 = 0.30%
> Y2 = $300
>
> X3-X9 = y3-Y9 = 0
>
> X10 = 0.27%
> Y10 = Y2 (i.e. last Y extracted from string)
>
> I am trying to use regexm but unsuccessfully, Any suggestions?
>
> thank you in advance
>
> C.
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/statalist/faq
> * http://www.ats.ucla.edu/stat/stata/
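
As a side note on the -regexm-/-regexs- pairing used above: -regexs(0)- returns the whole match, while -regexs(1)-, -regexs(2)-, ... return the parenthesized subexpressions. The following is only a minimal sketch of that idea on a single example string; the variable names s, pct, and amt are hypothetical and not from the thread, and the pattern assumes each pair looks like "number%-$number".

```stata
* Sketch only: s, pct, and amt are hypothetical names, not from the thread.
clear
set obs 1

* char(36) is "$"; building the string this way sidesteps macro expansion
gen str70 s = "A: 0.35%-" + char(36) + "197(M) 0.30%-" + char(36) + "397(M) 0.27% th_aft."

* group 1 = the percentage, group 2 = the dollar amount of the first pair;
* regexs() is evaluated against the regexm() call in the same command
gen pct = regexs(1) if regexm(s, "([0-9]+\.?[0-9]*%)-([\$][0-9]+)")
gen amt = regexs(2) if regexm(s, "([0-9]+\.?[0-9]*%)-([\$][0-9]+)")

list pct amt
```

Because -regexm- only reports the leftmost match, this picks up the first pair only (0.35% and $197 for this string), which is why splitting the string first and then applying the pattern to each piece is the easier route when there can be up to nine pairs.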

**Follow-Ups**: **Re: st: regexm** *From:* Nick Cox <njcoxstata@gmail.com>

**References**: **st: regexm** *From:* KOTa <kota.alba@gmail.com>
