As said, do look at -moss-.

Nick

On 27 Aug 2011, at 15:22, KOTa <kota.alba@gmail.com> wrote:

Simpler in the logistical sense, i.e. I tried to do the whole thing without creating the additional intermediate variables that -split- creates.

Another question, if you know, also about strings. When I import a file into Stata (from Excel, for example) I have some very long strings, which Stata truncates to 244 characters. Is there any trick to get around this, other than making them shorter before importing? :)

Thank you

2011/8/27 Nick Cox <njcoxstata@gmail.com>:

Better in what sense? Quicker to get a solution? Simpler? Other criteria?

I don't know a way of counting more than 9 matches directly. I think you would need, if you continue to follow that path, to loop over a string repeatedly, finding new instances and counting them. See also -moss- from SSC.

Nick

On Sat, Aug 27, 2011 at 2:52 PM, KOTa <kota.alba@gmail.com> wrote:

Yes, I do work with -split- now; I just thought that with a regex it would be better.

Anyway, is there a way to find out how many expressions -regexm()- finds?

1. What I mean is that I can access the 1st, 2nd, etc., up to the 9th with -regexs()-, but if I don't know how many there are, I don't know which one is the last.
2. What if more than 9 expressions are found? According to the manual, -regexs()- only accepts subexpression numbers 0 to 9.

Thanks

2011/8/27 Nick Cox <njcoxstata@gmail.com>:

Well, you did say it "always ends by "% th_aft"". I will continue as I started. If you first blank out the stuff you don't need, then you can just use -split- to separate out the elements. If you parse on spaces, then it is immaterial whether there are 2 or 3 digits after the decimal point; you retrieve the number either way. No need for regex demonstrated.

Nick

On Sat, Aug 27, 2011 at 2:16 PM, KOTa <kota.alba@gmail.com> wrote:

Thanks Eric, Nick. I used your advice and have almost finished, but I encountered one small problem on the way. I have the same type of string, "0.15%-$1(B) 0.14%-$2(B) 0.12%-$2(B) 0.10% th_aft." (the number of digits after the dot can be 2 or 3; it's not constant), and I am trying to extract the last % (i.e. 0.10% in this case) using "$" like this:

g example = regexs(0) if regexm( fee_str, "[0-9]+\.[0-9]*[%]$")

or

g example = regexs(0) if regexm( fee_str, "[0-9]+\.[0-9]*[%]+$")

and it fails in both cases: the result is empty. It does extract the first one (0.15%) if I don't use "$". What is wrong?

Thanks

P.S. Nick, th_aft is not a terminator; it's not always there.

2011/8/27 Nick Cox <njcoxstata@gmail.com>:

It is not obvious to me that you need -regexm()- at all. The text " th_aft" appears to be just a terminator that you don't care about, so remove it.

replace j = subinstr(j, " th_aft", "", .)

The last element can be separated off and then removed.

gen last = word(j, -1)
replace j = reverse(j)
replace j = subinstr(j, word(j,1) , "", 1)
replace j = reverse(j)

We reverse it in order to avoid removing any identical substring. Those three lines could be telescoped into one.

Then it looks like an exercise in -subinstr()- and -split-.

Nick

On Sat, Aug 27, 2011 at 2:28 AM, Eric Booth <ebooth@ppri.tamu.edu> wrote:

<>

Here's an example. Note that I messed with the formatting of the %'s and $'s in my example data a bit to show how flexible the -regex- is in the latter part of the code; however, you'll need to check that there aren't other patterns/symbols in your string that could break my code. There are other ways to approach this, but I think the logic here is easy to follow:

*************! watch for wrapping:
**example data:
clear
inp str70(j)
"A: 0.35%-$197(M) 0.30%-$397(M) 0.27% th_aft."
"A: 0.25%-$198(M) 0.12%-$398(M) 0.99%-$300(M) 0.00% th_aft."
"A: 1.0%-$109(M) 0.1% th_aft."
"A: 0%-$199(M) 0.30%-$366(M) 1.99% th_aft."
end
**regexm example == easier to use -split- initially
g example = regexs(0) ///
    if regexm(j, "(([0-9]+\.[0-9]*[%-]+)([\$][0-9]*))")
l
drop example
**split:
replace j = subinstr(j, "A: ", "", 1)
split j, p("(M) ")
**first, find x10 :
g x10 = ""
tempvar flag
g `flag' = ""
foreach var of varlist j? {
    replace `flag' = "`var'" if ///
        strpos(`var', "th_aft")>0
    replace x10 = subinstr(`var', "th_aft.", "", .) ///
        if `flag' == "`var'"
    replace `var' = "" if strpos(`var', "th_aft")>0
}
**now, create x1-x9 and y1-y9
forval num = 1/9 {
    g x`num' = ""
    g y`num' = ""
    cap replace x`num' = regexs(0) if ///
        regexm(j`num', "([0-9]+\.?[0-9]*[%]+)") ///
        & !mi(j`num') & mi(x`num') //probably overkill
    cap replace y`num' = regexs(0) if ///
        regexm(j`num', "([\$][0-9]*\.?[0-9]*)") ///
        & !mi(j`num') & mi(y`num')
}
**finally, create y10 == y2:
g y10 = y2
****list:
l *1
l *2
l *3
*************!

- Eric

On Aug 26, 2011, at 6:59 PM, KOTa wrote:

I am trying to extract some data from a text variable and, being new to Stata programming, I am struggling to find the right format. My problem is as follows. For example, I have a string variable like this:

"A: 0.35%-$100(M) 0.30%-$300(M) 0.27% th_aft."

The number of pairs "% - (M)" can be from 1 to 9, and it always ends with "% th_aft". I have 10 pairs of variables, X1 Y1 ... X10 Y10, where X10 = 0.27% and Y10 = Y2 (i.e. the last Y extracted from the string).

I am trying to use -regexm()- but unsuccessfully. Any suggestions?
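On the "$" question in the 2:16 PM message, "$" anchors a match at the very end of the string. That means "[%]$" can only succeed when "%" is the final character. In "0.15%-$1(B) ... 0.10% th_aft." the string ends in " th_aft.", so neither pattern can match. The sketch below is a minimal illustration, not a tested recipe, and it assumes the sample strings from this thread.

The fix is to capture the number in a group and allow any run of non-% characters between it and the end. The pattern then works whether or not " th_aft." is present. The variable name last_pct is hypothetical.

```stata
* Sketch: pick out the last "number%" in each string,
* whether or not " th_aft." follows it.
clear
set obs 2
gen str80 fee_str = ""
* char(36) is "$"; building the strings at run time sidesteps any
* macro expansion of a "$" typed directly into the do-file
replace fee_str = "0.15%-" + char(36) + "1(B) 0.14%-" + char(36) + "2(B) 0.12%-" + char(36) + "2(B) 0.10% th_aft." in 1
replace fee_str = "0.35%-" + char(36) + "100(M) 0.30%-" + char(36) + "300(M) 0.275%" in 2

* Group 1 is a number ending in "%"; "[^%]*$" then requires that no
* further "%" occurs before the end of the string, so it is the last one
gen last_pct = regexs(1) if regexm(fee_str, "([0-9]+\.?[0-9]*%)[^%]*$")
list fee_str last_pct
```

The optional "\.?" also admits whole-number percentages such as "0%", which appear in Eric's example data.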

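Nick's suggestion of looping over a string, finding new instances and counting them, can be sketched with built-in functions only. The sketch assumes the same sample strings, and rest, n_pct and last_pct are hypothetical names.

Each pass does three things:

1. It matches the first remaining "number%".
2. It counts and remembers that match.
3. It cuts the working copy just after the match.

Because of this, there is no limit of 9 matches, and the remembered match when the loop stops is the last one.

```stata
* Sketch: count every "number%" in a string by repeated matching
clear
set obs 2
gen str80 fee_str = ""
replace fee_str = "0.15%-" + char(36) + "1(B) 0.14%-" + char(36) + "2(B) 0.12%-" + char(36) + "2(B) 0.10% th_aft." in 1
replace fee_str = "0.35%-" + char(36) + "100(M) 0.30%-" + char(36) + "300(M) 0.275%" in 2

local pat "[0-9]+\.?[0-9]*%"
gen rest = fee_str       // working copy, shortened on each pass
gen n_pct = 0            // number of matches found so far
gen last_pct = ""        // most recent match; the last one when the loop ends

quietly count if regexm(rest, "`pat'")
while r(N) > 0 {
    * regexs(0) refers to the regexm() in the -if- qualifier,
    * evaluated observation by observation
    quietly replace n_pct = n_pct + 1 if regexm(rest, "`pat'")
    quietly replace last_pct = regexs(0) if regexm(rest, "`pat'")
    * drop everything up to and including the current match
    quietly replace rest = substr(rest, strpos(rest, regexs(0)) + length(regexs(0)), .) ///
        if regexm(rest, "`pat'")
    quietly count if regexm(rest, "`pat'")
}
list fee_str n_pct last_pct
```

Using strpos() to locate the match is safe here because -regexm()- returns the leftmost match, so no earlier copy of the same text can occur in rest. Each pass removes at least two characters, so the loop always terminates.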

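Nick's "blank out, then -split-" route, which is also the idea behind Eric's code, can be sketched without regular expressions. The sketch assumes every element is either "pct%-$amount(M)" or, for the last one, just "pct%". The names part*, x* and y* are hypothetical stand-ins for KOTa's X and Y variables.

```stata
* Sketch: separate percentages and amounts using -split- and string functions
clear
set obs 2
gen str80 j = ""
replace j = "A: 0.35%-" + char(36) + "197(M) 0.30%-" + char(36) + "397(M) 0.27% th_aft." in 1
replace j = "A: 1.0%-" + char(36) + "109(M) 0.1% th_aft." in 2

* Blank out what is not needed: the "A: " prefix and the " th_aft." tail
replace j = subinstr(j, "A: ", "", 1)
replace j = subinstr(j, " th_aft.", "", 1)

* Parse on spaces; -split- leaves the number of new variables in r(nvars)
split j, parse(" ") generate(part)
local n = r(nvars)

forvalues k = 1/`n' {
    * x: everything up to and including the first "%"
    gen x`k' = substr(part`k', 1, strpos(part`k', "%"))
    * y: the text after "%-"; left empty when there is no "%-"
    gen y`k' = substr(part`k', strpos(part`k', "%-") + 2, .) if strpos(part`k', "%-") > 0
}
list x* y*
```

Parsing on spaces makes the number of decimal digits irrelevant, as Nick notes. The last percentage in each row lands in the highest-numbered non-empty x variable.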