Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Steve Samuels <sjsamuels@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Manipulation of string variable using -regexm-
Date: Fri, 11 Oct 2013 18:54:17 -0400

Federico is apparently too modest to say that he is one of the co-authors of -screening-, so I will say it.

Steve

On Oct 11, 2013, at 6:25 PM, Nick Cox wrote:

Note also other solutions such as

* trim() guards against a blank between the letter and the star
local end trim(substr(CurrRtg, 2, .))
gen var_star = (`end' == "*") + 2 * (`end' == "*-") + 3 * (`end' == "*+")
assert `end' == "" if var_star == 0

Nick
njcoxstata@gmail.com

On 11 October 2013 21:59, Federico Belotti <f.belotti@gmail.com> wrote:
> Dear Herve
>
> My suggestion is to use the command -screening-, a user-written Stata tool for exploring and recoding string variables.
> You need to search for it and install it using
>
> findit screening
>
> Once it is installed, the syntax you are looking for to obtain a new numeric variable equal to 0 if not star, 1 if only *, 2 if *- and 3 if *+ is the following:
>
> screening, source(CurrRtg, upper) key(end "\*" end "\*-" end "\*\+" end "[A-Z]") new(mark, numeric) recode(1 "1" 2 "2" 3 "3" 4 "0")
>
> where
>
> 1) the option -source()- specifies the source variable that has to be recoded (note the suboption -upper-, which performs a case-insensitive match by converting to uppercase);
> 2) the option -key()- specifies the keywords you are looking for (in this case represented by regular expressions);
> 3) the option -new()- specifies the name of the new variable to be created (in this case, I called it "mark"; note the suboption -numeric-, which creates the new variable as a numeric variable);
> 4) the option -recode()- specifies the user-defined coding scheme, following the order of the keywords.
>
> See -help screening- for more details.
>
> Hope this helps.
> Federico
>
>
> On Oct 11, 2013, at 6:40 PM, STOLOWY, Herve wrote:
>
>> Dear Statalisters:
>>
>> Using Stata 12.1, I want to extract a portion of a string variable using
>> regular expressions, i.e. -regexs- and -regexm-.
>>
>> My string variable has different possible values. Example:
>>
>> A
>> A *
>> A *-
>> A *+
>> B
>> B *
>> B *-
>> B *+
>> etc.
>>
>> I would like to get a variable with the content filled with the * or *- or
>> *+ or with this type of coding:
>>
>> 0 if not star
>> 1 if only *
>> 2 if *-
>> 3 if *+
>>
>> The * or *- or *+ always appears at the end of the value.
>>
>> I tried the following syntax:
>>
>> gen var_star = regexs(0) if(regexm(CurrRtg, "\*" "\*+" "\*-"))
>>
>> Unfortunately, I get a * in all cases where there is a * included in the value,
>> but I do not get the *- or *+.
>>
>> I have difficulties with the syntax of -regexm-.
>>
>> There is maybe another way to get the same result.
>>
>> Best regards
>>
>> Herve Stolowy
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
>
> --
> Federico Belotti, PhD
> Research Fellow
> Centre for Economics and International Studies
> University of Rome Tor Vergata
> tel/fax: +39 06 7259 5627
> e-mail: federico.belotti@uniroma2.it
> web: http://www.econometrics.it
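
For the -regexm- route asked about in the original post, one possible sketch is shown here. It is not taken from any of the replies in this thread. It assumes the marker is always the last thing in CurrRtg, as the original post states, and it uses a small made-up dataset and a hypothetical helper variable named star. The pattern "(\*[-+]?)$" is meant to match a literal star, optionally followed by - or +, anchored at the end of the string; -regexs(1)- then returns the captured marker.

```stata
* Made-up example data mirroring the values listed in the original post
clear
input str6 CurrRtg
"A"
"A *"
"A *-"
"A *+"
"B"
"B *+"
end

* Capture the trailing marker: a star optionally followed by - or +
* (star is a hypothetical helper variable; it stays "" when nothing matches)
gen str2 star = regexs(1) if regexm(CurrRtg, "(\*[-+]?)$")

* Map the marker to 0/1/2/3, in the style of the one-line solution above
gen byte var_star = (star == "*") + 2 * (star == "*-") + 3 * (star == "*+")

list CurrRtg star var_star
```

If the real values may carry trailing blanks, the end-of-string anchor would not match, so wrapping the variable as trim(CurrRtg) inside -regexm()- may be needed.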