Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Federico Belotti <f.belotti@gmail.com> |
To | statalist@hsphsun2.harvard.edu |
Subject | Re: st: Manipulation of string variable using -regexm- |
Date | Fri, 11 Oct 2013 22:59:12 +0200 |
Dear Herve,

my suggestion is to use the command -screening-, a user-written Stata tool for exploring and recoding string variables. You need to search for and install it using

    findit screening

Once installed, the syntax you are looking for, to obtain a new numeric variable equal to 0 if not star, 1 if only *, 2 if *- and 3 if *+, is the following:

    screening, source(CurrRtg, upper) key(end "\*" end "\*-" end "\*\+" end "[A-Z]") new(mark, numeric) recode(1 "1" 2 "2" 3 "3" 4 "0")

where

1) the option -source()- specifies the source variable to be recoded (note the suboption -upper-, which makes the match case-insensitive by matching in uppercase);

2) the option -key()- specifies the keywords you are looking for (in this case, regular expressions);

3) the option -new()- specifies the name of the new variable to be created (in this case, I called it "mark"; note the suboption -numeric-, which creates the new variable as a numeric variable);

4) the option -recode()- specifies the user-defined coding scheme, following the order of the keywords.

See -help screening- for more details.

Hope this helps.
Federico

On Oct 11, 2013, at 6:40 PM, STOLOWY, Herve wrote:

> Dear Statalisters:
>
> Using Stata 12.1, I want to extract a portion of a string variable using
> regular expressions, i.e. -regexs- and -regexm-.
>
> My string variable has different possible values. Example:
>
> A
> A *
> A *-
> A *+
> B
> B *
> B *-
> B *+
> etc.
>
> I would like to get a variable with the content filled with the * or *- or
> *+ or with this type of coding:
>
> 0 if not star
> 1 if only *
> 2 if *-
> 3 if *+
>
> The * or *- or *+ always appear at the end of the value.
>
> I tried the following syntax:
>
> gen var_star = regexs(0) if(regexm(CurrRtg, "\*" "\*+" "\*-"))
>
> Unfortunately, I get a * in all cases where there is a * included in the value,
> but I do not get the *- or *+.
>
> I have difficulties with the syntax of -regexm-.
>
> There is maybe another way to get the same result.
>
> Best regards
>
> Herve Stolowy
> *
> * For searches and help try:
> * http://www.stata.com/help.cgi?search
> * http://www.stata.com/support/faqs/resources/statalist-faq/
> * http://www.ats.ucla.edu/stat/stata/

--
Federico Belotti, PhD
Research Fellow
Centre for Economics and International Studies
University of Rome Tor Vergata
tel/fax: +39 06 7259 5627
e-mail: federico.belotti@uniroma2.it
web: http://www.econometrics.it
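
For comparison with the -screening- call in the reply, the same 0/1/2/3 coding can probably also be obtained with the built-in -regexm()- function alone. What follows is only a minimal sketch and is not part of the original reply. The toy data set and the use of -trim()- to guard against trailing blanks are illustrative assumptions. The variable names mark and var_star are taken from the thread. Each pattern ends in $ so that it matches only at the end of the value, and each alternative is tested in its own -regexm()- call with a single pattern.

```stata
* Toy data mimicking the values described in the question (assumed, for illustration)
clear
input str6 CurrRtg
"A"
"A *"
"A *-"
"A *+"
"B"
end

* Numeric coding: start at 0 (no star), then overwrite for each ending
generate byte mark = 0
replace mark = 1 if regexm(trim(CurrRtg), "\*$")
replace mark = 2 if regexm(trim(CurrRtg), "\*-$")
replace mark = 3 if regexm(trim(CurrRtg), "\*\+$")

* String version: keep the matched ending itself (*, *- or *+)
* [-+]? means "an optional - or +" after the literal star
generate str2 var_star = regexs(0) if regexm(trim(CurrRtg), "\*[-+]?$")

list CurrRtg mark var_star
```

Because the three endings are mutually exclusive once anchored with $, the order of the -replace- statements does not matter here. Observations without a star keep mark equal to 0 and an empty var_star.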