Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | Nick Cox <njcoxstata@gmail.com> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | Re: st: Manipulation of string variable using -regexm- |
Date | Sun, 13 Oct 2013 11:41:09 +0100 |
You have the data and I don't, so all I can do is suggest ideas that you may need to modify. Do you often (always) have a space as the second character? If so, you may need to modify the code to select from position 3 onwards, or to select using negative positions.

Nick
njcoxstata@gmail.com

On 12 October 2013 23:18, STOLOWY, Herve <stolowy@hec.fr> wrote:
> Dear Nick:
>
> I am really sorry but I get an error message after the last line:
>
> assert `end' == "" if var_star == 0
> 50880 contradictions in 50991 observations
> assertion is false
> r(9);
>
> Best regards
>
> Hervé
>
> On Sat, Oct 12, 2013 at 6:43 PM, Nick Cox <njcoxstata@gmail.com> wrote:
>> This corrects a typo (sorry). Note that the definition of the local
>> macro is essential for this to work, although it could be rewritten to
>> avoid that.
>>
>> local end substr(CurrRtg, 2, .)
>> gen var_star = (`end' == "*") + 2 * (`end' == "*.") + 3 * (`end' =="*+")
>> assert `end' == "" if var_star == 0
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 12 October 2013 14:46, STOLOWY, Herve <stolowy@hec.fr> wrote:
>>> Dear Nick:
>>>
>>> After
>>>
>>> gen var_star = (`end' == "*") + 2 * (`end' == "*.") + 3 * (`end' ="*+")
>>>
>>> I get an error message:
>>>
>>> unknown function ()
>>>
>>> Best regards
>>>
>>> Hervé
>>>
>>>
>>> On Sat, Oct 12, 2013 at 12:25 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>>> Note also other solutions such as
>>>>
>>>> local end substr(CurrRtg, 2, .)
>>>> gen var_star = (`end' == "*") + 2 * (`end' == "*.") + 3 * (`end' ="*+")
>>>> assert `end' == "" if var_star == 0
>>>>
>>>> Nick
>>>> njcoxstata@gmail.com
>>>>
>>>>
>>>> On 11 October 2013 21:59, Federico Belotti <f.belotti@gmail.com> wrote:
>>>>> Dear Herve
>>>>>
>>>>> my suggestion is to use the command -screening-, a user-written Stata tool for exploring and recoding string variables.
>>>>> You need to search for and install it using
>>>>>
>>>>> findit screening
>>>>>
>>>>> Once it is installed, the syntax to obtain a new numeric variable equal to 0 if no star, 1 if only *, 2 if *- and 3 if *+ is the following:
>>>>>
>>>>> screening, source(CurrRtg, upper) key(end "\*" end "\*-" end "\*\+" end "[A-Z]") new(mark, numeric) recode(1 "1" 2 "2" 3 "3" 4 "0")
>>>>>
>>>>> where
>>>>>
>>>>> 1) the option -source()- specifies the source variable to be recoded (note the suboption -upper-, which allows a case-insensitive (uppercase) match);
>>>>> 2) the option -key()- specifies the keywords you are looking for (in this case, regular expressions);
>>>>> 3) the option -new()- specifies the name of the new variable to be created (in this case, I called it "mark"; note the suboption -numeric-, which makes the newly created variable numeric);
>>>>> 4) the option -recode()- specifies the user-defined coding scheme, following the order of the keywords.
>>>>>
>>>>> See -help screening- for more details.
>>>>>
>>>>> Hope this helps.
>>>>> Federico
>>>>>
>>>>>
>>>>> On Oct 11, 2013, at 6:40 PM, STOLOWY, Herve wrote:
>>>>>
>>>>>> Dear Statalisters:
>>>>>>
>>>>>> Using Stata 12.1, I want to extract a portion of a string variable using
>>>>>> regular expressions, i.e. -regexs- and -regexm-.
>>>>>>
>>>>>> My string variable has different possible values. Example:
>>>>>>
>>>>>> A
>>>>>> A *
>>>>>> A *-
>>>>>> A *+
>>>>>> B
>>>>>> B *
>>>>>> B *-
>>>>>> B *+
>>>>>> etc.
>>>>>>
>>>>>> I would like to get a variable containing the * or *- or *+,
>>>>>> or coded as follows:
>>>>>>
>>>>>> 0 if no star
>>>>>> 1 if only *
>>>>>> 2 if *-
>>>>>> 3 if *+
>>>>>>
>>>>>> The * or *- or *+ always appears at the end of the value.
>>>>>>
>>>>>> I tried the following syntax:
>>>>>>
>>>>>> gen var_star = regexs(0) if(regexm(CurrRtg, "\*" "\*+" "\*-"))
>>>>>>
>>>>>> Unfortunately, I get a * in every case where the value includes a *,
>>>>>> but I never get the *- or *+.
>>>>>>
>>>>>> I have difficulties with the syntax of -regexm-.
>>>>>>
>>>>>> Perhaps there is another way to get the same result.
>>>>>>
>>
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
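Nick's hint about selecting with negative positions can be sketched as follows. This is a minimal sketch, not code posted in the thread, and it assumes the only suffixes are *, *- and *+. The toy values are hypothetical: the first eight copy the question's examples, while "AA" and "AA *+" are invented to show that the length of the rating does not matter. With a value such as "A *", position 2 holds a blank, so substr(CurrRtg, 2, .) returns " *" and matches none of the comparisons; a negative first argument to substr() counts from the end of the string instead, which sidesteps both the blank and ratings longer than one character. Code 2 is tested against "*-", as the question asks; the quoted code compares with "*.", which looks like a slip. The `///` continuations mean this should be run as a do-file.

```stata
* Hypothetical toy data: the first eight values copy the question,
* "AA" and "AA *+" are invented to show ratings longer than one character
clear
input str8 CurrRtg
"A"
"A *"
"A *-"
"A *+"
"B"
"B *"
"B *-"
"B *+"
"AA"
"AA *+"
end

* Count from the end: test the last two characters for *- and *+ first,
* then the last character alone for a bare *; anything else is coded 0
generate var_star = cond(substr(CurrRtg, -2, 2) == "*-", 2,    ///
                    cond(substr(CurrRtg, -2, 2) == "*+", 3,    ///
                    cond(substr(CurrRtg, -1, 1) == "*", 1, 0)))

* Analogue of Nick's check: nothing coded 0 should contain an asterisk
assert strpos(CurrRtg, "*") == 0 if var_star == 0

list CurrRtg var_star, clean noobs
```

Testing the two-character suffixes before the bare * matters only for readability here, since a value ending in "*-" does not end in "*"; the -assert- line plays the same role as in Nick's version, flagging any starred value the coding missed.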
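Hervé's original -regexm- attempt can also be repaired. regexm() takes a single regular expression, so the three alternatives have to be combined into one pattern. In addition, + is a quantifier inside a regular expression, so "\*+" means "one or more asterisks" rather than the literal text *+, which is one reason the attempt could never return *+. The minimal sketch below, on hypothetical toy data copying the question's examples, puts - and + inside a bracket expression (where, with - listed first, both are literal) and anchors the match at the end of the string with $, relying on the question's statement that the suffix always comes last. The intermediate variable name suffix is an illustrative choice, not something from the thread.

```stata
* Hypothetical toy data copying the question's example values
clear
input str8 CurrRtg
"A"
"A *"
"A *-"
"A *+"
"B"
"B *"
"B *-"
"B *+"
end

* One pattern for all three suffixes: a literal \*, then at most one
* character from [-+], then the end of the string ($)
generate str2 suffix = regexs(0) if regexm(CurrRtg, "\*[-+]?$")

* Map the extracted text to the requested 0/1/2/3 coding ("" gives 0)
generate var_star = (suffix == "*") + 2 * (suffix == "*-") + 3 * (suffix == "*+")

list CurrRtg suffix var_star, clean noobs
```

Because the pattern is anchored at the end, it does not depend on how many characters precede the suffix, and suffix also gives the "variable filled with the * or *- or *+" that the question asks for as an alternative to the numeric coding.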