Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Manipulation of string variable using -regexm-


From   Steve Samuels <[email protected]>
To   [email protected]
Subject   Re: st: Manipulation of string variable using -regexm-
Date   Fri, 11 Oct 2013 18:54:17 -0400

Federico is apparently too modest to say that he is one of the
co-authors of -screening-, so I will say it.

Steve

On Oct 11, 2013, at 6:25 PM, Nick Cox wrote:

Note also other solutions such as

local end substr(CurrRtg, 2, .)
gen var_star = (`end' == "*") + 2 * (`end' == "*-") + 3 * (`end' == "*+")
assert `end' == "" if var_star == 0

Nick
[email protected]
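
For readers who want to stay with -regexm- and -regexs-, as the original question asked, here is a minimal sketch. It assumes the marker is always the last thing in the value, as stated in the question; the example data and the helper variable name star are made up for illustration and are not from the thread. A single pattern with an optional trailing sign replaces the three separate patterns of the original attempt.

```stata
clear
input str6 CurrRtg
"A"
"A *"
"A *-"
"A *+"
"B"
end

* Match a literal * at the end of the value, optionally followed by + or -.
* regexs(0) returns the whole matched text: "*", "*-" or "*+".
gen star = regexs(0) if regexm(CurrRtg, "\*[+-]?$")

* Map the extracted marker onto the requested coding (0 = no star).
gen byte var_star = 0
replace var_star = 1 if star == "*"
replace var_star = 2 if star == "*-"
replace var_star = 3 if star == "*+"

* Every observation without a recognised marker should be coded 0.
assert star == "" if var_star == 0

list CurrRtg star var_star
```

Because the pattern is anchored with $, a star elsewhere in the value is ignored; if trailing blanks are possible in the real data, wrapping CurrRtg in trim() would be a reasonable precaution.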


On 11 October 2013 21:59, Federico Belotti <[email protected]> wrote:
> Dear Herve
> 
> my suggestion is to use the command -screening-, a user-written Stata tool for exploring and recoding string variables.
> You need to search and install it using
> 
> findit screening
> 
> Once installed, the syntax you are looking for to obtain a new numeric variable equal to 0 if not star, 1 if only *, 2 if *- and 3 if *+ is the following
> 
> screening, source(CurrRtg, upper) key(end "\*" end "\*-" end "\*\+" end "[A-Z]") new(mark, numeric) recode(1 "1" 2 "2" 3 "3" 4 "0")
> 
> where
> 
>        1) the option -source()- specifies the source variable to be recoded (note the suboption -upper-, which makes the match case-insensitive by converting the source to uppercase);
>        2) the option -key()- specifies the keywords you are looking for (in this case expressed as regular expressions);
>        3) the option -new()- specifies the name of the new variable to be created (in this case, I called it "mark"; note the suboption -numeric-, which creates the new variable as numeric);
>        4) the option -recode()- specifies the user-defined coding scheme, following the order of the keywords.
> 
> See -help screening- for more details.
> 
> Hope this helps.
> Federico
> 
> 
> On Oct 11, 2013, at 6:40 PM, STOLOWY, Herve wrote:
> 
>> Dear Statalisters:
>> 
>> Using Stata 12.1, I want to extract a portion of a string variable using
>> regular expressions, i.e. -regexs- and -regexm-.
>> 
>> My string variable has different possible values. Example:
>> 
>> A
>> A *
>> A *-
>> A *+
>> B
>> B *
>> B *-
>> B *+
>> etc.
>> 
>> I would like to get a variable with the content filled with the * or *- or
>> *+ or with this type of coding:
>> 
>> 0 if not star
>> 1 if only *
>> 2 if *-
>> 3 if *+
>> 
>> The * or *- or *+ always appears at the end of the value.
>> 
>> I tried the following syntax:
>> 
>> gen var_star = regexs(0) if(regexm(CurrRtg, "\*" "\*+" "\*-"))
>> 
>> Unfortunately, I get a * in all cases there is a * included in the value,
>> but I do not get the *- or *+.
>> 
>> I have difficulties with the syntax of -regexm-.
>> 
>> There is maybe another way to get the same result.
>> 
>> Best regards
>> 
>> Herve Stolowy
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
> 
> --
> Federico Belotti, PhD
> Research Fellow
> Centre for Economics and International Studies
> University of Rome Tor Vergata
> tel/fax: +39 06 7259 5627
> e-mail: [email protected]
> web: http://www.econometrics.it
> 
> 