
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Manipulation of string variable using -regexm-


From   Roberto Ferrer <refp16@gmail.com>
To   Stata Help <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Manipulation of string variable using -regexm-
Date   Sat, 12 Oct 2013 18:50:17 +0100

Just another way, similar to what was being attempted in the original post:

*---------------------------- input --------------------------------------------
clear

input str8 grade
"A"
"A *"
"A *-"
"A *+"
"B"
"B *"
"B *-"
"B *+"
end

*----------------------------- what you want -----------------------------------

* Method 1: Match an optional leading capital letter and keep everything after it (subexpression 2)
generate str8 newgrade = regexs(2) if regexm(grade, "^([A-Z]?)(.*)" )

* Method 2: Replace the leading capital letter (if any) with the empty string.
generate str8 newgrade2 = trim(regexr(grade, "^[A-Z]?", ""))

* Removing the letter leaves a leading blank (e.g. " *-"). Method 1 keeps it;
* Method 2 strips it with -trim-.

*-------------------------------------------------------------------------------
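
If the numeric coding from the original post (0 no star, 1 for *, 2 for *-, 3 for *+) is what is wanted, the same -regexm()- idea can be pushed one step further. The original attempt went wrong for two reasons: -regexm()- takes a single pattern, so alternatives have to be written inside one pattern, and an unescaped + is a quantifier, so "\*+" means "one or more stars" rather than "a star followed by a plus". The sketch below avoids both by putting - and + in a bracket expression and anchoring the match at the end with $. It assumes, as in the example data, that the star suffix is always the last thing in the value; the names suffix and var_star are only illustrative.

```stata
* Sketch: rebuild the example data from the original post
clear
input str8 grade
"A"
"A *"
"A *-"
"A *+"
"B"
"B *"
"B *-"
"B *+"
end

* Pull out the trailing suffix: "\*" is a literal star, "[-+]?" an optional
* minus or plus, and "$" anchors the match at the end of the string.
* Values without a star are left as the empty string.
generate suffix = regexs(0) if regexm(grade, "\*[-+]?$")

* Each comparison is 1 if true and 0 if false, so the sum gives the coding
* 0 (no star), 1 (*), 2 (*-), 3 (*+)
generate byte var_star = (suffix == "*") + 2*(suffix == "*-") + 3*(suffix == "*+")

list grade suffix var_star, sep(0)
```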

On Sat, Oct 12, 2013 at 5:43 PM, Nick Cox <njcoxstata@gmail.com> wrote:
> This corrects a typo (sorry). Note that the definition of the local
> macro is essential for this to work, although it could be rewritten to
> avoid that.
>
> local end substr(CurrRtg, 2, .)
> gen var_star = (`end' == "*") + 2 * (`end' == "*-") + 3 * (`end' == "*+")
> assert `end' == "" if var_star == 0
>
> Nick
> njcoxstata@gmail.com
>
>
> On 12 October 2013 14:46, STOLOWY, Herve <stolowy@hec.fr> wrote:
>> Dear Nick:
>>
>> After
>>
>> gen var_star = (`end' == "*") + 2 * (`end' == "*.") + 3 * (`end' ="*+")
>>
>> I get an error message:
>>
>> unknown function ()
>>
>> Best regards
>>
>> Hervé
>>
>>
>> On Sat, Oct 12, 2013 at 12:25 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>>> Note also other solutions such as
>>>
>>> local end substr(CurrRtg, 2, .)
>>> gen var_star = (`end' == "*") + 2 * (`end' == "*.") + 3 * (`end' ="*+")
>>> assert `end' == "" if var_star == 0
>>>
>>> Nick
>>> njcoxstata@gmail.com
>>>
>>>
>>> On 11 October 2013 21:59, Federico Belotti <f.belotti@gmail.com> wrote:
>>>> Dear Herve
>>>>
>>>> my suggestion is to use the command -screening-, a user-written Stata tool for exploring and recoding string variables.
>>>> You can search for it and install it using
>>>>
>>>> findit screening
>>>>
>>>> Once it is installed, the syntax that creates a new numeric variable equal to 0 if no star, 1 if only *, 2 if *- and 3 if *+ is the following:
>>>>
>>>> screening, source(CurrRtg, upper) key(end "\*" end "\*-" end "\*\+" end "[A-Z]") new(mark, numeric) recode(1 "1" 2 "2" 3 "3" 4 "0")
>>>>
>>>> where
>>>>
>>>>         1) the option -source()- specifies the source variable to be recoded (note the suboption -upper-, which makes the match case-insensitive by working in uppercase);
>>>>         2) the option -key()- specifies the keywords you are looking for (here written as regular expressions);
>>>>         3) the option -new()- specifies the name of the new variable to be created (here I called it "mark"; the suboption -numeric- makes the new variable numeric);
>>>>         4) the option -recode()- specifies the user-defined coding scheme, in the same order as the keywords.
>>>>
>>>> See -help screening- for more details.
>>>>
>>>> Hope this helps.
>>>> Federico
>>>>
>>>>
>>>> On Oct 11, 2013, at 6:40 PM, STOLOWY, Herve wrote:
>>>>
>>>>> Dear Statalisters:
>>>>>
>>>>> Using Stata 12.1, I want to extract a portion of a string variable using
>>>>> regular expressions, i.e. -regexs- and -regexm-.
>>>>>
>>>>> My string variable has different possible values. Example:
>>>>>
>>>>> A
>>>>> A *
>>>>> A *-
>>>>> A *+
>>>>> B
>>>>> B *
>>>>> B *-
>>>>> B *+
>>>>> etc.
>>>>>
>>>>> I would like to get a variable containing the *, *- or *+ itself, or one
>>>>> with this coding:
>>>>>
>>>>> 0 if no star
>>>>> 1 if only *
>>>>> 2 if *-
>>>>> 3 if *+
>>>>>
>>>>> The *, *- or *+ always appears at the end of the value.
>>>>>
>>>>> I tried the following syntax:
>>>>>
>>>>> gen var_star = regexs(0) if(regexm(CurrRtg, "\*" "\*+" "\*-"))
>>>>>
>>>>> Unfortunately, I get a * whenever the value contains a *, but I never
>>>>> get *- or *+.
>>>>>
>>>>> I have difficulties with the syntax of -regexm-.
>>>>>
>>>>> Perhaps there is another way to get the same result.
>>>>>
>
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/faqs/resources/statalist-faq/
> *   http://www.ats.ucla.edu/stat/stata/
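
Nick notes that his solution depends on the local macro but could be rewritten to avoid it. One way to do that is to hold the suffix in a temporary variable. This is a minimal sketch that assumes, as his substr(CurrRtg, 2, .) does, that the rating itself is a single character. It also adds trim(), on the assumption that the values contain a space before the stars as in the examples ("A *-"). The data and the name suffix are illustrative.

```stata
* Sketch: illustrative data using the variable name from the thread
clear
input str8 CurrRtg
"A"
"A *"
"A *-"
"A *+"
"B"
"B *"
"B *-"
"B *+"
end

* Assumes a one-character rating in position 1, so any star suffix is
* whatever follows it; trim() drops the separating space
generate suffix = trim(substr(CurrRtg, 2, .))

* Sum of true/false comparisons: 0 (no star), 1 (*), 2 (*-), 3 (*+)
generate byte var_star = (suffix == "*") + 2*(suffix == "*-") + 3*(suffix == "*+")

* Anything coded 0 should have had nothing after the rating letter
assert suffix == "" if var_star == 0
drop suffix
```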



© Copyright 1996–2018 StataCorp LLC