Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Manipulation of string variable using -regexm-


From   Nick Cox <njcoxstata@gmail.com>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Manipulation of string variable using -regexm-
Date   Sat, 12 Oct 2013 17:43:51 +0100

This corrects a typo (sorry). Note that the definition of the local
macro is essential for this to work, although it could be rewritten to
avoid that.

local end substr(CurrRtg, 2, .)
gen var_star = (`end' == "*") + 2 * (`end' == "*-") + 3 * (`end' == "*+")
assert `end' == "" if var_star == 0
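
A local macro exists only within the do-file run or interactive session that defines it. If the -gen- line is run on its own, `end' expands to nothing and Stata sees ( == "*"), which plausibly explains the "unknown function ()" message. One way to avoid the macro altogether is sketched below. The data are the example values from the original question, the variable name rating_end is hypothetical, and -trim()- is added on the assumption that a space separates the letter from the star.

```stata
* Illustrative data: example values from the original question
clear
input str4 CurrRtg
"A"
"A *"
"A *-"
"B *+"
end

* Hypothetical helper variable: everything after the first character,
* with leading and trailing blanks removed
gen str3 rating_end = trim(substr(CurrRtg, 2, .))

* Each comparison evaluates to 0 or 1, so the sum is the code 0, 1, 2 or 3
gen byte var_star = (rating_end == "*") + 2 * (rating_end == "*-") + 3 * (rating_end == "*+")

* Observations coded 0 should have nothing after the first character
assert rating_end == "" if var_star == 0
drop rating_end
list
```

If the ratings contain no space (for example "A*-"), -trim()- is harmless and the rest is unchanged.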

Nick
njcoxstata@gmail.com


On 12 October 2013 14:46, STOLOWY, Herve <stolowy@hec.fr> wrote:
> Dear Nick:
>
> After
>
> gen var_star = (`end' == "*") + 2 * (`end' == "*.") + 3 * (`end' ="*+")
>
> I get an error message:
>
> unknown function ()
>
> Best regards
>
> Hervé
>
>
> On Sat, Oct 12, 2013 at 12:25 AM, Nick Cox <njcoxstata@gmail.com> wrote:
>> Note also other solutions such as
>>
>> local end substr(CurrRtg, 2, .)
>> gen var_star = (`end' == "*") + 2 * (`end' == "*.") + 3 * (`end' ="*+")
>> assert `end' == "" if var_star == 0
>>
>> Nick
>> njcoxstata@gmail.com
>>
>>
>> On 11 October 2013 21:59, Federico Belotti <f.belotti@gmail.com> wrote:
>>> Dear Herve
>>>
>>> my suggestion is to use the command -screening-, a user-written Stata tool for exploring and recoding string variables.
>>> You need to search and install it using
>>>
>>> findit screening
>>>
>>> Once installed, the syntax you are looking for to obtain a new numeric variable equal to 0 if not star, 1 if only *, 2 if *- and 3 if *+ is the following
>>>
>>> screening, source(CurrRtg, upper) key(end "\*" end "\*-" end "\*\+" end "[A-Z]") new(mark, numeric) recode(1 "1" 2 "2" 3 "3" 4 "0")
>>>
>>> where
>>>
>>>         1) the option -source()- specifies the source variable to be recoded (note the suboption -upper-, which allows a case-insensitive match (uppercase));
>>>         2) the option -key()- specifies the keywords you are looking for (in this case represented by regular expressions);
>>>         3) the option -new()- specifies the name of the new variable to be created (here I called it "mark"; note the suboption -numeric-, which makes the newly created variable numeric);
>>>         4) the option -recode()- specifies the user-defined coding scheme, following the order of the keywords.
>>>
>>> See -help screening- for more details.
>>>
>>> Hope this helps.
>>> Federico
>>>
>>>
>>> On Oct 11, 2013, at 6:40 PM, STOLOWY, Herve wrote:
>>>
>>>> Dear Statalisters:
>>>>
>>>> Using Stata 12.1, I want to extract a portion of a string variable using
>>>> regular expressions, i.e. -regexs- and -regexm-.
>>>>
>>>> My string variable has different possible values. Example:
>>>>
>>>> A
>>>> A *
>>>> A *-
>>>> A *+
>>>> B
>>>> B *
>>>> B *-
>>>> B *+
>>>> etc.
>>>>
>>>> I would like to get a variable with the content filled with the * or *- or
>>>> *+ or with this type of coding:
>>>>
>>>> 0 if not star
>>>> 1 if only *
>>>> 2 if *-
>>>> 3 if *+
>>>>
>>>> The * or *- or *+ always appears at the end of the value.
>>>>
>>>> I tried the following syntax:
>>>>
>>>> gen var_star = regexs(0) if(regexm(CurrRtg, "\*" "\*+" "\*-"))
>>>>
>>>> Unfortunately, I get a * in every case where the value includes a *,
>>>> but I do not get the *- or *+.
>>>>
>>>> I have difficulties with the syntax of -regexm-.
>>>>
>>>> Maybe there is another way to get the same result.
>>>>
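
For reference, the original -regexm- attempt can be made to work with a single pattern. -regexm()- takes one regular expression, and in "\*+" the unescaped + is a quantifier meaning "one or more asterisks" rather than a literal plus sign. The sketch below is only one possible pattern, not necessarily what any poster had in mind. It uses the example values from the question, and the variable name star is hypothetical.

```stata
* Illustrative data: example values from the original question
clear
input str4 CurrRtg
"A"
"A *"
"A *-"
"B *+"
end

* \* is a literal asterisk, [-+]? an optional minus or plus sign,
* and $ anchors the match at the end of the string
gen str2 star = regexs(0) if regexm(CurrRtg, "\*[-+]?$")

* Map the extracted text onto the requested numeric codes
gen byte var_star = 0
replace var_star = 1 if star == "*"
replace var_star = 2 if star == "*-"
replace var_star = 3 if star == "*+"
list
```

The string variable star holds the extracted "*", "*-" or "*+" (empty when there is no star), and var_star holds the 0 to 3 coding, so both forms requested in the question are available.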


