
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Manipulation of string variable using -regexm-


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: Manipulation of string variable using -regexm-
Date   Sun, 13 Oct 2013 11:41:09 +0100

You have the data and I don't, so all I can do is suggest ideas that
you may need to modify.

Do you often (always) have a space as the second character? If so,
you may need to modify the code to select from position 3 onwards, or
to select using negative positions.
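
As a rough sketch of both ideas, assuming the values really look like "A", "A *", "A *-" and "A *+" (one letter, one space, then the star part) and using "*-" for code 2 as in the original question, the code might be adapted as follows. The variable names var_star and var_star2 are only illustrative, and this has not been run against the real data.

```stata
* Variant 1: skip the letter and the space, so start at position 3.
* `end' expands to the expression substr(CurrRtg, 3, .) wherever it is used.
local end substr(CurrRtg, 3, .)
gen var_star = (`end' == "*") + 2 * (`end' == "*-") + 3 * (`end' == "*+")

* Check: anything coded 0 should have nothing after position 2.
assert `end' == "" if var_star == 0

* Variant 2: negative positions count from the end of the string,
* so the length of the leading part does not matter.
gen var_star2 = (substr(CurrRtg, -1, 1) == "*") ///
    + 2 * (substr(CurrRtg, -2, 2) == "*-")      ///
    + 3 * (substr(CurrRtg, -2, 2) == "*+")
```

If the -assert- still fails, something like -tab CurrRtg if var_star == 0- should show which values do not fit the assumed pattern (trailing blanks are one possibility, in which case wrapping the expression in trim() may help).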


Nick
[email protected]


On 12 October 2013 23:18, STOLOWY, Herve <[email protected]> wrote:
> Dear Nick:
>
> I am really sorry but I get an error message after the last line:
>
> assert `end' == "" if var_star == 0
> 50880 contradictions in 50991 observations
> assertion is false
> r(9);
>
> Best regards
>
> Hervé
>
> On Sat, Oct 12, 2013 at 6:43 PM, Nick Cox <[email protected]> wrote:
>> This corrects a typo (sorry). Note that the definition of the local
>> macro is essential for this to work, although it could be rewritten to
>> avoid that.
>>
>> local end substr(CurrRtg, 2, .)
>> gen var_star = (`end' == "*") + 2 * (`end' == "*.") + 3 * (`end' =="*+")
>> assert `end' == "" if var_star == 0
>>
>> Nick
>> [email protected]
>>
>>
>> On 12 October 2013 14:46, STOLOWY, Herve <[email protected]> wrote:
>>> Dear Nick:
>>>
>>> After
>>>
>>> gen var_star = (`end' == "*") + 2 * (`end' == "*.") + 3 * (`end' ="*+")
>>>
>>> I get an error message:
>>>
>>> unknown function ()
>>>
>>> Best regards
>>>
>>> Hervé
>>>
>>>
>>> On Sat, Oct 12, 2013 at 12:25 AM, Nick Cox <[email protected]> wrote:
>>>> Note also other solutions such as
>>>>
>>>> local end substr(CurrRtg, 2, .)
>>>> gen var_star = (`end' == "*") + 2 * (`end' == "*.") + 3 * (`end' ="*+")
>>>> assert `end' == "" if var_star == 0
>>>>
>>>> Nick
>>>> [email protected]
>>>>
>>>>
>>>> On 11 October 2013 21:59, Federico Belotti <[email protected]> wrote:
>>>>> Dear Herve
>>>>>
>>>>> my suggestion is to use the command -screening-, a user-written Stata tool for exploring and recoding string variables.
>>>>> You need to search and install it using
>>>>>
>>>>> findit screening
>>>>>
>>>>> Once installed, the syntax you are looking for to obtain a new numeric variable equal to 0 if not star, 1 if only *, 2 if *- and 3 if *+ is the following
>>>>>
>>>>> screening, source(CurrRtg, upper) key(end "\*" end "\*-" end "\*\+" end "[A-Z]") new(mark, numeric) recode(1 "1" 2 "2" 3 "3" 4 "0")
>>>>>
>>>>> where
>>>>>
>>>>>         1) the option -source()- specifies the source variable to be recoded (note the suboption -upper-, which makes the match case-insensitive by converting to uppercase);
>>>>>         2) the option -key()- specifies the keywords you are looking for (in this case expressed as regular expressions);
>>>>>         3) the option -new()- specifies the name of the new variable to be created (in this case, I called it "mark"; note the suboption -numeric-, which makes the new variable numeric);
>>>>>         4) the option -recode()- specifies the user-defined coding scheme, following the order of the keywords.
>>>>>
>>>>> See -help screening- for more details.
>>>>>
>>>>> Hope this helps.
>>>>> Federico
>>>>>
>>>>>
>>>>> On Oct 11, 2013, at 6:40 PM, STOLOWY, Herve wrote:
>>>>>
>>>>>> Dear Statalisters:
>>>>>>
>>>>>> Using Stata 12.1, I want to extract a portion of a string variable using
>>>>>> regular expressions, i.e. -regexs- and -regexm-.
>>>>>>
>>>>>> My string variable has different possible values. Example:
>>>>>>
>>>>>> A
>>>>>> A *
>>>>>> A *-
>>>>>> A *+
>>>>>> B
>>>>>> B *
>>>>>> B *-
>>>>>> B *+
>>>>>> etc.
>>>>>>
>>>>>> I would like to get a variable with the content filled with the * or *- or
>>>>>> *+ or with this type of coding:
>>>>>>
>>>>>> 0 if not star
>>>>>> 1 if only *
>>>>>> 2 if *-
>>>>>> 3 if *+
>>>>>>
>>>>>> The * or *- or *+ always appears at the end of the value.
>>>>>>
>>>>>> I tried the following syntax:
>>>>>>
>>>>>> gen var_star = regexs(0) if(regexm(CurrRtg, "\*" "\*+" "\*-"))
>>>>>>
>>>>>> Unfortunately, I get a * in all cases where a * is included in the value,
>>>>>> but I do not get the *- or *+.
>>>>>>
>>>>>> I have difficulties with the syntax of -regexm-.
>>>>>>
>>>>>> There is maybe another way to get the same result.
>>>>>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/

