
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



RE: st: Remove the middle part of a string variable


From   "Brent McSharry (ADHB)" <[email protected]>
To   "[email protected]" <[email protected]>
Subject   RE: st: Remove the middle part of a string variable
Date   Mon, 6 Jan 2014 13:47:44 +1300

I absolutely agree with using -generate- rather than -replace- whenever a regex is used. The regex Phil supplied can perhaps be improved. I would suggest

gen testvar = regexs(1) + regexs(2) if regexm(myvar, "([^\.]*)\.?[^A-Z]*([A-Z]?)")

The question marks (?) make the period and the trailing capital letter optional, so a match is found for every example you supplied:
124 -> 124 (rather than missing)
135.02 -> 135 (rather than missing)

The other difference in syntax is purely for compatibility. Many programming languages and text editors support regular expressions, and while the regular expression "(.*)\..*([A-Z])" works in Stata, it would require a negative lookahead assertion in most regex flavours; otherwise (.*) would capture everything up to the newline or end of string, including the period character. The hat (^) within square brackets means "match any character that is not listed inside the brackets": [^\.]* matches a run of characters other than a period, and [^A-Z]* matches a run of characters other than capital letters.
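To make the comparison concrete, here is a minimal sketch that runs both patterns over Manon's five example values. It assumes Python's re module behaves like Stata's regexm()/regexs() for these two patterns. The helper name extract() is hypothetical. It mimics -generate newvar = regexs(1) + regexs(2) if regexm(...)- by returning "" (Stata's missing string) when there is no match.

```python
import re

# Manon's example values
values = ["123.01A", "124", "135.02", "12.00B", "13.23K"]

# Phil's pattern: needs a period and a later capital letter to match
phil = re.compile(r"(.*)\..*([A-Z])")
# Brent's pattern: everything after the leading run of non-period characters is optional
brent = re.compile(r"([^\.]*)\.?[^A-Z]*([A-Z]?)")

def extract(pattern, value):
    """Hypothetical helper mimicking -generate ... if regexm(...)-:
    join the two captured groups, or return "" (missing) if there is no match."""
    m = pattern.search(value)  # like regexm(), search anywhere in the string
    return m.group(1) + m.group(2) if m else ""

for v in values:
    print(f"{v:8} phil={extract(phil, v)!r:8} brent={extract(brent, v)!r}")
```

With -generate- semantics, Phil's pattern leaves 124 and 135.02 missing, while Brent's pattern gives 124 and 135. Note that 135 is not the 135.02 asked for in the original question.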

Brent McSharry MBBS BSc(med) FCICM(paed)
Paediatric Intensivist
Starship Children's Hospital
Private Bag 92024
Auckland 1142
New Zealand 
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Monday, 6 January 2014 1:30 p.m.
To: [email protected]
Subject: Re: st: Remove the middle part of a string variable

<>

Good, but don't -replace- in this situation. If the string extraction
is not what you want, or you want to do something else to the
original variable later, you have no way of going back short of
reading in the original data all over again.
Nick
[email protected]
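Nick's advice can be sketched outside Stata as well. This minimal Python sketch assumes Python's re module treats Phil's pattern the way Stata's regexm()/regexs() do. It writes the results to a new list, testvar, instead of overwriting myvar. Values the pattern does not match are carried over unchanged, which is what -replace ... if regexm(...)- leaves in place. The helper name strip_middle() is hypothetical.

```python
import re

# Phil's pattern from the thread
pattern = re.compile(r"(.*)\..*([A-Z])")

def strip_middle(value):
    """Hypothetical helper: return the shortened value, or the value
    unchanged when the pattern does not match (as -replace ... if- would)."""
    m = pattern.search(value)
    return m.group(1) + m.group(2) if m else value

myvar = ["123.01A", "124", "135.02", "12.00B", "13.23K"]
# Build a new list rather than overwriting myvar, so the originals stay available
testvar = [strip_middle(v) for v in myvar]

for old, new in zip(myvar, testvar):
    print(old, "->", new)
```

With this fallback the five values come out as 123A, 124, 135.02, 12B and 13K, while myvar itself is untouched and can still be checked against the result.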


On 6 January 2014 00:27, Phil Clayton <[email protected]> wrote:
> Here's one solution using regular expressions:
> replace myvar=regexs(1) + regexs(2) if regexm(myvar, "(.*)\..*([A-Z])")
>
> Phil
>
> On 6 Jan 2014, at 11:13 am, manon <[email protected]> wrote:
>
>> Stata/IC 12.0 for Mac (64-bit Intel)
>> Revision 24 Aug 2011
>>
>> Dear all,
>>
>> I would like to remove the middle part of a string variable.
>>
>> I have a variable of the form:
>> 123.01A
>> 124
>> 135.02
>> 12.00B
>> 13.23K
>>
>> I want to remove the numbers between the "." and the letters.
>> In this example, I would want to get:
>> 123A
>> 124
>> 135.02
>> 12B
>> 13K
>>
>> Could you please help me?
>> Thanks in advance,
>>
>> Manon
>>
>>
>>
>> *
>> *   For searches and help try:
>> *   http://www.stata.com/help.cgi?search
>> *   http://www.stata.com/support/faqs/resources/statalist-faq/
>> *   http://www.ats.ucla.edu/stat/stata/
>


© Copyright 1996–2018 StataCorp LLC   |   Terms of use   |   Privacy   |   Contact us   |   Site index