Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
RE: st: Remove the middle part of a string variable
From: "Brent McSharry (ADHB)" <[email protected]>
To: "[email protected]" <[email protected]>
Subject: RE: st: Remove the middle part of a string variable
Date: Mon, 6 Jan 2014 13:47:44 +1300
I absolutely agree with using -generate- rather than -replace- whenever a regex is used. The regex supplied by Phil can possibly be improved on. I would suggest:
gen testvar = regexs(1) + regexs(2) if regexm(myvar, "([^\.]*)\.?[^A-Z]*([A-Z]?)")
The question marks (?) make the period and the trailing letter optional, so a match is generated for each example you supplied, and so:
124 -> 124 (rather than missing)
135.02 -> 135 (rather than missing)
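As a rough check of the two results above, here is a minimal sketch in Python rather than Stata. It assumes that Python's re module is a reasonable stand-in for Stata's regexm()/regexs() on these five values, which may not hold in edge cases. The helper names extract and compare are hypothetical, and None stands in for the missing value Stata would leave when nothing matches.

```python
import re

# The five sample values from the original question.
values = ["123.01A", "124", "135.02", "12.00B", "13.23K"]

# Both patterns are copied from the thread; only the Python wrapping is new.
ORIGINAL = re.compile(r"(.*)\..*([A-Z])")
SUGGESTED = re.compile(r"([^\.]*)\.?[^A-Z]*([A-Z]?)")


def extract(pattern, value):
    """Return group 1 + group 2 if the pattern matches, else None.

    None plays the role of the missing value that a Stata
    -generate ... if regexm()- leaves behind when there is no match.
    """
    m = pattern.search(value)
    if m is None:
        return None
    return m.group(1) + m.group(2)


def compare(vals):
    """List (value, result with original pattern, result with suggested pattern)."""
    return [(v, extract(ORIGINAL, v), extract(SUGGESTED, v)) for v in vals]


if __name__ == "__main__":
    for row in compare(values):
        print(row)
```

Under these assumptions, the original pattern returns None for 124 and 135.02, while the suggested pattern returns 124 and 135. Both give 123A, 12B and 13K for the other three values.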
The other difference in syntax is purely for compatibility. Programming languages and text editors support regular expressions in slightly different flavours. The regular expression "(.*)\..*([A-Z])" works in Stata, but it relies on backtracking: the greedy (.*) first captures everything up to the newline or end of string, including the period character, and the engine then has to give characters back to find the period. The hat (^) within square brackets says capture all characters which are not
Brent McSharry MBBS BSc(med) FCICM(paed)
Paediatric Intensivist
Starship Children's Hospital
Private Bag 92024
Auckland 1142
New Zealand
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Nick Cox
Sent: Monday, 6 January 2014 1:30 p.m.
To: [email protected]
Subject: Re: st: Remove the middle part of a string variable
Good, but don't -replace- in this situation. If the string extraction
is not what you want, or you want to do something else to the
original variable later, you have no way of going back short of reading
in the original data all over again.
Nick
[email protected]
On 6 January 2014 00:27, Phil Clayton <[email protected]> wrote:
> Here's one solution using regular expressions:
> replace myvar=regexs(1) + regexs(2) if regexm(myvar, "(.*)\..*([A-Z])")
>
> Phil
>
> On 6 Jan 2014, at 11:13 am, manon <[email protected]> wrote:
>
>> Stata/IC 12.0 for Mac (64-bit Intel)
>> Revision 24 Aug 2011
>>
>> Dear all,
>>
>> I would like to remove the middle part of a string variable.
>>
>> I have a variable of the form:
>> 123.01A
>> 124
>> 135.02
>> 12.00B
>> 13.23K
>>
>> I want to remove the numbers between the "." and the letters.
>> In this example, I would want to get:
>> 123A
>> 124
>> 135.02
>> 12B
>> 13K
>>
>> Could you please help me?
>> Thanks in advance,
>>
>> Manon
>>
>>
>>
>> --
>> View this message in context: http://statalist.1588530.n2.nabble.com/Remove-the-middle-part-of-a-string-variable-tp7580472.html
>> Sent from the Statalist mailing list archive at Nabble.com.
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/