Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From | "Brent McSharry (ADHB)" <BrentM@adhb.govt.nz> |
To | "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu> |
Subject | RE: st: Remove the middle part of a string variable |
Date | Mon, 6 Jan 2014 13:47:44 +1300 |
I absolutely agree with using -generate- rather than -replace- any time a regex is used. The regex supplied by Phil can possibly be improved on. I would suggest

gen testvar = regexs(1) + regexs(2) if regexm(myvar, "([^\.]*)\.?[^A-Z]*([A-Z]?)")

The question marks (?) mean that a match is found for each example you supplied, so:
124 -> 124 (rather than missing)
135.02 -> 135 (rather than missing)

The other difference in syntax is purely for compatibility. Many programming languages and text editors support regular expressions, and although the regular expression "(.*)\..*([A-Z])" works in Stata, it would require a negative lookahead assertion in most regex flavours; otherwise (.*) would capture everything up to the newline or end of string, including the period character.

The hat (^) within square brackets says capture all characters which are not listed inside the brackets, so [^\.]* captures everything up to, but not including, the first period.

Brent McSharry MBBS BSc(med) FCICM(paed)
Paediatric Intensivist
Starship Children's Hospital
Private Bag 92024
Auckland 1142
New Zealand

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Nick Cox
Sent: Monday, 6 January 2014 1:30 p.m.
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: Remove the middle part of a string variable

<>

Good, but don't -replace- in this situation. If the string extraction is not what you want, or you want to do something else to the original variable later, you have no way of going back short of reading in the original data again.

Nick
njcoxstata@gmail.com

On 6 January 2014 00:27, Phil Clayton <philclayton@internode.on.net> wrote:

> Here's one solution using regular expressions:
> replace myvar=regexs(1) + regexs(2) if regexm(myvar, "(.*)\..*([A-Z])")
>
> Phil
>
> On 6 Jan 2014, at 11:13 am, manon <manon.costinot@gmail.com> wrote:
>
>> Stata/IC 12.0 for Mac (64-bit Intel)
>> Revision 24 Aug 2011
>>
>> Dear all,
>>
>> I would like to remove the middle part of a string variable.
>>
>> I have a variable of the form:
>> 123.01A
>> 124
>> 135.02
>> 12.00B
>> 13.23K
>>
>> I want to remove the numbers between the "." and the letters.
>> In this example, I would want to get:
>> 123A
>> 124
>> 135.02
>> 12B
>> 13K
>>
>> Could you please help me?
>> Thanks in advance,
>>
>> Manon
>>
>> --
>> View this message in context: http://statalist.1588530.n2.nabble.com/Remove-the-middle-part-of-a-string-variable-tp7580472.html
>> Sent from the Statalist mailing list archive at Nabble.com.
>> *
>> * For searches and help try:
>> * http://www.stata.com/help.cgi?search
>> * http://www.stata.com/support/faqs/resources/statalist-faq/
>> * http://www.ats.ucla.edu/stat/stata/
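For readers who want to compare Phil's and Brent's patterns outside Stata, here is a minimal sketch in Python on Manon's sample values. It assumes that Python's re module treats these two simple patterns the same way Stata's regexm() does; that is an assumption, not something checked in Stata here. The helper names generate() and fill_unmatched() are hypothetical, chosen only to mirror -generate- and the idea of keeping the original variable intact.

```python
import re

# Values from Manon's question and the results she said she wanted.
SAMPLES = ["123.01A", "124", "135.02", "12.00B", "13.23K"]
WANTED = ["123A", "124", "135.02", "12B", "13K"]

# The two patterns posted in the thread.
PHIL = r"(.*)\..*([A-Z])"
BRENT = r"([^\.]*)\.?[^A-Z]*([A-Z]?)"


def generate(pattern, values):
    """Rough analogue of
        gen newvar = regexs(1) + regexs(2) if regexm(oldvar, pattern)
    None stands in for a missing value where the pattern does not match.
    Like regexm(), re.search() accepts a match anywhere in the string.
    """
    out = []
    for v in values:
        m = re.search(pattern, v)
        out.append(m.group(1) + m.group(2) if m else None)
    return out


def fill_unmatched(new, old):
    """Use the original value wherever the new one is missing.

    Builds a new list and never modifies `old`, in the spirit of
    generating a new variable instead of using -replace-.
    """
    return [o if n is None else n for n, o in zip(new, old)]


if __name__ == "__main__":
    phil = generate(PHIL, SAMPLES)
    brent = generate(BRENT, SAMPLES)
    print(phil)
    print(brent)
    print(fill_unmatched(phil, SAMPLES) == WANTED)
```

Run as a script, this prints ['123A', None, None, '12B', '13K'] for Phil's pattern, ['123A', '124', '135', '12B', '13K'] for Brent's, and then True. On these values Brent's pattern matches all five inputs, as he says, but it turns 135.02 into 135, whereas Manon's list keeps 135.02. Phil's pattern, with unmatched values filled in from the untouched original, reproduces her list.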