From: Richard Herron <richard.c.herron@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: RE: Remove prefixes (e.g., >, <, and +/-) from numbers stored as strings
Date: Fri, 8 Jun 2012 17:13:48 -0400

Thanks, all! Good tips all around. I should go with the one-by-one
substitution using -subinstr()- to make sure that I know what I'm
doing.

@Steve -- Thanks for the functioning regex. Something like this works
and strips pre/postfixes.

* code
generate number2 = regexs(1) if regexm(combo, "^[^0-9]*([0-9]*\.?[0-9]*)[^0-9]*$")
*

In this case there _shouldn't_ be negative values (only +/- to indicate
approximate), but I should replace these characters one-by-one to be
sure of what I'm doing.

Richard Herron

On Fri, Jun 8, 2012 at 3:04 PM, Steve Samuels <sjsamuels@gmail.com> wrote:
> A regular expression solution that allows for characters other than
> "> and %" at start and finish.
>
> Steve
> sjsamuels@gmail.com
>
> ****************
> clear
> input str20 combo
> ">88.27821"
> "91.53401%"
> " 76m "
> " -31.20785"
> ">-52.18793"
> "39.94933%"
> " +61"
> " 89.47855"
> " +75.43917"
> ">82.67717"
> "46.31095%"
> " 81"
> " 45.24185"
> " 28.62701"
> ">77.13605"
> "46.79793%"
> " 62"
> " 19.50868"
> " 91.54968"
> " 86.64407"
> end
> replace combo = trim(combo)
> des
> gen new1 =regexs(2) ///
> if regexm(combo,"^([^0-9+-]?)((\+|\-)?[0-9]+\.?[0-9]+)([^0-9]?)$")
> destring new1,replace
> list
> ********************************************************************
>
> On Jun 8, 2012, at 2:40 PM, Nick Cox wrote:
>
> Cox's Third Law of string processing is "regex machinery is great, but
> always check first if something simpler will work directly".
>
> I really wouldn't want to support removing + and - characters
> separately. You could be removing genuine information!
>
> If the issue is solely the composite prefix, then
>
> subinstr(myvariable, "+/-", "", 1)
>
> is as direct as anything else for pre-processing. If need be you can
> of course insist that the prefix must be a prefix
>
> ... if substr(myvariable, 1, 3) == "+/-"
>
> The single character is char(177) in my flavour of Stata. Try
> -asciiplot- (SSC) to see if yours agrees
>
> subinstr(myvariable, char(177), "", .)
>
> is what I would try.
>
> I like -destring- too.
>
> Nick
>
> On Fri, Jun 8, 2012 at 7:12 PM, Richard Herron
> <richard.c.herron@gmail.com> wrote:
>
>> Thanks, David! That's big. I hadn't noticed the -ignore()- option in -destring-.
>>
>> But what if I don't know the set of possible prefixes? I guess
>> -destring- will throw an error and I iteratively improve my filter?
>>
>> I have some where +/- is almost like a LaTeX \pm symbol where the + is
>> stacked on the -. I think this is unicode U+00B1.
>> http://www.fileformat.info/info/unicode/char/b1/index.htm
>>
>> Can I use -destring- to -ignore()- these?
>
>> On Fri, Jun 8, 2012 at 1:59 PM, David Radwin <dradwin@mprinc.com> wrote:
>>> Can you use -destring- with the -ignore- option like this?
>>>
>>> . destring myvariable, ignore("+/-<>") generate(myvariable2)
>>>
>>> David
>>> --
>>> David Radwin
>>> Senior Research Associate
>>> MPR Associates, Inc.
>>> 2150 Shattuck Ave., Suite 800
>>> Berkeley, CA 94704
>>> Phone: 510-849-4942
>>> Fax: 510-849-0794
>>>
>>> www.mprinc.com
>>>
>>>
>>>> -----Original Message-----
>>>> From: owner-statalist@hsphsun2.harvard.edu
>>>> [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Richard Herron
>>>> Sent: Friday, June 08, 2012 10:30 AM
>>>> To: statalist@hsphsun2.harvard.edu
>>>> Subject: st: Remove prefixes (e.g., >, <, and +/-) from numbers stored as
>>>> strings
>>>>
>>>> I have numbers stored as strings with prefixes (e.g., "+/-30") that I
>>>> would like to convert to numbers. Not all entries necessarily have
>>>> prefixes (or postfixes).
>>>>
>>>> With -regexm()- and -regexs()- I can remove postfixes and handle
>>>> decimals, but I can't remove prefixes. Can you spot my error with
>>>> -regexm()-? Thanks!
>>>>
>>>> Richard Herron
>>>>
>>>> * begin code
>>>> clear
>>>> set obs 20
>>>> generate number = 100*runiform()
>>>> generate prefix = ""
>>>> generate postfix = ""
>>>> foreach i of numlist 1 5 10 15 {
>>>>     replace prefix = ">" in `i'
>>>>     replace postfix = "%" in `=`i' + 1'
>>>>     replace number = int(number) in `=`i' + 2'
>>>> }
>>>> egen combo = concat(prefix number postfix)
>>>> generate number2 = regexs(1) if regexm(combo, "([0-9]*\.?[0-9]*)")
>>>> list
>>>> * end code
>>> *
>>> * For searches and help try:
>>> * http://www.stata.com/help.cgi?search
>>> * http://www.stata.com/support/statalist/faq
>>> * http://www.ats.ucla.edu/stat/stata/
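
The one-by-one approach that Nick and David describe in this thread might be combined roughly as follows. This is only a minimal sketch, not code posted by anyone in the thread: the four sample values and the names clean and number3 are assumptions made up for illustration, and whether char(177) really is the plus-minus sign depends on the Stata version and encoding, as Nick notes.

```stata
* Sketch only: sample data and the names clean/number3 are assumptions
clear
input str20 combo
">88.27821"
"91.53401%"
"+/-30"
" 81"
end

* Work on a trimmed copy so the original strings stay available for checking
generate str20 clean = trim(combo)

* Remove the composite prefix only where it really is a prefix
replace clean = subinstr(clean, "+/-", "", 1) if substr(clean, 1, 3) == "+/-"

* Remove the single plus-minus character, if char(177) is that character here
replace clean = subinstr(clean, char(177), "", .)

* Let -destring- ignore the remaining known prefix and postfix characters
destring clean, ignore("<>%") generate(number3)
list
```

Keeping combo untouched and listing it next to number3 makes it easy to see which characters were stripped from each observation. Any character not named in ignore() keeps -destring- from generating number3, which gives the chance to inspect the data before widening the filter.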

