
From: "Frank de Libero" <fedmerchant@comcast.net>
To: <statalist@hsphsun2.harvard.edu>
Subject: st: RE: Re: Splitting a string variable
Date: Tue, 6 Sep 2005 13:57:27 -0700

The loop isn't necessary. Using Stata's capabilities, the following works:

    replace id = regexs(2) if regexm(id,"^(0)+(.+)")

or, making the distinction between [] and () in regular expressions,

    replace id = regexs(1) if regexm(id,"^[0]+(.+)")

BTW, Kevin developed the Stata implementation of the three regular expression functions in version 9 and did a really nice job.

..Frank

-----Original Message-----
From: owner-statalist@hsphsun2.harvard.edu [mailto:owner-statalist@hsphsun2.harvard.edu] On Behalf Of Kevin Turner
Sent: Tuesday, September 06, 2005 11:31 AM
To: statalist@hsphsun2.harvard.edu
Subject: st: Re: Splitting a string variable

Raphael Fraser (raphael.fraser@gmail.com) writes:

>I have a string variable of the type listed below:
>
>id
>0008
>0020
>016A
>0160C
>
>How do I remove the leading zeros from this variable? I tried using
>the -split- command, but it removed both leading and trailing zeros.
>The end result should look like this:
>
>id
>8
>20
>16A
>160C

The presence of sporadic letters and trailing zeros causes problems, but the solution is one that the new regular expression functions of Stata are easily adapted to solving.

The solution is a loop over the observations, using an initial regular expression function to test for a match, and if so, the corresponding regular expression function to pull the subexpression that matches the non-leading-zero portion of the string.

    local obs = _N
    forvalues x = 1(1)`obs' {
        if (regexm(id[`x'], "^[0]+(.+)")) {
            replace id = regexs(1) in `x'   /* grab first sub expression */
        }
    }

A few comments on regular expression syntax:

1) The string "^[0]+(.+)" matches one or more leading zeros, and then one or more characters till the end.
2) ^ represents beginning of string
3) [] denotes a set of characters to match, in this case just zeros
4) + denotes a 'one or more' match of the previous expression
5) () denote a subexpression
6) . will match any character

We also had to construct a loop over the observations because we needed a pair of function calls to operate on each individual observation.

Hope this helps!

--Kevin

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/
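For anyone who wants to try the loop-free version from this thread, here is a minimal sketch that rebuilds Raphael's four example ids and applies a single -replace-. The -input- block and the width str5 are assumptions made only to produce a self-contained test dataset, and the pattern is written as "^0+(.+)", which is meant to be equivalent to the "^[0]+(.+)" form quoted above.

```stata
* Sketch only: rebuild the four example ids from the original question
clear
input str5 id
"0008"
"0020"
"016A"
"0160C"
end

* regexm() is evaluated for each observation selected by -if-, and
* regexs(1) returns the first subexpression of that same match,
* so no explicit loop over observations is needed
replace id = regexs(1) if regexm(id, "^0+(.+)")

list id, noobs
```

With the example data this should leave 8, 20, 16A and 160C, with trailing zeros untouched because only the anchored leading run is dropped. Because (.+) requires at least one character, an id made up entirely of zeros should be reduced to a single 0 rather than an empty string.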

**References**: **st: Re: Splitting a string variable**, *From:* kturner@stata.com (Kevin Turner)


© Copyright 1996–2016 StataCorp LP