st: Re: Splitting a string variable


From   kturner@stata.com (Kevin Turner)
To   statalist@hsphsun2.harvard.edu
Subject   st: Re: Splitting a string variable
Date   Tue, 06 Sep 2005 13:30:31 -0500

Raphael Fraser (raphael.fraser@gmail.com) writes:

>I have a string variable of the type listed below:
>
>id
>0008
>0020
>016A
>0160C
>
>How do I remove the leading zeros from this variable? I tried using
>the -split- command, but it removed both leading and trailing zeros.
>The end result should look like this:
>
>id
>8
>20
>16A
>160C

The presence of sporadic letters and trailing zeros causes problems, but
Stata's new regular expression functions handle this case easily.  The
solution is a loop over the observations: regexm() tests each value for a
match and, if there is one, regexs() pulls out the subexpression that matches
the non-leading-zero portion of the string.

local obs = _N
forvalues x = 1(1)`obs' {
	/* test whether this observation starts with one or more zeros */
	if (regexm(id[`x'], "^[0]+(.+)")) {
		replace id = regexs(1) in `x'	/* grab first subexpression */
	}
}

A few comments on regular expression syntax:

1) The string "^[0]+(.+)" matches one or more leading zeros, followed by one
   or more characters of any kind through the end of the string.
2) ^ represents beginning of string
3) [] denotes a set of characters to match, in this case just zeros 
4) + denotes a 'one or more' match of the previous expression
5) () denote a subexpression
6) . will match any character
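
To see how these pieces behave together, here is a small sketch that tries
the pattern on a few literal strings with -display-.  The test strings
"0000" and "0" are my own additions for illustration; they are not in
Raphael's data.

```stata
* regexm() returns 1 on a match and 0 otherwise
display regexm("0160C", "^[0]+(.+)")    // 1: leading zero, then "160C"
display regexs(1)                       // 160C: the first subexpression
display regexm("16A", "^[0]+(.+)")      // 0: no leading zero, so no match
display regexm("0", "^[0]+(.+)")        // 0: nothing left over for (.+)
display regexm("0000", "^[0]+(.+)")     // 1: (.+) must keep one character
display regexs(1)                       // 0: a single zero survives
```

Note that a value with no leading zero does not match at all, which is why
the match has to be tested before the subexpression is used.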

We also loop over the observations so that, for each individual observation,
the regexs() call immediately follows the regexm() call that tested it.
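
For comparison, here is a loop-free sketch of the same idea.  It assumes that
regexs() can pick up the match made by regexm() in the -if- qualifier of the
same command, observation by observation, which is the form commonly shown in
Stata regular expression examples.  The -input- block just rebuilds
Raphael's four example values so the sketch runs on its own.

```stata
clear
input str5 id
"0008"
"0020"
"016A"
"0160C"
end

* regexm() in the if qualifier selects observations with leading zeros;
* regexs(1) then returns the part captured after those zeros
replace id = regexs(1) if regexm(id, "^[0]+(.+)")
list id
```

Observations that do not match are left untouched, so values without leading
zeros keep their original contents.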

Hope this helps!
--Kevin 