Stata The Stata listserver

st: RE: Re: Splitting a string variable

From   "Frank de Libero" <>
To   <>
Subject   st: RE: Re: Splitting a string variable
Date   Tue, 6 Sep 2005 13:57:27 -0700

The loop isn't necessary. Using Stata's capabilities, the following works:

replace id = regexs(2) if regexm(id,"^(0)+(.+)")

or, making the distinction between [] and () in regular expressions,

replace id = regexs(1) if regexm(id,"^[0]+(.+)")
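
As a minimal sketch of the loop-free approach, the do-file fragment below applies the same -replace- to a small made-up dataset. The id values are hypothetical, since the original poster's data are not shown in this thread.

```stata
* Hypothetical test data (not the original poster's values)
clear
input str10 id
"00123"
"0A450"
"7001"
end

* regexm() is evaluated for each observation by the -if- qualifier,
* and regexs(1) then returns the first subexpression of that match,
* so no explicit loop over observations is needed
replace id = regexs(1) if regexm(id, "^[0]+(.+)")

list id
```

Only the leading zeros are dropped: the trailing zero in "0A450" is kept, and "7001" is left alone because it does not begin with a zero.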

BTW, Kevin developed the Stata implementation of the three regular
expression functions in version 9 and did a really nice job.


-----Original Message-----
[] On Behalf Of Kevin Turner
Sent: Tuesday, September 06, 2005 11:31 AM
Subject: st: Re: Splitting a string variable

Raphael Fraser writes:

>I have a string variable of the type listed below:
>How do I remove the leading zeros from this variable? I tried using
>the -split- command, but it removed both leading and trailing zeros.
>The end result should look like this:

The presence of sporadic letters and trailing zeros causes problems, but
the solution is one that the new regular expression functions of Stata
are well adapted to. The solution is a loop over the observations, using
the initial regular expression function to test for a match and, if one
is found, the corresponding regular expression function to pull the
subexpression that matches the non-leading-zero portion of the string.

local obs = _N
forvalues x = 1(1)`obs' {
	if (regexm(id[`x'], "^[0]+(.+)")) {
		replace id = regexs(1) in `x'	/* grab first subexpression */
	}
}

A few comments on regular expression syntax:

1) The string "^[0]+(.+)" matches one or more leading zeros, and then
   one or more characters till the end.
2) ^ represents beginning of string
3) [] denotes a set of characters to match, in this case just zeros 
4) + denotes a 'one or more' match of the previous expression
5) () denote a subexpression
6) . will match any character
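
To see what the pieces of the pattern pick up, the following sketch stores the match indicator, the whole match, and the first subexpression in separate variables. The data and the variable names matched, whole, and rest are made up for illustration.

```stata
* Hypothetical data and variable names, for illustration only
clear
input str10 id
"00120"
"0B07"
"5100"
end

* 1 if the string starts with one or more zeros followed by anything
generate byte matched = regexm(id, "^[0]+(.+)")

* regexs(0) is the entire matched string; regexs(1) is the part
* captured by the parentheses, i.e. everything after the leading zeros
generate whole = regexs(0) if regexm(id, "^[0]+(.+)")
generate rest  = regexs(1) if regexm(id, "^[0]+(.+)")

list id matched whole rest
```

For "5100" the pattern does not match because of the ^ anchor, so matched is 0 and both whole and rest stay empty.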

We also had to construct a loop over the observations because we needed
a pair of function calls to operate on each individual observation.

Hope this helps!