
Notice: On April 23, 2014, Statalist moved from an email list to a forum.


st: editing string variables to remove letters and keep only numbers

From   Michael McCulloch
Subject   st: editing string variables to remove letters and keep only numbers
Date   Mon, 17 Jun 2013 15:53:41 -0700

I have a string variable, ID, in my dataset that (due to changes in data entry practices over time) contains values in several styles:

	- a number (e.g. 164)
	- a letter-number combination (e.g. e64)
	- a comma-separated letter-number combination (e.g. e64,e65) 

In seeking to (A) remove the letters, and (B) split the comma-separated values into two variables, ID1 and ID2, I wrote the following commands:

. split ID, p(",")
. gen str id1_new =""		// make new ID to separate out the "e" from ID
. replace id1_new=substr(id1,2,3) 

This successfully splits ID into ID1 and ID2.

This also works if a value has a preceding letter (e64 is changed to 64).
However, in the case of a 3-digit value WITHOUT a preceding letter, the first digit is removed (164 is changed to 64).
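The behaviour described above follows from substr(id1,2,3) always starting at the second character, whether or not the first character is a letter. One possible fix, offered only as a minimal sketch and not as the list's recommended answer, is to drop the first character only when it is not a digit. The toy data, the str10 width, and the ID1_new and ID2_new names are assumptions made for illustration; the variable names ID1 and ID2 are what split ID creates by default.

```stata
* Sketch for a do-file: strip a leading character only when it is not a digit.
* The three example values are taken from the styles listed in the post.
clear
input str10 ID
"164"
"e64"
"e64,e65"
end

* With no generate() option, split names the new variables ID1, ID2, ...
split ID, parse(",")

foreach v of varlist ID1 ID2 {
    * Start from a copy so that all-digit values are kept unchanged.
    gen str10 `v'_new = `v'
    * real() returns missing when its argument is not a number, so the
    * condition holds only when the first character is a non-digit.
    replace `v'_new = substr(`v', 2, .) ///
        if missing(real(substr(`v', 1, 1))) & `v' != ""
}

list
```

This assumes at most one leading non-digit character per value. If letters can appear elsewhere in the string, destring with its ignore() option may be a better fit.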

Any suggestions would be appreciated.

Best wishes,
Michael McCulloch, LAc MPH PhD

Pine Street Foundation, since 1989
124 Pine Street | San Anselmo | California | 94960-2674  
P: (415) 407-1357 | F: (206) 338-2391 |
