Statalist The Stata Listserver



Re: st: Data management: code to be able to..


From   Phil Schumm <[email protected]>
To   [email protected]
Subject   Re: st: Data management: code to be able to..
Date   Sun, 3 Jun 2007 21:08:48 -0500

On Jun 3, 2007, at 3:09 PM, S J wrote:
I have a string identifier variable of the form:

id
"LOCALITY_NAMED_ABCD 001"
"LOCALITY_NAMED_F 060"
"HOUSTON 078"
"SAN ANTONIO 112"

The variable id thus has both the name of the locality in question (say, HOUSTON), and an identifying code (say, 078).

How can I generate a new variable, idcode, that captures only the numeric component of id, so that for the above 4 cases I get the values below:

idcode
1
60
78
112

If you can assume that in all cases the ID code is preceded by a space *and* contains no embedded spaces itself (a counterexample would be "CHICAGO 112 09", where the ID code is "112 09"), then the following will work:


gen idcode = substr(trim(id),-strpos(reverse(trim(id))," ")+1,.)


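Note that this allows for the possibility that (1) the ID code is variable in length, and (2) the entire string has trailing spaces. Note also that idcode is created as a string variable (e.g., "078"); to get the numeric values you listed, wrap the expression in -real()- or run -destring- on the result.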
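To make the pieces of that expression easier to follow, here is a minimal sketch that splits it into steps on the four example values. The variable names lastsp and idstr are made up for illustration, the trailing blanks on one value are added only to exercise -trim()-, and the final -real()- step is an assumption about wanting a numeric result:

```stata
clear
input str30 id
"LOCALITY_NAMED_ABCD 001"
"LOCALITY_NAMED_F 060"
"HOUSTON 078  "
"SAN ANTONIO 112"
end

* Position of the last space, counted from the end of the trimmed string
gen lastsp = strpos(reverse(trim(id)), " ")

* Everything after that space, still a string ("001", "060", ...)
gen idstr = substr(trim(id), -lastsp + 1, .)

* Numeric version; real() returns missing if idstr is not a number
gen idcode = real(idstr)

list id idstr idcode
```

If an id contains no space at all, -strpos()- returns 0 and idstr is the whole string, so idcode would be missing for that observation.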

BTW, this is a situation where it would be nice to have a single function that returned the nth subexpression from a regular expression match (as opposed to the function -regexs()-, which returns subexpressions only from the most recent -regexm()- match and so must be paired with a separate call to -regexm()-). For example, if there were such a function (call it -foo()-) and if we could assume that the ID code were always an integer, we could write something like the following:

gen idcode = foo(id,"^.+[ ]+([0-9]+)[ ]*",1)

where we are setting idcode equal to the first subexpression of the match. Although the first approach based on -substr()- is probably adequate in this case, the latter approach is -- for those used to working with regular expressions -- much more readable.
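As a possible stand-in for the hypothetical -foo()-, here is a minimal sketch that pairs the existing -regexm()- and -regexs()- functions through an -if- qualifier. It assumes a Stata version that provides both functions and an ID code that is always an integer. The trailing $ in the pattern is an addition not in the pattern above, there so that the digits must end the string:

```stata
clear
input str30 id
"LOCALITY_NAMED_ABCD 001"
"LOCALITY_NAMED_F 060"
"HOUSTON 078"
"SAN ANTONIO 112"
end

* regexm() tests the pattern observation by observation; where it matches,
* regexs(1) returns the first parenthesized subexpression (the digits)
gen idcode = real(regexs(1)) if regexm(id, "^.+[ ]+([0-9]+)[ ]*$")

list id idcode
```

Observations that do not match the pattern are left with idcode missing, which makes unexpected id formats easy to spot.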


-- Phil




