Re: st: Extract a letter between numbers

From   Eric Booth <[email protected]>
To   "<[email protected]>" <[email protected]>
Subject   Re: st: Extract a letter between numbers
Date   Mon, 22 Nov 2010 20:16:58 +0000


On Nov 22, 2010, at 11:59 AM, Nick Cox wrote:
> This complements mine in so far as I hinted that there might be a regex solution. But why assume that typos in the number field are limited to a-zA-Z? They might as well be almost anything!
> Nick 

I hadn't seen Nick's posting when I posted, but Nick rightly points out that other characters could be an issue, so his is the better solution for ensuring that only [0-9] makes it into the street number.
I was checking only for alpha characters because that is what the OP described in the initial post. (I guess I assumed that the online form the OP uses to capture the data has some basic validation that prevents special characters, i.e., non-[0-9a-zA-Z] characters, from being entered.)
I was trying to show how to get rid of the "e" in "12e3" using regular expression matching, which I'm not too experienced with but am trying to learn. So if someone has a solution to the OP's issue that uses regular expressions, I'd be interested in seeing it.

Below is a modification of my original example that gets closer, using regex matching and Nick's -charlist- (from SSC). However, it fails when letters or special characters appear in several places within the street number (e.g., "12a3@4c5"): it keeps only the leading and trailing digits, so the "3" and "4" are lost. That is, I'm curious about how to extract the "3" and "4" out of the middle of "12a3@4c5" using regular expressions. There are ways to anchor a regular expression at the beginning (^) or end ($) of a string, but how do I get things from the middle (or is there a better approach entirely)?

clear
inp str40(address)
"12+3 Main St"
"1144Re=^&5 Oak St 77844"
"1a Broadway Ave., College Station, TX."
"11 Test St."
"12a3@4c5 Test St."
end
// install charlist from SSC if it is not already installed //
cap which charlist
if _rc ssc install charlist, replace
// use charlist to collect every character that occurs in address //
charlist address
local x `r(sepchars)'
// y = letters and digits; z = the remaining (special) characters //
numlist "0/9"
loc y `c(alpha)' `c(ALPHA)' `r(numlist)'
loc z:list local(x) -  local(y)
loc z:subinstr local z " " "", all
// address2 = the first "word" of the address (the street number field) //
g address2 = regexs(0) if /* 
*/ regexm(address, "^[0-9a-zA-Z/`z']*")
// begin = leading digits of address2 //
g begin =  regexs(0) if /* 
*/ regexm(address2, "^[0-9]*")
// end = trailing digits of address2, unless address2 is already all digits //
g end = regexs(0) if /* 
*/ regexm(address2, "[0-9]*$") /*
*/ & address2!=begin
// put it together //
g newaddress = begin + end
li newaddress address 
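
On the question of getting digits out of the middle: one possible approach, offered only as a sketch, is not to extract the digits at all but to delete everything that is not a digit. regexr() replaces only the first match in a string, so the sketch below calls it repeatedly until no non-digit character remains. It uses regexr() in a loop rather than the regexs() extraction in the example above. The variable names field and number are hypothetical, and the sketch assumes the street number is everything before the first space in address.

```stata
clear
input str40 address
"12+3 Main St"
"1144Re=^&5 Oak St 77844"
"1a Broadway Ave., College Station, TX."
"11 Test St."
"12a3@4c5 Test St."
end

* assumption: the street-number field is everything before the first space
generate field = substr(address, 1, strpos(address, " ") - 1)

* regexr() removes only the first non-digit it finds, so repeat
* until no observation has a non-digit character left
generate number = field
quietly count if regexm(number, "[^0-9]")
local left = r(N)
while `left' > 0 {
    quietly replace number = regexr(number, "[^0-9]", "")
    quietly count if regexm(number, "[^0-9]")
    local left = r(N)
}

list address number
```

With the five example addresses, number should come out as "123", "11445", "1", "11", and "12345". In Stata 14 or later, ustrregexra(field, "[^0-9]", "") should do the same thing in a single call, since it replaces all matches rather than only the first.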

- Eric
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754
