Notice: On April 23, 2014, Statalist moved from an email list to a forum.

RE: st: Extract a letter between numbers

From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: Extract a letter between numbers
Date   Mon, 22 Nov 2010 17:59:23 +0000

This complements mine insofar as I hinted that there might be a regex solution. But why assume that typos in the number field are limited to a-zA-Z? They might be almost anything!

[email protected] 

Eric Booth

Probably need to take a look at regular expression matching.  
Take a look at these links:

Here's a start:
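input str40 address
"12e3 Main St"
"1144Re5 Oak St 77844"
"1a Broadway Ave., College Station, TX."
"11 Test St."
end

* keep the leading run of digits and letters (the house-number field)
gen address2 = regexs(0) if /* 
*/ regexm(address, "^[0-9a-zA-Z]*")
* drop the letters (and spaces) so that only the digits remain
destring address2, replace force ignore("`c(alpha)'`c(ALPHA)'")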
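
The code above only strips letters. As a rough sketch of a variant that drops any non-digit character from the number field, the loop below may help. It assumes the house number is always the first space-delimited word of the address, and the variable name housenum is just made up for illustration. regexr() replaces only the first match on each call, hence the loop.

```stata
clear
input str40 address
"12e3 Main St"
"1144Re5 Oak St 77844"
"1a Broadway Ave., College Station, TX."
"11 Test St."
end

* assumption: the first word of the address is the house-number field
generate housenum = word(address, 1)

* regexr() replaces only the first match, so repeat until
* no observation contains a non-digit character
count if regexm(housenum, "[^0-9]")
while r(N) > 0 {
    replace housenum = regexr(housenum, "[^0-9]", "")
    count if regexm(housenum, "[^0-9]")
}

* everything left is digits (or empty, which becomes missing)
destring housenum, replace
list address housenum
```

On the four example addresses this should leave 123, 11445, 1 and 11. An address with no number in first position would end up missing, so it is worth checking those cases by hand.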

On Nov 22, 2010, at 11:07 AM, Patrick McNamara wrote:

> I'm new to stata coding (been using drop-down menus for a few years),
> and I'm working on an address parser to pull apart and put back
> together people's real address apart from the mess they enter online
> :) Right now I'm trying to figure out a way to take out any letters in
> between two numbers that people have accidentally typed into their
> house address field (i.e. for 123 Main St, they typed 12e3 Main St).
> The letters are not in the same position and there are multiples. I've
> tried strpos() but it won't allow me to use a range [A-Z] or [0-9].
> Any help would be greatly appreciated!
