Re: st: Extract a letter between numbers

From   Patrick McNamara <>
Subject   Re: st: Extract a letter between numbers
Date   Mon, 22 Nov 2010 14:54:36 -0500

That's a good point, as I have a few entries with hyphens in place of
numbers. To be honest, I'm not sure I understand how to implement the
answers that either of you presented; I think I understated how new I
am to coding in Stata :)  The more pressing issue for me may be
identifying where the actual street names start and end, since they
can be letters or numbers. I've split the addresses out using the
basic split command, and now have up to 13 variables. The method
doesn't have to be perfect (I can lose a few of the crazier ones and
it won't be a big deal), but the street address usually falls within
three of those variables.
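
For concreteness, here is a rough sketch of the kind of thing I am after, not a tested solution. It assumes a string variable called address; the names tok1, digits, housenum, clean, street and sfxpos and the short suffix list are all made up for illustration. It keeps only the digits of the first word as the house number, then treats everything up to the first recognized suffix as the street part.

```stata
* Sketch only. Variable names and the suffix list are illustrative assumptions.
clear
input str60 address
"12e3 Main St"
"1-44 N Oak Ave"
"7 W 23rd St. Apt 4"
end

* House number: first word, keeping digits only (letters, hyphens etc. dropped)
gen str20 tok1 = word(address, 1)
gen str20 digits = ""
gen byte len1 = length(tok1)
quietly summarize len1, meanonly
local maxlen = r(max)
forvalues i = 1/`maxlen' {
    quietly replace digits = digits + substr(tok1, `i', 1) ///
        if inrange(substr(tok1, `i', 1), "0", "9")
}
destring digits, generate(housenum)

* Street part: words after the house number, up to the first known suffix
gen str60 clean = lower(subinstr(address, ".", "", .))
gen byte nwords = wordcount(clean)
quietly summarize nwords, meanonly
local maxw = r(max)
gen byte sfxpos = .
gen str60 street = ""
forvalues j = 2/`maxw' {
    * record the position of the first word that is a recognized suffix
    quietly replace sfxpos = `j' if missing(sfxpos) & ///
        inlist(word(clean, `j'), "st", "ave", "pl", "rd", "blvd", "dr")
    * until a suffix has been seen, append the current word to the street part
    quietly replace street = trim(street + " " + word(clean, `j')) ///
        if missing(sfxpos) & word(clean, `j') != ""
}
list address housenum street sfxpos
```

Two caveats on this sketch: the street part still contains any direction (N, W, ...), and stopping at the first suffix would mishandle names such as "St Charles Rd", so a real version would probably need a longer suffix list and some special cases.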

To step back, the ultimate goal here is to match the street addresses
people enter on a website with the standardized versions in my
database, which have the house number, direction (N, NW, etc.), street
name, street suffix (st, st., ave, pl., etc.) as well as city, zip and
state (the state is always Illinois). Any thoughts on this?
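
To make the matching goal concrete, here is a minimal sketch under stated assumptions: the variable names (housenum, direction, street, suffix, key) and both toy datasets are hypothetical stand-ins, not my real data. The idea is to build the same normalized key on both sides and then merge on it.

```stata
* Sketch only. Variable names and the two toy datasets are assumptions.
clear
input housenum str2 direction str20 street str6 suffix
123 "N" "Main" "St"
44  ""  "Oak"  "Ave"
end
* One lowercase key per standardized address, extra blanks collapsed
gen str60 key = lower(itrim(trim(string(housenum) + " " + direction + ///
    " " + street + " " + suffix)))
tempfile standard
save `standard'

clear
input str60 address
"123 N. Main St."
"44 Oak Ave"
"9 Elm Pl"
end
* Same normalization for the web entries: no periods or commas, lowercase
gen str60 key = subinstr(subinstr(address, ".", "", .), ",", "", .)
replace key = lower(itrim(trim(key)))

* _merge == 3 marks entries found in the standardized data, 1 those not found
merge m:1 key using `standard', keep(master match)
list address key _merge
```

An exact-key merge like this only catches entries that agree after normalization; misspelled street names or mistyped house numbers would need cleaning first or some fuzzier comparison.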

On Mon, Nov 22, 2010 at 12:59 PM, Nick Cox <> wrote:
> This complements mine in so far as I hinted that there might be a regex solution. But why assume that typos in the number field are limited to a-zA-Z? They might as well be almost anything!
> Nick
> Eric Booth
> Probably need to take a look at regular expression matching.
> Take a look at these links:
> Here's a start:
> ********!
> clear
> inp str40(address)
> "12e3 Main St"
> "1144Re5 Oak St 77844"
> "1a Broadway Ave., College Station, TX."
> "11 Test St."
> end
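> * keep the leading run of digits and letters; any other character ends the match
> gen address2 = regexs(0) if /*
> */ regexm(address, "^[0-9a-zA-Z]*")
> * ignore the letters and convert what is left to a numeric house number
> destring address2, replace force ignore("`c(alpha)'`c(ALPHA)'")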
> li
> ********!
> On Nov 22, 2010, at 11:07 AM, Patrick McNamara wrote:
>> I'm new to Stata coding (been using drop-down menus for a few years),
>> and I'm working on an address parser to recover people's real
>> addresses from the mess they enter online :) Right now I'm trying to
>> figure out a way to take out any letters that people have accidentally
>> typed between two numbers in their house address field (e.g. for
>> 123 Main St, they typed 12e3 Main St). The letters are not always in
>> the same position, and there can be more than one. I've tried strpos(),
>> but it won't accept a range such as [A-Z] or [0-9].
>> Any help would be greatly appreciated!

Patrick McNamara
Manager, Program Logistics

Efficiency 2.0
165 William Street, Floor 10
New York, NY 10038
T. 646 478 8509
M. 816 305 5679
F. 347 328 9342
