RE: st: Extract a letter between numbers


From   Nick Cox <[email protected]>
To   "'[email protected]'" <[email protected]>
Subject   RE: st: Extract a letter between numbers
Date   Mon, 22 Nov 2010 17:59:23 +0000

This complements my answer insofar as I hinted that there might be a regex solution. But why assume that typos in the number field are limited to a-zA-Z? They could be almost anything!

Nick 
[email protected] 

Eric Booth

You probably need regular expression matching.
See these links:

http://www.stata.com/support/faqs/data/regex.html
http://www.stata.com/meeting/wcsug07/medeiros_reg_ex.pdf

Here's a start:
********!
clear
inp str40(address)
"12e3 Main St"
"1144Re5 Oak St 77844"
"1a Broadway Ave., College Station, TX."
"11 Test St."
end

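* Keep the leading run of digits and letters (the house-number token)
gen address2 = regexs(0) if /* 
*/ regexm(address, "^[0-9a-zA-Z]*")
* Ignore the letters and convert what remains to a number
destring address2, replace force ignore("`c(alpha)'`c(ALPHA)'")
li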
********!
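
A minimal sketch of the more general cleanup raised at the top of this message, where the stray characters need not be letters. It takes the first word of the address and drops everything that is not a digit. Assumptions: the house number is the first whitespace-delimited word; Stata 14 or later is available for ustrregexra() (in older versions regexr() replaces only the first match, so a loop would be needed); the variable names firstword and housenum and the "4#7 Elm St" row are made up for illustration.

```stata
clear
input str40 address
"12e3 Main St"
"1144Re5 Oak St 77844"
"1a Broadway Ave., College Station, TX."
"11 Test St."
"4#7 Elm St"
end

* Assumption: the first whitespace-delimited word is the house-number field
gen firstword = word(address, 1)

* Remove every non-digit character, then convert to numeric
* (ustrregexra() requires Stata 14 or later)
gen housenum = real(ustrregexra(firstword, "[^0-9]", ""))

list address housenum
```

Because the pattern is the negated class [^0-9], punctuation and other symbols are removed along with letters, and a first word with no digits at all ends up as missing.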

On Nov 22, 2010, at 11:07 AM, Patrick McNamara wrote:

> I'm new to Stata coding (I've been using the drop-down menus for a few years),
> and I'm working on an address parser to pull apart and put back
> together people's real addresses from the mess they enter online
> :) Right now I'm trying to figure out a way to take out any letters
> between two numbers that people have accidentally typed into their
> house address field (e.g., for 123 Main St, they typed 12e3 Main St).
> The letters are not always in the same position, and there can be more
> than one. I've tried strpos(), but it won't accept a range such as
> [A-Z] or [0-9]. Any help would be greatly appreciated!
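
On the strpos() point in the question: strpos() looks for a literal substring, whereas regexm() accepts character ranges. A small sketch that only flags addresses whose leading number has something other than a digit or space sitting between digits, so they can be inspected before any cleaning. The variable name has_typo is made up for illustration, and the pattern assumes the house number starts the address.

```stata
clear
input str40 address
"12e3 Main St"
"1144Re5 Oak St 77844"
"1a Broadway Ave., College Station, TX."
"11 Test St."
end

* 1 if the address starts with digits, then one or more characters that are
* neither digits nor spaces, then another digit; 0 otherwise
gen byte has_typo = regexm(address, "^[0-9]+[^0-9 ]+[0-9]")

list address has_typo
```

Note that "1a Broadway Ave." is not flagged, since the letter follows the number rather than sitting between two digits; whether such cases should be treated as typos is a separate decision.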
