Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.


Re: st: Extract a letter between numbers


From   "Dimitriy V. Masterov" <[email protected]>
To   [email protected]
Subject   Re: st: Extract a letter between numbers
Date   Mon, 22 Nov 2010 16:59:00 -0500

This does not work perfectly, but it illustrates the principles
involved. It may work a lot better if you standardize some things first,
such as changing "North" to "No." (that is the one example below where
the matching fails). The code uses sencode and dsconcat, which you will
need to install from SSC. strgroup is also user-written and available
from SSC.


*********************************************************
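/* Install the user-written commands used below. This assumes all three
are currently available from SSC and that you have internet access;
run these lines once, then comment them out. */
ssc install strgroup
ssc install dsconcat
ssc install sencode
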
/* We need some fake dirty data, you have your own */
clear
set obs 6
gen add = ""
replace add = "22v46 Kim Dr." in 1
replace add = "123d Main St." in 2
replace add = "Crazy69 Blvd." in 3
replace add = "-5x9 X Blvd." in 4
replace add = "Too Dirty to Clean" in 5
replace add = "56e54 Oak st Chicago, Illinois" in 6

list
save dirty.dta, replace

/* Make some fake clean data */
clear
set obs 7
gen add=""
replace add = "2245 Kim Dr." in 1
replace add = "123 Main St." in 2
replace add = "59 X Blvd." in 3
replace add = "Clean, but unused" in 4
replace add = "5654 North Oak Chicago Illinois" in 5
replace add = "5654 No. Oak St" in 6
replace add = "5654 Oak St" in 7
list
save clean.dta, replace
clear

/* This part appends the 2 data sets, dsconcat is from SSC */
dsconcat dirty.dta clean.dta, append dsn(datafrom)

/* match is like a group id, may need to adjust threshold for match
sensitivity */
strgroup add, gen(match) threshold(.35)
sort match


/* sencode is from SSC, turns the string variable into numeric */
sencode datafrom, replace
reshape wide add, i(match) j(datafrom)
**************************************************************

