
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: Extract a letter between numbers


From   Patrick McNamara <patrick.mcnamara@efficiency20.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: Extract a letter between numbers
Date   Mon, 29 Nov 2010 13:43:12 -0500

Thanks for the help on this, guys :)  I ended up using a combination of
concatenation, reverse and the [0-9] and [A-Z] character classes,
depending on what the error was. I just have to work on writing the
scripts so they'll be usable by others and in the future. I'm not as
quick at writing them as the rest of you guys :)
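For readers landing on this thread: a minimal Stata sketch of the regular-expression step, assuming the stray letter always sits between two runs of digits at the start of the address (as in "22v46 Kim Dr."). It is not the exact script described above; the variable `add_fixed` is a hypothetical name, while `regexm()` and `regexs()` are built-in Stata functions.

```stata
* Hypothetical sketch: remove one letter sitting between two leading runs
* of digits, e.g. "22v46 Kim Dr." becomes "2246 Kim Dr."
clear
input str40 add
"22v46 Kim Dr."
"56e54 Oak st Chicago, Illinois"
"123d Main St."
end

* Start from a copy so the raw address is kept for comparison
gen add_fixed = add

* regexs(1), regexs(2), regexs(3) are the digits before the letter, the
* digits after it, and the rest of the address from the preceding regexm()
replace add_fixed = regexs(1) + regexs(2) + regexs(3) if regexm(add, "^([0-9]+)[A-Za-z]([0-9]+)(.*)$")

list add add_fixed
```

Addresses that do not fit the pattern, such as "123d Main St.", are left unchanged and need a separate rule.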

On Mon, Nov 22, 2010 at 4:59 PM, Dimitriy V. Masterov
<dvmaster@gmail.com> wrote:
>
> This does not work perfectly, but it illustrates the principles
> involved. It may work a lot better if you standardize some things,
> like changing North to No (that is the one example where it fails
> below). The code uses sencode and dsconcat, which you will need to
> install from SSC.
>
>
> *********************************************************
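> /* One-time setup: install the user-written commands from SSC */
> ssc install dsconcat, replace
> ssc install sencode, replace
> ssc install strgroup, replace
>
> /* We need some fake dirty data; you have your own */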
> clear
> set obs 6
> gen     add=""
> replace add = "22v46 Kim Dr." in 1
> replace add = "123d Main St." in 2
> replace add = "Crazy69 Blvd." in 3
> replace add = "-5x9 X Blvd." in 4
> replace add = "Too Dirty to Clean" in 5
> replace add = "56e54 Oak st Chicago, Illinois" in 6
>
> list
> save dirty.dta, replace
>
> /* Make some fake clean data */
> clear
> set obs 7
> gen add=""
> replace add = "2245 Kim Dr." in 1
> replace add = "123 Main St." in 2
> replace add = "59 X Blvd." in 3
> replace add = "Clean, but unused" in 4
> replace add = "5654 North Oak Chicago Illinois" in 5
> replace add = "5654 No. Oak St" in 6
> replace add = "5654 Oak St" in 7
> list
> save clean.dta, replace
> clear
>
> /* Append the two data sets; dsconcat is from SSC */
> dsconcat dirty.dta clean.dta, append dsn(datafrom)
>
> /* match works like a group id; strgroup is also from SSC. You may need
> to adjust threshold() to change the match sensitivity */
> strgroup add, gen(match) threshold(.35)
> sort match
>
>
> /* sencode (from SSC) turns the string variable into a numeric one */
> sencode datafrom, replace
> reshape wide add, i(match) j(datafrom)
> **************************************************************
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



--
___________________________
Patrick McNamara
Manager, Program Logistics

Efficiency 2.0
165 William Street, Floor 10
New York, NY 10038
T. 646 478 8509
M. 816 305 5679
F. 347 328 9342

patrick.mcnamara@efficiency20.com
efficiency20.com



