Re: st: Extract a letter between numbers

From	Patrick McNamara <[email protected]>
To	[email protected]
Subject	Re: st: Extract a letter between numbers
Date	Mon, 22 Nov 2010 16:21:31 -0500

Those both sound like good ideas. Any advice on how to execute them
after install? :)

To give an idea of what I'm working with, I've listed a correct
address and some examples of address problems below:

5654 N Oak St Chicago, Illinois
56e54 Oak st Chicago, Illinois
5654 North Oak Chicago Illinois
5654 No. Oak St
5654 Oak St

A single address entry may have more than one of these problems. What
I'm trying to do right now is find the length of each of the first
three words after the house number (5654), then match on the longest
and the second-longest word to see which gives the better matching
rate. But nearmrg or strgroup may work much better.
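
As a rough illustration of the word-length idea above, here is a minimal sketch. It is written in Python rather than Stata purely for illustration, and the function name candidate_street_words is hypothetical. It assumes the first whitespace-separated token is always the house number, even when it contains a stray letter as in 56e54.

```python
def candidate_street_words(address, n_words=3):
    """Return the first n_words words after the house number, longest first.

    Assumption: the first token is the house number, so it is skipped
    without being validated.
    """
    # Split on whitespace and drop trailing/leading commas and periods,
    # so "No." becomes "No" and "Chicago," becomes "Chicago".
    tokens = [t.strip(".,") for t in address.split()]
    tokens = [t for t in tokens if t]

    # Keep up to n_words tokens that follow the house number.
    following = tokens[1:1 + n_words]

    # Longest first; ties keep their original left-to-right order.
    return sorted(following, key=len, reverse=True)


if __name__ == "__main__":
    examples = [
        "5654 N Oak St Chicago, Illinois",
        "56e54 Oak st Chicago, Illinois",
        "5654 North Oak Chicago Illinois",
        "5654 No. Oak St",
        "5654 Oak St",
    ]
    for addr in examples:
        print(addr, "->", candidate_street_words(addr))
```

Note that for the third example the longest of the three words is the city rather than the street name, which is one reason the longest and second-longest words need to be compared against each other.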

Patrick

On Mon, Nov 22, 2010 at 3:41 PM, Dimitriy V. Masterov
<[email protected]> wrote:
> I think you may want to fuzzy merge your dirty address data and your
> clean data using nearmrg, which you can get from SSC.
>
> An alternative way would be to append your two data sets and then use
> strgroup on the variable that is the stacked version of your clean and
> dirty addresses. That will give you the closest match.
>
> Neither one will be perfect, and either may take a long time or fail
> if you have too much data. The latter approach has some operating
> system restrictions as well.
>
> DVM
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
>
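
To make the general idea of matching each dirty address to its closest clean one concrete, here is a minimal sketch. It does not use nearmrg or strgroup and does not reproduce their algorithms; it swaps in Python's standard difflib module as a stand-in similarity measure. The names normalize and closest_clean_address, the sample clean list, and the 0.6 cutoff are all assumptions made for illustration.

```python
import difflib


def normalize(address):
    """Lower-case the address and collapse punctuation and extra spaces."""
    cleaned = address.lower().replace(",", " ").replace(".", " ")
    return " ".join(cleaned.split())


def closest_clean_address(dirty, clean_addresses, cutoff=0.6):
    """Return the clean address most similar to `dirty`, or None.

    Similarity is difflib's ratio on the normalized strings; `cutoff`
    is an illustrative threshold, not a recommended value.
    """
    # Map normalized text back to the original clean spelling.
    lookup = {normalize(c): c for c in clean_addresses}
    hits = difflib.get_close_matches(
        normalize(dirty), list(lookup), n=1, cutoff=cutoff
    )
    return lookup[hits[0]] if hits else None


if __name__ == "__main__":
    # Hypothetical clean list; only the first entry comes from the thread.
    clean = [
        "5654 N Oak St Chicago, Illinois",
        "120 W Elm Ave Chicago, Illinois",
    ]
    for dirty in ["56e54 Oak st Chicago, Illinois",
                  "5654 North Oak Chicago Illinois"]:
        print(dirty, "->", closest_clean_address(dirty, clean))
```

A heavily truncated entry such as 5654 Oak St may fall below the cutoff and come back unmatched, so the threshold would need tuning against real data. Comparing every dirty address with every clean one also scales poorly, which is consistent with the warning above about large data sets.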

