Re: st: Extract a letter between numbers

From   Patrick McNamara
Subject   Re: st: Extract a letter between numbers
Date   Mon, 22 Nov 2010 16:21:31 -0500

Those both sound like good ideas. Any advice on how to execute them
after install? :)

To give an idea of what I'm working with, I've listed a correct
address and some examples of address problems below:

5654 N Oak St Chicago, Illinois
56e54 Oak st Chicago, Illinois
5654 North Oak Chicago Illinois
5654 No. Oak St
5654 Oak St

More than one of these issues may be present in a single address
entry. What I'm trying to do right now is find the length of each of
the first three words after the house number (5654), then match on the
longest and on the second-longest word to see which gives the better
matching rate. But nearmrg or strgroup may work much better.
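
To make the word-length idea above concrete, here is a minimal sketch in Python (not Stata, and not the poster's actual code). The function names `words_after_house_number` and `longest_two` are hypothetical, and the sketch assumes the house number is always the first whitespace-separated token, even when it contains a stray letter as in "56e54".

```python
def words_after_house_number(address, n=3):
    """Return up to n words that follow the first token (assumed to be the house number)."""
    tokens = address.split()
    # Strip trailing punctuation such as the "." in "No." or the "," in "Chicago,".
    words = [t.strip(".,") for t in tokens[1:1 + n]]
    return [w for w in words if w]


def longest_two(words):
    """Return the longest and second-longest words; ties keep their original order."""
    ranked = sorted(words, key=len, reverse=True)  # sorted() is stable
    return ranked[:2]


examples = [
    "5654 N Oak St Chicago, Illinois",
    "56e54 Oak st Chicago, Illinois",
    "5654 North Oak Chicago Illinois",
    "5654 No. Oak St",
    "5654 Oak St",
]

for address in examples:
    print(address, "->", longest_two(words_after_house_number(address)))
```

One thing this sketch suggests: when the street suffix is missing, as in "5654 North Oak Chicago Illinois", the longest of the three words may be the city rather than the street name, so the longest word is not always the most useful matching key.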


On Mon, Nov 22, 2010 at 3:41 PM, Dimitriy V. Masterov wrote:
> I think you may want to fuzzy merge your dirty address data and your
> clean data using nearmrg, which you can get from ssc.
> An alternative way would be to append your two data sets and then use
> strgroup on the variable that is the stacked version of your clean and
> dirty addresses. That will give you the closest match.
> Neither one will be perfect, and both may take a long time or fail if
> you have too much data. The latter approach has some operating system
> restrictions as well.
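
As a rough illustration of the "closest match" idea in the quoted reply, the sketch below scores each dirty address against a list of clean ones and keeps the best-scoring candidate. It is written in Python and uses the standard library's `difflib.SequenceMatcher` as a stand-in similarity measure; it is not the algorithm used by nearmrg or strgroup, and the function names and the second clean address are made up for the example.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive similarity score between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def best_match(dirty, clean_addresses):
    """Return the clean address most similar to the dirty one."""
    return max(clean_addresses, key=lambda clean: similarity(dirty, clean))


# Hypothetical clean list: the first entry is from the post, the second is invented.
clean = [
    "5654 N Oak St Chicago, Illinois",
    "120 W Elm Ave Chicago, Illinois",
]

for dirty in ["56e54 Oak st Chicago, Illinois", "5654 Oak St"]:
    print(dirty, "->", best_match(dirty, clean))
```

Comparing every dirty address against every clean address grows with the product of the two list sizes, which is consistent with the warning above that fuzzy matching can be slow on large data.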