


Re: st: matching strings on words

From   Jeph Herrin <>
Subject   Re: st: matching strings on words
Date   Wed, 31 Mar 2010 14:43:41 -0400

Thanks to Simon and Eric for their suggestions. I tried both,
but in the end decided it would be faster to sort out the 60
mismatches manually.


Simon wrote:
I've had a similar issue:


On 30/03/2010 20:00, Jeph Herrin wrote:

I'm not sure what to call this - if I did, I might have
better luck with my searches for a utility. Basically,
I want to do something similar to the utility -nmatch-,
which matches first and last names, but I have more than
two words per record.

The problem: I have two files with lists of hospital names.
Hospital names tend to consist of multiple words that get
used to differing extents; the same hospital might be listed as

st joseph's
st joseph's memorial
st joseph's memorial hospital
st joseph's memorial hospital of danbury

etc. (There is also a lot of variation, e.g. "Saint" vs "St." and
"Memorial" vs "memorial", but I have trapped most of those.)
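Trapping case and abbreviation variants usually means normalizing each name into a list of standardized words before any comparison. A minimal sketch in Python (not Stata, and not the poster's actual cleaning code) is below; the function name `normalize` and the `ABBREVIATIONS` table are hypothetical and would need to be extended for real data.

```python
import re

# Hypothetical abbreviation map, for illustration only; a real list would be
# built by inspecting the hospital names in both files.
ABBREVIATIONS = {"saint": "st", "hosp": "hospital", "mem": "memorial"}

def normalize(name: str) -> list[str]:
    """Lower-case a hospital name, strip punctuation, and standardize words."""
    # Lower-case so "Memorial" and "memorial" compare equal.
    lowered = name.lower()
    # Replace punctuation (except ASCII apostrophes) with spaces, so "St." -> "st".
    cleaned = re.sub(r"[^\w\s']", " ", lowered)
    # Map each word through the abbreviation table; unknown words pass through.
    return [ABBREVIATIONS.get(w, w) for w in cleaned.split()]

print(normalize("Saint Joseph's Memorial Hospital"))  # ['st', "joseph's", 'memorial', 'hospital']
print(normalize("St. Joseph's memorial"))             # ['st', "joseph's", 'memorial']
```

Mapping everything to the short form ("st") rather than the long one is an arbitrary choice; what matters is that both files go through the same table.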

What I'd like to do is match these on "words", and generate
a _merge variable which indicates how many words match vs
how many words there are. Then I (or some unlucky grad student)
can trawl through the matches and decide which ones are the
same hospital.
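The word-level matching described above amounts to comparing every name in one file with every name in the other and recording how many words they share. The sketch below is in Python rather than Stata and is not -nmatch- or any existing utility; `word_matches` and `min_shared` are hypothetical names. It assumes "how many words there are" means the number of distinct words across both names (an alternative would be the word count of the shorter name).

```python
import re
from itertools import product

def words(name: str) -> set[str]:
    """Split a name into a set of lower-case words, dropping punctuation except apostrophes."""
    return set(re.sub(r"[^\w\s']", " ", name.lower()).split())

def word_matches(list_a, list_b, min_shared=1):
    """Pair every name in list_a with every name in list_b and count shared words.

    Returns (name_a, name_b, n_shared, n_total) tuples, where n_total is the
    number of distinct words across both names, sorted best match first.
    """
    rows = []
    for a, b in product(list_a, list_b):  # all a x b pairs
        wa, wb = words(a), words(b)
        shared = len(wa & wb)
        if shared >= min_shared:  # skip pairs with no word in common
            rows.append((a, b, shared, len(wa | wb)))
    # Highest share of matching words first, so a reviewer sees likely matches at the top.
    rows.sort(key=lambda r: (r[2] / r[3], r[2]), reverse=True)
    return rows

file_a = ["st joseph's memorial hospital of danbury", "mercy general"]
file_b = ["St. Joseph's Memorial", "Mercy General Hospital", "st mary's"]

for a, b, n_shared, n_total in word_matches(file_a, file_b):
    print(f"{n_shared}/{n_total}  {a!r} ~ {b!r}")
```

With these made-up lists, "mercy general" and "Mercy General Hospital" come out on top (2 of 3 words), followed by the St. Joseph's pair (3 of 6). Because every pair is compared, the cost grows with the product of the two list sizes, which is manageable for hospital lists but worth blocking (e.g. by state) for larger files.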

I can see how to write a program to do such a thing, but am hoping
there is already a solution out there that I overlooked?


* For searches and help try:

© Copyright 1996–2015 StataCorp LP   |   Terms of use   |   Privacy   |   Contact us   |   Site index