The Stata listserver

st: Near Matches


From   "Clyde Schechter" <cschecht@aecom.yu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: Near Matches
Date   Sun, 30 Oct 2005 15:52:10 -0500 (EST)

Friends,

Michael Blasnik was kind enough to send me his bigram.ado--it turns out to
be exactly what I need.  I've tried it out by running it on the last three
request lists I've encountered, and in every case, it gave the highest (or
in one case second highest) match score to the "correct" answers.  Thanks,
Mike!
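
For anyone who has not seen bigram matching before, here is a rough sketch of the general idea in Python. It is not bigram.ado and makes no claim about how that program computes its score: the Dice-style formula, the lowercasing, and the names bigrams(), bigram_score() and best_match() are all assumptions made for illustration.

```python
from collections import Counter


def bigrams(text):
    """Return the overlapping two-character substrings of text.

    Assumption: case is ignored and runs of whitespace collapse to one space.
    """
    s = " ".join(text.lower().split())
    return [s[i:i + 2] for i in range(len(s) - 1)]


def bigram_score(a, b):
    """Dice-style similarity in [0, 1] based on shared bigrams.

    This formula is an illustrative choice, not necessarily the one
    used by bigram.ado.
    """
    ca, cb = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(ca.values()) + sum(cb.values())
    if total == 0:
        return 0.0  # both strings too short to contain a bigram
    shared = sum((ca & cb).values())  # multiset intersection
    return 2.0 * shared / total


def best_match(query, candidates):
    """Return the candidate with the highest bigram score against query."""
    return max(candidates, key=lambda c: bigram_score(query, c))


if __name__ == "__main__":
    names = ["Ford Motor Company", "Exxon", "Verizon"]
    print(best_match("Ford Motor Co", names))
```

Because the score counts shared character pairs rather than whole words, it tolerates abbreviations and differing numbers of words reasonably well.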

As for soundex, I already have it, but I cannot recall where I got it
from.  For some reason -search- and -findit- don't turn it up, but,
nevertheless, you can get it with -ssc install _gsoundex-.  It's a
marvelous tool for matching up person names.  But a couple of limitations:

It goes one word at a time.  So if you have a first and last name, you
need to soundex each of them and then match on both.  I'm not sure how
you'd go about using it when you want to try to match names with different
numbers of words (e.g. Ford with Ford Motor Company).
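
To make the word-at-a-time point concrete, here is a minimal Python sketch of the classic American Soundex rules, with a helper that codes each word of a name separately. Whether _gsoundex follows exactly this variant is an assumption, and the names soundex() and name_key() are hypothetical.

```python
# Consonant groups of classic American Soundex; vowels, y, h and w get no digit.
_CODES = {}
for _letters, _digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                         ("l", "4"), ("mn", "5"), ("r", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(word):
    """Return the 4-character Soundex code of a single word ("" if no letters)."""
    letters = [c for c in word.lower() if "a" <= c <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters that share a code
        digit = _CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
        prev = digit  # a vowel resets prev, so a repeated code counts again
        if len(code) == 4:
            break
    return code.ljust(4, "0")


def name_key(full_name):
    """Code each word separately; two names match only if every code agrees."""
    return tuple(soundex(w) for w in full_name.split())


if __name__ == "__main__":
    print(name_key("Clyde Schechter"), name_key("Clide Shecter"))
    print(name_key("Ford"), name_key("Ford Motor Company"))
```

The second pair shows the difficulty: the keys have different lengths, so an exact match on all words fails even though the first codes agree.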

It does not work well with artificial vocabulary.  The algorithm was
designed to exploit the frequencies of certain letters in certain
positions in people's names.  I once tried to use it to match potentially
misspelled generic and brand drug names with their correct counterparts
and had no luck at all.

Corporation names are probably closer to people's names than drug names are,
but I suspect that soundex will be less than ideal for the purpose.  (Ford
is obviously taken from a person's name, but Exxon, Verizon, or similar
names of recent vintage have very different spelling patterns.)  I'd be
curious how it actually works out in this context.

I have not yet checked out the edit-difference-index, but I suspect that
it, too, will work well for my original purpose.  I also want to thank,
anonymously, another person who sent me some suggestions off-list, which
are interesting but don't work well for my situation.
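
For reference, an edit-difference measure is usually some form of edit distance. The Python sketch below computes the standard Levenshtein distance and a length-normalized similarity. That the index mentioned above is Levenshtein-based is an assumption, and edit_distance() and edit_similarity() are hypothetical names.

```python
def edit_distance(a, b):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions and substitutions needed to turn a into b."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, 1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def edit_similarity(a, b):
    """Scale the distance to [0, 1], where 1 means identical strings.

    The normalization by the longer length is an illustrative choice.
    """
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest


if __name__ == "__main__":
    print(edit_distance("kitten", "sitting"))
    print(edit_similarity("Verizon", "Verison"))
```

Unlike soundex, this makes no assumption about how names are pronounced, so it treats invented names like any other string.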

Best regards.

Clyde Schechter
Albert Einstein College of Medicine
Bronx, NY, USA




