The Stata listserver

st: matching misspelled names


From   Clyde Schechter <cschecht@aecom.yu.edu>
To   statalist@hsphsun2.harvard.edu
Subject   st: matching misspelled names
Date   Fri, 23 Aug 2002 16:07:09 -0400

I have a dataset, one of whose variables contains names of drugs.  Many of
the entries are misspelled or truncated.  I have an index file with a
reasonably complete list of commercial and generic drug names.  After
merging the files and identifying exact matches, I would like to match
each remaining, presumably misspelled, drug name to the corresponding
correct name from the index.  When the names are of people, the soundex
algorithm usually provides a reasonably short list of candidate matches.
But when I try it with these drug names, many of the misspellings match
several dozen candidates, which makes the resulting list of names and
candidate matches for manual review and selection unworkably long.
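To see why soundex is so coarse for this vocabulary, here is a minimal Python sketch of one common formulation of American Soundex: keep the first letter, map the remaining consonants to six digit classes, drop vowels, and truncate or zero-pad to four characters. It is an illustration only, not the implementation the poster used, which may differ in edge cases such as the handling of h and w. Because only the first letter and the next three consonant classes survive, drug names that share an opening such as "met-" followed by a labial consonant and an r all collapse onto one code.

```python
# Map consonants to the six American Soundex digit classes.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name: str) -> str:
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    # The first letter's class suppresses an immediately following repeat.
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w are skipped without separating equal codes
        code = CODES.get(ch, "")  # vowels (and y) get "" and do separate equal codes
        if code and code != prev:
            result += code
            if len(result) == 4:
                break
        prev = code
    return result.ljust(4, "0")


for drug in ("metformin", "metoprolol", "metaproterenol"):
    print(drug, soundex(drug))  # all three codes are M316
```

With only four characters of code, every index name in a crowded region of the vocabulary lands in the same bucket, which is consistent with the dozens of candidates described above.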

Does anybody out there know of an alternative to soundex coding that might
work better for this peculiar vocabulary?  Or of another approach to this
problem?
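One alternative often suggested for this kind of problem, though not something the original post proposes, is to drop phonetic codes and instead rank every index name by Levenshtein edit distance to the unmatched entry, keeping only the k closest for manual review. The Python sketch below assumes the index fits in an in-memory list; the function names `levenshtein` and `shortlist`, the cutoff `k`, and the sample drug list are hypothetical. To cope with truncated entries, it also scores each entry against the same-length prefix of each candidate and keeps the smaller of the two distances.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


def shortlist(entry: str, index_names: list[str], k: int = 5) -> list[tuple[int, str]]:
    """Return the k index names closest to entry as (distance, name) pairs."""
    entry = entry.lower()
    scored = []
    for name in index_names:
        cand = name.lower()
        full = levenshtein(entry, cand)
        # A truncated entry should match the beginning of the correct name,
        # so also compare against a prefix of the same length.
        prefix = levenshtein(entry, cand[: len(entry)])
        scored.append((min(full, prefix), name))
    scored.sort()  # smallest distance first; ties broken alphabetically
    return scored[:k]


# Hypothetical index of drug names, for illustration only.
index = ["metformin", "metoprolol", "metaproterenol", "methotrexate", "metronidazole"]
print(shortlist("metformn", index, k=1))  # [(1, 'metformin')]
print(shortlist("methotr", index, k=1))   # [(0, 'methotrexate')]
```

Because the review list is capped at k rather than set by how many names share a code, it stays short however crowded that part of the vocabulary is. The trade-off is that every unmatched entry is compared with every index name, which may be slow for very large indexes, and the prefix rule lets very short truncations tie with many names.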

Thanks in advance for any help.

Clyde Schechter
Dept. of Family Medicine & Community Health
Albert Einstein College of Medicine
Bronx, NY, USA

*
*   For searches and help try:
*   http://www.stata.com/support/faqs/res/findit.html
*   http://www.stata.com/support/statalist/faq
*   http://www.ats.ucla.edu/stat/stata/


