



Re: st: Working with complex strings


From   Clyde B Schechter <clyde.schechter@einstein.yu.edu>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Working with complex strings
Date   Thu, 1 Dec 2011 18:40:13 +0000

Several people have made some interesting and impressively useful suggestions in this thread.

Let me just point out other problems that may be encountered in using drug data:

1.  Each drug has one generic name and one or more brand names.  For most purposes other than marketing analyses, these need to be identified as the same drug.

2.  Each generic and brand name is subject to seemingly innumerable typographical errors and misspellings.  Surprisingly, this is true even if the source of the data is a download from a computerized order entry system or a pharmacy database!

3.  Although with newer drugs it is less common, misspellings or typos in a drug name can actually turn it into a correct or nearly correct name of a different drug.  (Patients have died as a result of this; the FDA has been cracking down on easily confusable drug names.)

4.  Soundex and similar tools do a rather poor job of dealing with the errors noted in item 2 because drug names are not phonologically like normal English words.

5.  Frequently, even different drugs may need to be identified as the same for analytic purposes if they are chemical equivalents (e.g. a hydrochloride and a citrate of the same base), belong to the same general pharmacologic class (e.g. statins), or serve the same clinical purpose (e.g. blood pressure lowering drugs).
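
To make the points above concrete, here is a minimal sketch in Python (not Stata, and not a method proposed in this thread). It assumes a small hand-built lookup table from brand and generic spellings to one generic name, plus a second table from generic name to pharmacologic class. Both tables, the function name normalize, and the 0.8 similarity cutoff are hypothetical choices for illustration. Instead of Soundex it uses the character-similarity matcher in Python's standard difflib module, and it sends anything with no match or with conflicting matches to hand review rather than guessing, since a typo can land on a different real drug.

```python
from difflib import get_close_matches

# Illustrative lookup only: each known spelling (brand or generic) -> generic name.
# A real project would need a far larger, curated table.
NAME_TO_GENERIC = {
    "atorvastatin": "atorvastatin",
    "lipitor": "atorvastatin",
    "simvastatin": "simvastatin",
    "zocor": "simvastatin",
    "lisinopril": "lisinopril",
    "prinivil": "lisinopril",
    "zestril": "lisinopril",
}

# Illustrative grouping of generics into classes for analysis (item 5).
GENERIC_TO_CLASS = {
    "atorvastatin": "statin",
    "simvastatin": "statin",
    "lisinopril": "ACE inhibitor",
}


def normalize(raw, cutoff=0.8):
    """Return (generic_name, status) for one raw drug string.

    status is "exact", "fuzzy", or "review". The cutoff is an assumed
    threshold; it would need tuning against real data.
    """
    key = raw.strip().lower()
    if key in NAME_TO_GENERIC:
        return NAME_TO_GENERIC[key], "exact"

    # Spellings whose similarity ratio to the input is at least `cutoff`.
    candidates = get_close_matches(key, list(NAME_TO_GENERIC), n=3, cutoff=cutoff)
    generics = {NAME_TO_GENERIC[c] for c in candidates}

    # Accept a fuzzy match only when every candidate points to the same drug.
    if len(generics) == 1:
        return generics.pop(), "fuzzy"

    # No candidate, or candidates that disagree: leave it for a human.
    return None, "review"


if __name__ == "__main__":
    for raw in ["Lipitor ", "lipitr", "simvastatn", "xyzzy"]:
        generic, status = normalize(raw)
        drug_class = GENERIC_TO_CLASS.get(generic)
        print(repr(raw), generic, drug_class, status)
```

The "review" status is the point of the design: automated matching shrinks the pile, and whatever it cannot resolve unambiguously is kept visible for manual coding instead of being silently assigned.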

The bottom line is that in 3 decades of experience I have only once worked on a project with drug data that didn't ultimately require some non-automated direct human manipulation of the data.  (And that one almost doesn't count because it was a study involving only 2 drugs!)

That said, the suggestions made earlier in this thread should reduce the final hand-work considerably.

Good luck.

Clyde Schechter
Dept. of Family & Social Medicine
Albert Einstein College of Medicine
Bronx, NY, USA
