
Re: st: Fuzzy collapse


From   Daniel Feenberg <feenberg@nber.org>
To   "statalist@hsphsun2.harvard.edu" <statalist@hsphsun2.harvard.edu>
Subject   Re: st: Fuzzy collapse
Date   Thu, 26 Jan 2012 18:55:00 -0500 (EST)


On Thu, 26 Jan 2012, Steve Nakoneshny wrote:

> Charles,
>
> I agree with you that -soundex- may not be appropriate given assumptions about English background, but it may still be a reasonable option to try given the similarities of the strings you provided as examples (despite being French). It may or may not work though.

Soundex was created to encode foreign names in the Latin alphabet while remaining robust to transliteration problems. Other than avoiding accented characters, it isn't especially English-oriented.
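
To make the encoding concrete, here is a minimal sketch of the commonly described American Soundex rules, written in Python purely for illustration. It is not Stata's -soundex()- implementation, and the function name and the handling of non-ASCII letters (they are simply dropped) are assumptions of this sketch.

```python
# Illustrative sketch of American Soundex; not Stata's implementation.
# Consonant groups that share a digit code.
CODES = {}
for digit, group in enumerate(["BFPV", "CGJKQSXZ", "DT", "L", "MN", "R"], start=1):
    for letter in group:
        CODES[letter] = str(digit)


def soundex(name: str) -> str:
    """Return a 4-character Soundex code, or "" if name has no A-Z letters."""
    # Assumption of this sketch: anything outside A-Z (including accented
    # letters) is discarded before encoding.
    letters = [ch for ch in name.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""

    result = letters[0]                 # the first letter is kept as-is
    prev = CODES.get(letters[0], "")    # its code still suppresses a repeat
    for ch in letters[1:]:
        if ch in "HW":
            continue                    # H and W do not separate equal codes
        code = CODES.get(ch, "")        # vowels and Y have no code
        if code and code != prev:
            result += code
        prev = code                     # a vowel resets prev, allowing a repeat
    return (result + "000")[:4]         # pad with zeros, truncate to 4


if __name__ == "__main__":
    for example in ["Robert", "Rupert", "Tymczak", "Pfister"]:
        print(example, soundex(example))
```

In this sketch "Robert" and "Rupert" both encode to R163, which is the kind of spelling tolerance that makes the code usable as a grouping key. Because accented letters are dropped rather than mapped to their base letters, names that contain them would likely need to be transliterated to plain ASCII before encoding.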

daniel feenberg
