You might try -strgroup-, available on SSC; it matches strings based on
their Levenshtein edit distance.
You may also want to run
replace myvar = lower(myvar)
first, so that differences in case alone (Mexico vs mexico) are not
flagged as different spellings.
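Putting the two suggestions together, a rough sketch might look like the
following. This is untested against your data: myvar is just a placeholder
name, grp and n_spell are names I made up, and the 0.25 threshold is only an
illustrative starting point. As far as I recall, -strgroup- by default divides
the edit distance by the length of the shorter string, so Mexico / Mejico (one
substitution over six characters, about 0.17) would fall under that threshold;
please check -help strgroup- for the exact options. Because the command
compares strings pairwise, it is probably worth collapsing 1,500,000
observations to the distinct spellings first.

```stata
* Sketch only: myvar, grp, n_spell and the 0.25 threshold are illustrative.
* Install strgroup from SSC (only needed once).
ssc install strgroup

* Remove case and leading/trailing-blank differences before matching.
replace myvar = lower(trim(myvar))

preserve
* Keep one row per distinct spelling; _freq holds how often each occurs.
contract myvar
* Group spellings whose normalized edit distance is at most 0.25.
strgroup myvar, generate(grp) threshold(0.25)
* Count the distinct spellings that ended up in each group.
bysort grp: generate n_spell = _N
* Show only groups with more than one spelling, i.e. candidate misspellings.
list grp myvar _freq if n_spell > 1, sepby(grp)
restore
```

The listing is only a set of candidates to review by eye; a lower threshold
gives fewer false matches, a higher one catches more variants. The same steps
can be repeated for the second string variable.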
hth,
Jeph
On 1/9/2013 7:02 AM, Estrella Gomez wrote:
Does anybody know how I can check for spelling errors in my
dataset? I have two string variables with nearly 1,500,000 observations,
and I would like to check whether in some cases there are different names for
the same individual (e.g., Mexico / Mejico / mexico).
Thank you
*
* For searches and help try:
* http://www.stata.com/help.cgi?search
* http://www.stata.com/support/faqs/resources/statalist-faq/
* http://www.ats.ucla.edu/stat/stata/