Statalist



st: RE: Matching Names


From   "Nick Cox" <n.j.cox@durham.ac.uk>
To   <statalist@hsphsun2.harvard.edu>
Subject   st: RE: Matching Names
Date   Fri, 8 Aug 2008 15:00:28 +0100

I guess everyone will agree that this kind of problem is a big deal and a big pain. 

It's also a common one. 

Last month Rufus Peabody started a similar thread: see the start at 

<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0807/Author/article-87.html> 

Subsequently, Jeph Herrin and Eva Poen put together their contributions to this thread, with some further thoughts. Their combined advice will appear as a Stata Journal Tip in Stata Journal 8(3) 2008. 

Nick 
n.j.cox@durham.ac.uk 

Max Perez Leon

I am having a big problem trying to merge two datasets with names. The problem is
that there are tons of typos in both datasets. Examples below:

DATASET 1: --------------------- DATASET 2:

NAMES--------------------------- NAMES

LUIS PÉREZ --------------------- LUIS P´REZ
WILLIAM SMITH ------------------ WILLIAM SMITHSS
JORGE F. CHOCAN ---------------- JORGE F CHOCANOS
P. BROWN ----------------------- PAUL BROWN
ENRIQUETA GAUDENCIA ------------ ENRIQUETA G

I could do it by hand, but I have 52568 obs and more to come. I am trying to
establish a method using regular expressions so that I can merge the datasets
correctly.
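
To make the idea concrete, here is a minimal sketch of one possible approach, written in Python rather than Stata and not taken from this thread or from the forthcoming Stata Journal Tip. It uses regular expressions only to normalize the names (case, accents, punctuation, spacing) and then scores each pair with the standard library's difflib.SequenceMatcher. The helper names normalize, similarity and best_match are hypothetical, and the 0.85 cut-off is an arbitrary assumption that would need tuning against real data.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Uppercase, drop accents and punctuation, collapse runs of spaces."""
    decomposed = unicodedata.normalize("NFKD", name.upper())
    # Remove combining marks, so that "É" becomes a plain "E".
    no_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything that is not A-Z or a space (dots, stray accents) becomes a space.
    letters_only = re.sub(r"[^A-Z ]", " ", no_marks)
    return re.sub(r"\s+", " ", letters_only).strip()


def similarity(a, b):
    """Similarity ratio between 0 and 1 of the two normalized names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name, candidates):
    """Return (candidate, score) for the most similar candidate name."""
    scored = [(similarity(name, c), c) for c in candidates]
    score, candidate = max(scored)
    return candidate, score


# The example names from the post above.
dataset1 = ["LUIS PÉREZ", "WILLIAM SMITH", "JORGE F. CHOCAN",
            "P. BROWN", "ENRIQUETA GAUDENCIA"]
dataset2 = ["LUIS P´REZ", "WILLIAM SMITHSS", "JORGE F CHOCANOS",
            "PAUL BROWN", "ENRIQUETA G"]

THRESHOLD = 0.85  # assumed cut-off; pairs scoring below it get flagged

for name in dataset1:
    match, score = best_match(name, dataset2)
    note = "" if score >= THRESHOLD else "  <- check by hand"
    print(f"{name:22s} {match:18s} {score:.2f}{note}")
```

Abbreviated names such as P. BROWN or ENRIQUETA G score noticeably lower than simple typos, so in this sketch they end up flagged for manual review rather than matched automatically. Comparing every name in one dataset with every name in the other also grows quadratically, so with 52568 observations some form of blocking (for example, comparing only names that share a first letter or a surname) would probably be needed.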
