st: RE: Matching Names

From	"Nick Cox" <[email protected]>
To	<[email protected]>
Subject	st: RE: Matching Names
Date	Fri, 8 Aug 2008 15:00:28 +0100

I guess everyone will agree that this kind of problem is a big deal and a big pain. 

It's also a common one. 

Last month Rufus Peabody started a similar thread: see the start at 

<http://www.hsph.harvard.edu/cgi-bin/lwgate/STATALIST/archives/statalist.0807/Author/article-87.html> 

Subsequently, Jeph Herrin and Eva Poen put together their contributions to this thread, with some further thoughts. Their combined advice will appear as a Stata Journal Tip in Stata Journal 8(3) 2008. 

Nick 
[email protected] 

Max Perez Leon

I am having a big problem trying to merge two datasets with names. The problem is
that there are tons of typos in both datasets. Examples below:

DATASET 1:              DATASET 2:
NAMES                   NAMES

LUIS PÉREZ              LUIS PÉREZ
WILLIAM SMITH           WILLIAM SMITHSS
JORGE F. CHOCAN         JORGE F CHOCANOS
P. BROWN                PAUL BROWN
ENRIQUETA GAUDENCIA     ENRIQUETA G

I could do it by hand, but I have 52568 obs and more to come. I am trying to
establish a method using regular expressions so that I can merge the
datasets correctly.
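
For illustration only, here is a minimal sketch of one way to approach this kind of matching. It is written in Python rather than Stata, and it is not the method from the thread or the forthcoming Stata Journal Tip. The idea is to use regular expressions to normalize each name (uppercase, accents and punctuation removed, spaces collapsed) and then pick the most similar candidate by a string-similarity score. The function names `normalize` and `best_match` and the 0.7 threshold are assumptions made for this example; any real threshold would need tuning against hand-checked matches.

```python
import re
import unicodedata
from difflib import SequenceMatcher


def normalize(name):
    """Uppercase a name, strip accents and punctuation, collapse whitespace."""
    # Split accented letters into base letter + combining mark, then drop the marks.
    decomposed = unicodedata.normalize("NFKD", name)
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not an ASCII letter or a space with a space.
    # Letters with no ASCII decomposition (for example "ß") are dropped here.
    letters_only = re.sub(r"[^A-Za-z ]+", " ", no_accents)
    return re.sub(r"\s+", " ", letters_only).strip().upper()


def best_match(name, candidates, threshold=0.7):
    """Return (candidate, score) for the most similar candidate.

    The candidate is None when the best score is below the threshold.
    The threshold of 0.7 is an arbitrary assumption, not a recommendation.
    """
    if not candidates:
        return (None, 0.0)
    key = normalize(name)
    scored = [
        (SequenceMatcher(None, key, normalize(c)).ratio(), c) for c in candidates
    ]
    score, candidate = max(scored)
    return (candidate if score >= threshold else None, score)


if __name__ == "__main__":
    dataset2 = [
        "LUIS PÉREZ",
        "WILLIAM SMITHSS",
        "JORGE F CHOCANOS",
        "PAUL BROWN",
        "ENRIQUETA G",
    ]
    for name in ["WILLIAM SMITH", "JORGE F. CHOCAN", "P. BROWN"]:
        print(name, "->", best_match(name, dataset2))
```

Comparing every name against every candidate is quadratic, so with 52568 obs it would be sensible to match exactly on the normalized key first and run the similarity search only on the leftovers. Low-scoring or tied matches still deserve a look by hand.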
