
st: RE: Merging by string variables

From   "Nick Cox" <>
To   <>
Subject   st: RE: Merging by string variables
Date   Tue, 1 Dec 2009 15:15:21 -0000

There's no easy and failsafe solution here. 

-merge- doesn't know about meanings or approximate matches. It's entirely literal. 

I can think of two strategies. 

1. You just need to work on one or indeed both datasets to produce variables that will merge. There's detailed advice within 

SJ-8-3  dm0039  . . .  Stata tip 64: Cleaning up user-entered string variables
        . . . . . . . . . . . . . . . . . . . . . . . .  J. Herrin and E. Poen
        Q3/08   SJ 8(3):444--445                                 (no commands)
        tip on how to clean up user-entered string variables

2. You could try soundex or similar tricks. Your example doesn't look encouraging for that strategy. 
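As a rough sketch of both strategies (not a tested recipe): the code below builds cleaned-up key variables in each dataset, merges on those keys, and also generates a soundex code for comparison. The file names site_list_a and site_list_b and the helper program cleanname are hypothetical; the variable names come from the question. The -merge 1:1- syntax assumes Stata 11 or later, and it will stop with an error if the keys do not uniquely identify observations.

```stata
* Hypothetical helper: build a normalized key <var>_key from a string variable
capture program drop cleanname
program define cleanname
    args v
    generate `v'_key = lower(`v')                     // ignore case
    replace `v'_key = subinstr(`v'_key, ",", "", .)   // drop commas
    replace `v'_key = subinstr(`v'_key, ".", "", .)   // drop periods
    replace `v'_key = subinstr(`v'_key, "-", " ", .)  // hyphens become spaces
    replace `v'_key = itrim(trim(`v'_key))            // tidy up blanks
end

* Clean the second dataset and save a copy (file names are assumptions)
use site_list_b, clear
foreach v in agencyname sitename siteaddress {
    cleanname `v'
}
save site_list_b_clean, replace

* Clean the first dataset the same way, then merge on the cleaned keys
use site_list_a, clear
foreach v in agencyname sitename siteaddress {
    cleanname `v'
}
merge 1:1 agencyname_key sitename_key siteaddress_key using site_list_b_clean
tabulate _merge

* Strategy 2, for comparison: a soundex code for the agency name
generate agency_sdx = soundex(agencyname)
```

Cleaning of this kind should reconcile the first pair in the question, which differs only in punctuation, but not the other two pairs, where the wording itself differs. Those would still need rules written by hand or manual matching. A soundex code is only a letter followed by three digits, so for names this long it reflects just the start of the string.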


Meryle Weinstein, Ph.D.

I have two datasets that I'm trying to merge by the following string
variables:  agencyname sitename siteaddress.  There are slight differences
in the datasets, particularly in the agencyname and sitename variables so
I'm having trouble merging the two datasets. The problem seems to be that
the agencyname differs slightly in each of the datasets.  For example 

Dataset1                                                     Dataset2
68th precinct youth council inc                              68th precinct youth council, inc.
action center for education and community development, inc   action center for education and community development-ps 106
amistad child day care and family center inc                 amistad early childhood educational center inc

Any suggestions on a way to merge by these three variables would be
