Statalist



[no subject]



*****
clear
cap ssc install reclink
**

* str60 is wide enough that no name is truncated on input
input str60 var1 var2
"68th precinct youth council inc   "    1
"action center for education and community development, inc "  2
"amistad child day care and family center inc"      3
end

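* lower-case, strip leading/trailing blanks, keep the first 10 characters as the match key
replace var1 = lower(var1)
replace var1 = ltrim(var1)
replace var1 = rtrim(var1)
replace var1 = substr(var1, 1, 10)
sort var1
* reclink needs a unique id variable in each dataset
gen idusing = _n
save "datasetusing.dta", replace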


clear
* str60 is wide enough that no name is truncated on input
input str60 var1 var3
" 68th precinct youth council, inc."  3
"action center for education and community development-ps 106"  5
" amistad early childhood educational center inc"  5
end

* same cleaning as above: lower-case, trim, keep the first 10 characters
replace var1 = lower(var1)
replace var1 = ltrim(var1)
replace var1 = rtrim(var1)
replace var1 = substr(var1, 1, 10)
sort var1
gen idmaster = _n
save "datasetmaster.dta", replace

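* fuzzy merge with -reclink-; pairs scoring below .75 are left unmatched
reclink var1 using "datasetusing.dta", idmaster(idmaster) idusing(idusing) gen(_match) minscore(.75)
* Uvar1 is var1 from the using data; _match is the match score (0 to 1)
li var1 Uvar1 _match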

*******
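
The cleaning steps above only lower-case and trim the names before cutting them to 10 characters. A minimal sketch of going one step further, in the spirit of the string-cleaning strategy quoted below: strip punctuation and a trailing "inc" so that near-identical names become identical and plain -merge- can handle them. The variable name clean and the choice of what to strip are assumptions made for illustration; they are not part of the original example.

*****
clear
input str60 agencyname
"68th precinct youth council inc"
" 68th precinct youth council, inc."
end

* lower-case and strip leading/trailing blanks
gen clean = lower(trim(agencyname))
* drop commas and periods
replace clean = subinstr(clean, ",", "", .)
replace clean = subinstr(clean, ".", "", .)
* collapse runs of internal blanks to a single blank
replace clean = itrim(clean)
* drop a trailing " inc"
replace clean = regexr(clean, " inc$", "")
li agencyname clean
*****

Both rows end up as "68th precinct youth council". Names that differ in substance (the two amistad entries, for instance) will still not match this way, which is where -reclink- is more useful.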


Eric

__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
ebooth@ppri.tamu.edu
Office: +979.845.6754



On Dec 1, 2009, at 9:15 AM, Nick Cox wrote:

> There's no easy and failsafe solution here. 
> 
> -merge- doesn't know about meanings or approximate matches. It's entirely literal. 
> 
> I can think of two strategies. 
> 
> 1. You just need to work on one or indeed both datasets to produce variables that will merge. There's detailed advice within 
> 
> SJ-8-3  dm0039  . . .  Stata tip 64: Cleaning up user-entered string variables
>        . . . . . . . . . . . . . . . . . . . . . . . .  J. Herrin and E. Poen
>        Q3/08   SJ 8(3):444--445                                 (no commands)
>        tip on how to clean up user-entered string variables
> 
> 2. You could try soundex or similar tricks. Your example doesn't look encouraging for that strategy. 
> 
> Nick 
> n.j.cox@durham.ac.uk 
> 
> Meryle Weinstein, Ph.D.
> 
> I have two datasets that I'm trying to merge by the following string
> variables:  agencyname sitename siteaddress.  There are slight differences
> in the datasets, particularly in the agencyname and sitename variables, so
> I'm having trouble merging the two datasets. The problem seems to be that
> the agencyname differs slightly in each of the datasets.  For example:
> 
> Dataset1                                                     Dataset2
> 68th precinct youth council inc                              68th precinct youth council, inc.
> action center for education and community development, inc   action center for education and community development-ps 106
> amistad child day care and family center inc                 amistad early childhood educational center inc
> 
> Any suggestions on a way to merge by these three variables would be
> appreciated.
> 
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/
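
On the soundex suggestion quoted above, a minimal sketch of how one might check it on these names, assuming the soundex() string function is available in your Stata (sdx is a hypothetical variable name). The code is only a letter plus three digits built from the start of the string, so it typically reflects little more than the first word, which fits Nick's remark that the example does not look encouraging.

*****
clear
input str60 agencyname
"amistad child day care and family center inc"
"amistad early childhood educational center inc"
"action center for education and community development, inc"
end

* soundex code for each name
gen sdx = soundex(agencyname)
* names sharing a code are only candidate matches and still need checking by eye
sort sdx
li agencyname sdx
*****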







