Re: st: fuzzy merge problem


From   Scott Merryman <scott.merryman@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: fuzzy merge problem
Date   Wed, 22 Sep 2010 09:58:01 -0500

On Tue, Sep 21, 2010 at 4:52 PM, Dimitriy V. Masterov
<dvmaster@gmail.com> wrote:
<snip>
> I tried merging on the first word in the county name and the state,
> but that runs into problems with county names that begin with Spanish
> articles.

Perhaps you could elaborate on this or give a more extensive example.

Would the following approach work? It extracts the county names from
the county data set and uses them to tag rows in the second data set:

clear
input str20 county
"BUTTE, CA"
"BUTTE, ID"
"BUTTE, SD"
"BUTTS, GA"
"CABARRUS, NC"
"CONTRA COSTA, CA"
"SAN LUIS OBISPO, CA"
end
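* drop the trailing comma, space, and two-letter state code
gen county2 = substr(county, 1, length(county) - 4)
* store the distinct county names in the local macro levels
levelsof county2, local(levels)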

clear
input str13 ndma str29 county
"CHICO-REDDING"   "BUTTE (C-SPLIT), CA"
"CHICO-REDDING"   "BUTTE (REMAINDER), CA"
"CINCINNATI"      "ADAMS, OH"
"CINCINNATI"      "BOONE, KY"
"CINCINNATI"      "BRACKEN, KY"
"Concord"         "CONTRA COSTA, CA"
"Concord"         "(C-SPLIT) CONTRA COSTA, CA"
"Pismo Beach"     "SAN LUIS OBISPO, CA"
end

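* the state is the last two characters of the county string
gen state = substr(county, -2, .)
gen str county2 = ""
* tag each row with the clean county name it contains;
* if several names match, the last one in the loop wins
foreach l of local levels {
	replace county2 = "`l'" if regexm(county, "`l'")
}
list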


Scott