Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.
From: Scott Merryman <scott.merryman@gmail.com>
To: statalist@hsphsun2.harvard.edu
Subject: Re: st: fuzzy merge problem
Date: Wed, 22 Sep 2010 09:58:01 -0500

On Tue, Sep 21, 2010 at 4:52 PM, Dimitriy V. Masterov <dvmaster@gmail.com> wrote:
<snip>
> I tried merging on the first word in the county name and the state,
> but that runs into problems with county names that begin with Spanish
> articles.

Perhaps you could elaborate on this or give a more extensive example.

Would this method of extracting the county names using the county data set work?

clear
input str20 county
"BUTTE, CA"
"BUTTE, ID"
"BUTTE, SD"
"BUTTS, GA"
"CABARRUS, NC"
"CONTRA COSTA, CA"
"SAN LUIS OBISPO, CA"
end

gen county2 = substr(county, 1, length(county) - 4)
levelsof county2, local(levels)

clear
input str13 ndma str29 county
"CHICO-REDDING" "BUTTE (C-SPLIT), CA"
"CHICO-REDDING" "BUTTE (REMAINDER), CA"
"CINCINNATI" "ADAMS, OH"
"CINCINNATI" "BOONE, KY"
"CINCINNATI" "BRACKEN, KY"
"Concord" "CONTRA COSTA, CA"
"Concord" "(C-SPLIT) CONTRA COSTA, CA"
"Pismo Beach" "SAN LUIS OBISPO, CA"
end

gen state = substr(county, -2, .)
gen str county2 = ""
foreach l of local levels {
    replace county2 = "`l'" if regexm(county, "`l'")
}
l

Scott
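
A possible follow-up, offered only as a sketch: once each DMA row carries a clean county name, the two files could be joined on county name plus state. Nothing below comes from the original thread beyond the sample rows; the variable name county_full and the tempfile name counties are made up for illustration, and strpos() is swapped in for regexm() on the assumption that some county names might contain characters a regular expression would treat specially. It is meant to be run as a single do-file so the local macros survive between steps.

```stata
* County list (using data): split "NAME, ST" into name and state.
clear
input str20 county
"BUTTE, CA"
"BUTTE, ID"
"CONTRA COSTA, CA"
end
gen county2 = substr(county, 1, length(county) - 4)
gen state = substr(county, -2, .)
levelsof county2, local(levels)
rename county county_full        // hypothetical name, avoids a clash below
tempfile counties                // hypothetical tempfile name
save `counties'

* DMA data (master): tag each row with the county name it contains.
clear
input str13 ndma str29 county
"CHICO-REDDING" "BUTTE (C-SPLIT), CA"
"CINCINNATI"    "ADAMS, OH"
"Concord"       "(C-SPLIT) CONTRA COSTA, CA"
end
gen state = substr(county, -2, .)
gen str20 county2 = ""
foreach l of local levels {
    * strpos() is a literal substring search (swapped in for regexm)
    replace county2 = "`l'" if strpos(county, "`l'") > 0
}

* Merging on name and state keeps BUTTE, CA apart from BUTTE, ID.
merge m:1 county2 state using `counties'
list ndma county county_full _merge
```

Rows that find no county name keep an empty county2 and come out of the merge as master-only, which makes them easy to inspect by hand.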