

Re: st: fuzzy merge problem

From	Scott Merryman <[email protected]>
To	[email protected]
Subject	Re: st: fuzzy merge problem
Date	Wed, 22 Sep 2010 09:58:01 -0500

On Tue, Sep 21, 2010 at 4:52 PM, Dimitriy V. Masterov
<[email protected]> wrote:
<snip>
> I tried merging on the first word in the county name and the state,
> but that runs into problems with county names that begin with Spanish
> articles.

Perhaps you could elaborate on this or give a more extensive example.

Would the following method work? It extracts the county names from the
county data set and then looks for each name in the second data set:

clear
input str20 county
"BUTTE, CA"
"BUTTE, ID"
"BUTTE, SD"
"BUTTS, GA"
"CABARRUS, NC"
"CONTRA COSTA, CA"
"SAN LUIS OBISPO, CA"
end
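* Drop the trailing ", ST" (4 characters) to leave the county name
gen county2 = substr(county, 1, length(county) - 4)
* Store the distinct county names in the local macro levels
levelsof county2, local(levels)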

clear
input str13 ndma str29 county
"CHICO-REDDING"   "BUTTE (C-SPLIT), CA"
"CHICO-REDDING"   "BUTTE (REMAINDER), CA"
"CINCINNATI"      "ADAMS, OH"
"CINCINNATI"      "BOONE, KY"
"CINCINNATI"      "BRACKEN, KY"
"Concord"         "CONTRA COSTA, CA"
"Concord"         "(C-SPLIT) CONTRA COSTA, CA"
"Pismo Beach"     "SAN LUIS OBISPO, CA"
end

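* The state abbreviation is the last two characters
gen state = substr(county, -2, .)
gen str county2 = ""
* Assign each known county name to the rows whose county string contains it.
* regexm() treats the name as a regular expression; here it acts as a substring test.
foreach l of local levels {
	replace county2 = "`l'" if regexm(county, "`l'")
}
list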
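
The loop above matches on the county name alone and passes each name to regexm() as a pattern. A minimal sketch of a stricter variant follows, assuming you want a name to count only when the state also agrees and want the name compared literally with strpos() rather than as a regular expression. The macros name and st are hypothetical helpers, not part of the original post, and the rows are a subset of the example data above.

```stata
* Sketch only: literal, state-aware matching (an assumption, not the posted method)
clear
input str20 county
"BUTTE, CA"
"BUTTE, ID"
"CONTRA COSTA, CA"
"SAN LUIS OBISPO, CA"
end
* Keep the full "NAME, ST" strings so each name stays paired with its state
levelsof county, local(levels)

clear
input str13 ndma str29 county
"CHICO-REDDING"   "BUTTE (C-SPLIT), CA"
"CINCINNATI"      "ADAMS, OH"
"Concord"         "(C-SPLIT) CONTRA COSTA, CA"
"Pismo Beach"     "SAN LUIS OBISPO, CA"
end

gen state = substr(county, -2, .)
gen str29 county2 = ""
foreach l of local levels {
	* name and st are hypothetical helper macros split out of "NAME, ST"
	local name = substr("`l'", 1, length("`l'") - 4)
	local st   = substr("`l'", -2, .)
	* strpos() > 0 means the name occurs literally somewhere in county
	replace county2 = "`name'" if strpos(county, "`name'") > 0 & state == "`st'"
}
list ndma county state county2, noobs
```

Rows with no counterpart in the first list, such as "ADAMS, OH" here, keep an empty county2, which makes them easy to pick out before merging on county2 and state.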


Scott