
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



st: fuzzy merge problem


From   "Dimitriy V. Masterov" <[email protected]>
To   Statalist <[email protected]>
Subject   st: fuzzy merge problem
Date   Tue, 21 Sep 2010 17:52:57 -0400

I have a dataset of US counties and DMAs that looks like this:

ndma             county
CHICO-REDDING    BUTTE (C-SPLIT), CA
CHICO-REDDING    BUTTE (REMAINDER), CA
CINCINNATI       ADAMS, OH
CINCINNATI       BOONE, KY
CINCINNATI       BRACKEN, KY

I also have a dataset of counties that looks like this:

county
BUTTE, CA
BUTTE, ID
BUTTE, SD
BUTTS, GA
CABARRUS, NC


The problem is that in the second dataset, BUTTE, CA is not split into
two regions. There are many cases like this (too many to fix by hand),
and I cannot simply delete the text in parentheses, because the
qualifier is not always in parentheses and its wording varies. I can't
use FIPS codes either, since they are not available in the first
dataset. I need to merge these datasets to use the DMA information.
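For the subset of records where the split qualifier does appear in parentheses, a first pass could strip it and then do an ordinary many-to-one merge, leaving the remaining records for a fuzzy matcher. The sketch below assumes the DMA file is in memory with string variables ndma and county, and that county uniquely identifies records in ".\ihs_counties.dta"; the names county_clean and county_dma are hypothetical. It does not handle qualifiers written without parentheses, which is exactly the harder part of the problem.

```stata
* Assumes the DMA file (ndma, county) is in memory.
* Strip a parenthesized qualifier, e.g. "BUTTE (C-SPLIT), CA" -> "BUTTE, CA".
gen county_clean = regexr(county, " *\([^)]*\)", "")
replace county_clean = itrim(trim(county_clean))

* Keep the original name, then give the cleaned key the same name as
* the variable in the using file so merge can match on it.
rename county county_dma
rename county_clean county

* m:1 because a split county (e.g. two BUTTE, CA rows) maps to one county;
* assumes county is unique in the using file.
merge m:1 county using ".\ihs_counties.dta"
tab _merge
```

Records with _merge == 1 are the ones the parenthesis rule did not resolve.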

I tried merging on the first word of the county name plus the state,
but that fails for county names that begin with Spanish articles. I
also tried M. Blasnik's -reclink- (v 1.7, 14-Jan-2010), but I get:

. reclink county using ".\ihs_counties.dta", idmaster(county) idusing(county) gen(match);
variable county not found

The variable county certainly exists in both datasets and it is a valid id.
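-reclink- (user-written, available via -ssc install reclink-) takes the matching variables as its varlist and separate variables in idmaster() and idusing() that uniquely identify records in each file. One plausible cause of the error, not confirmed here, is passing the matching variable county as the id as well. A minimal sketch, assuming that is the cause, creates separate numeric ids first; the file names ".\dma_counties.dta" and ".\ihs_counties_id.dta" and the variables mid, uid, and matchscore are hypothetical.

```stata
* Give the using file its own numeric record id.
use ".\ihs_counties.dta", clear
gen long uid = _n
save ".\ihs_counties_id.dta", replace      // hypothetical new file

* Give the master (DMA) file its own numeric record id.
use ".\dma_counties.dta", clear            // hypothetical master file name
gen long mid = _n

* Match on county, but identify records by mid/uid rather than county.
reclink county using ".\ihs_counties_id.dta", idmaster(mid) idusing(uid) gen(matchscore)
```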

I am using Stata 11.1 on a 64-bit Windows machine.

Any suggestions?

Dimitriy Masterov

