st: fuzzy merge problem


From: "Dimitriy V. Masterov" <dvmaster@gmail.com>
To: Statalist <statalist@hsphsun2.harvard.edu>
Subject: st: fuzzy merge problem
Date: Tue, 21 Sep 2010 17:52:57 -0400

I have a dataset of US counties and DMAs that looks like this:

ndma            county
CHICO-REDDING   BUTTE (C-SPLIT), CA
CHICO-REDDING   BUTTE (REMAINDER), CA
CINCINNATI      ADAMS, OH
CINCINNATI      BOONE, KY
CINCINNATI      BRACKEN, KY

I also have a dataset of counties that looks like this:

county
BUTTE, CA
BUTTE, ID
BUTTE, SD
BUTTS, GA
CABARRUS, NC


The problem is that in the second dataset, BUTTE, CA is not split
into two regions. There are many cases like this (too many to fix by
hand), and I cannot simply delete the text in parentheses, since the
qualifier is not always in parentheses and its wording varies. I can't
use FIPS codes since they are not available in the first dataset. I
need to merge these datasets to use the DMA information.

I tried merging on the first word in the county name and the state,
but that runs into problems with county names that begin with Spanish
articles, where the first word is only the article. I also tried
M. Blasnik's -reclink- (v 1.7 14-Jan-2010), but I get:

. reclink county using ".\ihs_counties.dta", idmaster(county)
idusing(county) gen(match);
variable county not found

The variable county certainly exists in both datasets and it is a valid id.

I am using Stata 11.1 on a 64-bit Windows machine.
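
For concreteness, a first-word-plus-state key of the kind described above can be sketched as follows. This is a minimal illustration, not necessarily the code actually used: the rows are made-up examples, and the variable names state, first, key and dup are assumptions. It also assumes every county string ends in a two-letter state code.

```stata
* Illustrative rows only; in practice, load the second (unsplit) dataset.
clear
input str40 county
"BUTTE, CA"
"BUTTE, ID"
"SAN DIEGO, CA"
"SAN FRANCISCO, CA"
end

* State = last two characters; first word with any trailing comma removed.
generate state = substr(county, -2, 2)
generate first = subinstr(word(county, 1), ",", "", .)
generate key   = first + ", " + state

* Flag keys shared by more than one county; these cannot be merged on uniquely.
duplicates tag key, generate(dup)
list county key if dup > 0
```

Built the same way in the first dataset, the key maps both BUTTE (C-SPLIT), CA and BUTTE (REMAINDER), CA to "BUTTE, CA", which is the intended match. The collision flagged here for the two SAN names is the same kind of problem that arises whenever several multi-word names in one state share a first word, as with names starting with an article.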

Any suggestions?

Dimitriy Masterov

