st: fuzzy merge problem

From	"Dimitriy V. Masterov" <[email protected]>
To	Statalist <[email protected]>
Subject	st: fuzzy merge problem
Date	Tue, 21 Sep 2010 17:52:57 -0400

I have a dataset of US counties and DMAs that looks like this:

ndma            county
CHICO-REDDING   BUTTE (C-SPLIT), CA
CHICO-REDDING   BUTTE (REMAINDER), CA
CINCINNATI      ADAMS, OH
CINCINNATI      BOONE, KY
CINCINNATI      BRACKEN, KY

I also have a dataset of counties that looks like this:

county
BUTTE, CA
BUTTE, ID
BUTTE, SD
BUTTS, GA
CABARRUS, NC


The problem is that in the second dataset, BUTTE, CA is not split into
two regions. There are many cases like this (too many to fix by hand),
and I cannot simply delete the text in parentheses: the qualifier is
not always in parentheses, and its wording varies. I can't use FIPS
codes since they are not available in the first dataset. I need to
merge these datasets to use the DMA information.

I tried merging on the first word in the county name and the state,
but that runs into problems with county names that begin with Spanish
articles. I tried M Blasnik's -reclink- (v 1.7 14-Jan-2010), but I
get:

. reclink county using ".\ihs_counties.dta", idmaster(county) idusing(county) gen(match);
variable county not found

The variable county certainly exists in both datasets and it is a valid id.
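
For concreteness, the first-word-plus-state key mentioned above can be sketched roughly as follows. This is only an illustration, not the exact code I ran: the variable names first and state are made up here, the LA PAZ and SAN rows are hypothetical examples rather than rows from my data, and it assumes every county string ends in a comma followed by a two-letter state code.

```stata
* Illustrative sketch only: build a (first word, state) merge key.
* Assumes each county string ends in ", ST" (two-letter state code).
clear
input str40 county
"BUTTE (C-SPLIT), CA"
"BUTTE (REMAINDER), CA"
"ADAMS, OH"
"LA PAZ, AZ"
"SAN DIEGO, CA"
"SAN FRANCISCO, CA"
end

* state = last two characters of the string
gen state = substr(county, length(county) - 1, 2)

* first = first blank-delimited word, with any trailing comma removed
gen first = subinstr(word(county, 1), ",", "", .)

list county first state, clean
```

Both BUTTE rows collapse to the key (BUTTE, CA), which is what I want. For multi-word names, though, the key keeps only a fragment: LA PAZ becomes just LA, and the two SAN rows end up with the same key (SAN, CA), so the merge is no longer on a unique county.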

I am using Stata 11.1 on a 64-bit Windows machine.

Any suggestions?

Dimitriy Masterov