



Re: st: fuzzy merge problem


From   Anders Alexandersson <andersalex@gmail.com>
To   statalist@hsphsun2.harvard.edu
Subject   Re: st: fuzzy merge problem
Date   Wed, 22 Sep 2010 09:56:18 -0400

For the user-written command -reclink-, it seems that the id variable
must not be in the varlist.
For your example, I would create an id variable in both datasets, for
example, -gen id = _n-, and then run
. reclink county using ".\ihs_counties.dta", idmaster(id) idusing(id) gen(match)
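
A minimal, self-contained sketch of those steps, using only the example rows from the quoted post below. The names id_master and id_using and the tempfile counties are assumptions made for illustration; distinct id names are used here only so the two ids stay distinguishable after matching. It assumes -reclink- is already installed (for example via -ssc install reclink-).

```stata
* Sketch only: id_master, id_using and the tempfile name are illustrative.
* Requires the user-written command -reclink-.

* Using dataset: the plain county list, with a numeric id added.
clear
input str30 county
"BUTTE, CA"
"BUTTE, ID"
"BUTTE, SD"
"BUTTS, GA"
"CABARRUS, NC"
end
gen id_using = _n
tempfile counties
save `counties'

* Master dataset: DMA plus county, with its own numeric id.
clear
input str20 ndma str40 county
"CHICO-REDDING" "BUTTE (C-SPLIT), CA"
"CHICO-REDDING" "BUTTE (REMAINDER), CA"
"CINCINNATI"    "ADAMS, OH"
"CINCINNATI"    "BOONE, KY"
"CINCINNATI"    "BRACKEN, KY"
end
gen id_master = _n

* Fuzzy match on county; the id variables are kept out of the varlist.
reclink county using `counties', idmaster(id_master) idusing(id_using) gen(match)
list, noobs
```

Inspect the match scores before trusting any pairing, since near-identical names such as BUTTE and BUTTS can score closely.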

Anders
andersalex@gmail.com

On Tue, Sep 21, 2010 at 5:52 PM, Dimitriy V. Masterov
<dvmaster@gmail.com> wrote:
> I have a dataset of US counties and DMAs that looks like this:
>
> ndma                        county
> CHICO-REDDING   BUTTE (C-SPLIT), CA
> CHICO-REDDING   BUTTE (REMAINDER), CA
> CINCINNATI      ADAMS, OH
> CINCINNATI      BOONE, KY
> CINCINNATI      BRACKEN, KY
>
> I also have a dataset of counties that look like this:
>
> county
> BUTTE, CA
> BUTTE, ID
> BUTTE, SD
> BUTTS, GA
> CABARRUS, NC
>
>
> The problem is that in the second dataset, BUTTE, CA county is not
> split into two regions. There are many cases like this (too many to do
> by hand) and I cannot merely delete the text in parentheses since it
> is not always in parentheses, and the text varies. I can't use FIPS
> code since it's not available in the first dataset. I need to merge
> these datasets to use the dma information.
>
> I tried merging on the first word in the county name and the state,
> but that runs into problems with county names that begin with Spanish
> articles. I tried M Blasnik's -reclink- (v 1.7 14-Jan-2010), but I
> get:
>
> . reclink county using ".\ihs_counties.dta", idmaster(county)
> idusing(county) gen(match);
> variable county not found
>
> The variable county certainly exists in both datasets and it is a valid id.
>
> I am using Stata 11.1 on a 64-bit Windows machine.
>
> Any suggestions?
