



Re: st: nearmrg for strings (titles)


From   Eric Booth <[email protected]>
To   "<[email protected]>" <[email protected]>
Subject   Re: st: nearmrg for strings (titles)
Date   Tue, 30 Aug 2011 19:20:28 +0000


I tried -nearmrg- (from SSC) with a string variable as the merge variable in both datasets and got the same error as Michaela. It works when both variables are numeric, but fails when both are string. I had not noticed before that -nearmrg- works (or is supposed to work) with string matching variables: based on the help file, I thought it matched on numeric variables only (I've used it to match the nearest date), but I do now see the passing reference to string variables and the lower/upper options there. With trace turned on, this code:

. nearmrg name using sample.dta, nearvar(name) lower //ref. to example below 

produces this error:

 = if "lower"!="" gen double __000004=cond(name!=__000002,__000001,__000002)
type mismatch

so there are probably some quotes missing in this line (around the temp vars?). I get the same error using the 'upper' option.
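
As a rough illustration of how a line of that shape can fail, here is a minimal sketch that does not use -nearmrg- at all. The variables a, b, c and d are hypothetical, and this shows only one plausible cause (a string result from cond() being stored in a double); it has not been checked against the -nearmrg- source.

```stata
clear
input str10 a str10 b
"x" "y"
end

// cond() returns a string here, so storing the result in a double fails
capture noisily generate double c = cond(a != b, a, b)
display _rc    // expected to be nonzero (type mismatch)

// the same expression is fine when the new variable is a string
generate str10 d = cond(a != b, a, b)
list
```

If this is what happens inside the program, no combination of options would avoid it for string variables; the generate statement itself would need a string storage type.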

__
Instead I usually use -reclink- (from SSC) for this kind of matching.  I haven't tried Dan's -imatch- for this purpose.
Here's an example using -reclink-:

*****************!
clear
inp str20 name
 "manuela Hech"
  "Chris Mueller"
"Fanzisa Haller "
"Ulrike Loerr"
end
g x = 1
g idusing = _n
replace name = trim(lower(name))
sa "sample.dta", replace



clear 
inp str20 name
 "manuela Hecher"
 "Christian Mueller"
 "Fanzisa Haller "
 "Ulrike Loerr"
end
g y = 0
g idmaster = _n
replace name = trim(lower(name))


//nearmrg: produces "type mismatch" error
*nearmrg name using sample.dta, nearvar(name) lower


//reclink
reclink name  using sample.dta, idmaster(idmaster) ///
 idusing(idusing) gen(_match) minscore(.75)

//Uname holds the matched name from the using data; _match is the match score
li name Uname _match
*****************!
- Eric
__
Eric A. Booth
Public Policy Research Institute
Texas A&M University
[email protected]
Office: +979.845.6754


On Aug 30, 2011, at 2:43 AM, Nick Cox wrote:

> Perhaps -name- is string in one dataset and a numeric variable with
> value labels in the other. Alternatively, there is some such clash
> between datasets.
> 
> -nearmrg- is a user-written program from SSC. Please remember to
> specify where user-written programs you refer to come from.
> 
> Nick
> 
> On Tue, Aug 30, 2011 at 7:57 AM, Hoecher, Michaela (0613xxx)
> <[email protected]> wrote:
> 
>> I would like to merge two datasets (variables: title, date, publisher).
>> The problem is that strings (book titles) that are not exactly the same should still be merged/matched.
>> - Does it make sense to use nearmrg for this?
>> - In which way are strings merged/matched?
>> - What would you recommend?
>> 
>> - I wanted to test nearmrg, but I got an error message "type mismatch":
>> 
>> string_masterfile.dta
>> +--------------------------------------+
>> | id    name        gender    age      |
>> |--------------------------------------|
>> | 5     franzi      1         23       |
>> | 1     meli        1         32       |
>> | 2     michaela    1         20       |
>> | 6     ali         2         25       |
>> | 3     christ      2         20       |
>> | 4     martin      2         44       |
>> +--------------------------------------+
>> 
>> string_matchfile2.dta
>> +--------------------------------------+
>> | id    name        gender    age      |
>> |--------------------------------------|
>> | 5     franzi      1         13       |
>> | 1     michi       1         15       |
>> | 2     susi        1         22       |
>> | 4     ali         2         25       |
>> | 3     chris       2         20       |
>> | 5     felix       2         43       |
>> +--------------------------------------+
>> 
>> When I use the command:
>>      nearmrg gender using string_matchfile2.dta, nearvar(name) lower genmatch(samename)
>> or
>>      nearmrg gender using string_matchfile2.dta, nearvar(name) lower force genmatch(samename)
>> 
>> I get the error message: "type mismatch"
> 
> *
> *   For searches and help try:
> *   http://www.stata.com/help.cgi?search
> *   http://www.stata.com/support/statalist/faq
> *   http://www.ats.ucla.edu/stat/stata/



