
Notice: On April 23, 2014, Statalist moved from an email list to a forum, based at statalist.org.



Re: st: AW: invalid syntax error in reclink depending on variables for fuzzy matching


From   Nick Cox <[email protected]>
To   "[email protected]" <[email protected]>
Subject   Re: st: AW: invalid syntax error in reclink depending on variables for fuzzy matching
Date   Sun, 6 Apr 2014 20:58:14 +0100

For the record, this code won't work unless you have Stata 7 or later
and, given that, there is no reason to use the now long out-of-date
-for- command, which is not documented properly except in Stata 6.

This should work:

* Strip ASCII characters 33 to 47 and 96 from both string variables
foreach x of numlist 33/47 96 {
   foreach v in mf_mauty mf_marke_Str {
       replace `v' = subinstr(`v', char(`x'), "", .)
   }
}

That is, advice to use -for- is not good advice unless it's your only
choice and you have access to documentation on it so that you can
understand what you are doing and can fix any mistakes in its use.

Whether blanking out all those characters is good advice I leave on one side.
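
One way to judge that before deleting anything is to count how many observations contain each character. The following is only a sketch, assuming the same two variables (mf_mauty and mf_marke_Str) and a Stata version that has -strpos()-; it reports but does not change the data.

```stata
* Diagnostic sketch: report which of ASCII 33-47 and 96 actually occur.
* Nothing is replaced here; the variable names are those used above.
foreach x of numlist 33/47 96 {
    foreach v in mf_mauty mf_marke_Str {
        quietly count if strpos(`v', char(`x')) > 0
        if r(N) > 0 display "`v': " r(N) " observation(s) contain char(`x')"
    }
}
```

Characters that never appear can then be dropped from the list, so that only the ones that matter are removed.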

Nick
[email protected]


On 6 April 2014 16:57, Roth Florian <[email protected]> wrote:

> The problem was that reclink doesn't like certain special characters in the strings. To solve this issue, Mercoledi Nasiir proposed the following code:
>
> forvalues x=33/39 {
> for var mf_mauty mf_marke_Str: replace X=subinstr(X,char(`x'),"",.)
> }
> foreach x in 40 41 42 43 44 45 46 47 96 {
> for var mf_mauty mf_marke_Str: replace X=subinstr(X,char(`x'),"",.)
> }
>
> In my case " ` " was the crucial character (ASCII Code 96). Maybe this will help someone later on with the same problem.
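
Character 96 is the backtick, which Stata uses as the left quote of a local macro reference. That is presumably why a string containing it can trigger "invalid syntax" inside a program such as reclink. If only that character is the problem, a narrower fix may be enough. A minimal sketch, assuming the same two variables; the _clean suffix is a hypothetical naming choice, used so that the originals stay untouched:

```stata
* Sketch: remove only the backtick (ASCII 96) and keep the originals.
* The _clean variable names are hypothetical, not from the original post.
foreach v in mf_mauty mf_marke_Str {
    generate `v'_clean = subinstr(`v', char(96), "", .)
}
```

The cleaned copies can then be used for matching in place of the original variables.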

Roth Florian

> I'm trying to run a fuzzy match of car registry data with additional price data. Since the registry data is not very clean, I can't just use merge. When I use the following code, I get the error "invalid syntax r(198)", usually after about half of the matching is done. I ran the code using Stata 12.1 and reclink 1.7.
>
> use CarReg.dta, clear
>
> reclink str_brand str_model_part1 str_model_part2 rom_displacement rom_fuel_type rom_gear_box rom_import_year using "Price.dta", idmaster(idmaster) idusing(idusing) gen(matchscore) ///
>                         required(str_brand str_model_part1 rom_displacement rom_import_year rom_fuel_type rom_gear_box) wmatch( 1 1 3 1 1 1 3)
>
> I get the following output:
>
> 15429 perfect matches found
>
> Going through 48915 observation to assess fuzzy matches, each .=5% complete .........invalid syntax r(198);
>
>
> The variables starting with "str_" are strings; those starting with "rom_" are integer variables converted to Roman numerals (I found that the matching works better with Roman numerals). I can't find a syntax error, and the same code has worked with a slightly different data set and without the variables rom_fuel_type and rom_gear_box. I have started removing variables to see whether they could be the problem. If I remove enough variables, the code works, but I can't figure out what is wrong with these variables. Does anybody have an idea what the problem could be?

*
*   For searches and help try:
*   http://www.stata.com/help.cgi?search
*   http://www.stata.com/support/faqs/resources/statalist-faq/
*   http://www.ats.ucla.edu/stat/stata/

